The hitchhiker’s guide to spacecraft software
Two years ago, I was given a chance to work on a space project - software for an on-board computer for PW-Sat2 - a small satellite built by Polish students. Without any experience in these kinds of projects, I accepted the challenge. In this post, I would like to demystify some aspects of the project and show that actually it’s not that ‘out of this world’.
PW-Sat2
PW-Sat2 is the second Polish student satellite created by young people from the Warsaw University of Technology. Its primary mission is to test a deorbit sail which should reduce the time the satellite stays in orbit from over 20 years to about one year. In order to gather useful scientific data during the sail opening, the satellite will include an On-Board Computer (OBC) running software written by Future Processing.
Environment
Unfortunately, space is a pretty hostile environment. The standard approach to most problems in IT (“have you tried turning it off and on again?”) does not work here – if a satellite is not responding to commands, there is no way to press the “Reset” button. There is no air to carry heat away from components, so they are more vulnerable to overheating. Power can go off at any time, so the software must be able to recover from half-finished writes and try to keep track of time even when power is down. If this wasn’t enough, there is one more thing that greatly affects software: radiation.
Radiation
Devices running on the Earth’s surface have a great benefit – they are mostly shielded from cosmic rays. The higher a device goes, the weaker this protection becomes and the more vulnerable the device is to damage. Cosmic rays, which consist mostly of high-energy particles, can cause serious problems. In extreme cases, radiation can destroy electronic elements; with luck, it will only cause one of the Single Event Effects (SEE), for example:
- Single Event Transient – a bit flips to the opposite value and then flips back to the proper value (1->0->1)
- Single Event Upset – a bit flips to the opposite value and stays that way
- Single Event Latchup – a particle causes a short circuit which goes away after power to the device is switched off and on again
While it’s not possible to shield devices completely from SEE, it is possible to reduce the probability of negative effects by:
- using smaller and simpler elements,
- adding failover to critical components,
- using Triple Modular Redundancy (TMR).
TMR
TMR is used in PW-Sat2 to ensure correctness of data storage, settings storage and program copies. In the case of data and settings storage, we use three physical memory chips. The program is stored on a single memory chip, but in three copies.
Implementing TMR is pretty simple and consists of two operations: write and read. The write operation performs the same write on all three modules. The read operation is a bit more complicated, as it needs to perform a separate read from each of the three modules and then run bitwise majority voting to rule out errors caused by Single Event Upsets.
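The write and read operations described above can be sketched as follows. This is a minimal illustration, not the PW-Sat2 code: `TmrStore`, `Vote` and `Corrupt` are hypothetical names, and three in-memory arrays stand in for the three physical memory chips.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Bitwise majority vote: each output bit takes the value held by at least
// two of the three inputs.
inline std::uint8_t Vote(std::uint8_t a, std::uint8_t b, std::uint8_t c)
{
    return static_cast<std::uint8_t>((a & b) | (a & c) | (b & c));
}

// Hypothetical TMR store: three arrays play the role of three memory chips.
template <std::size_t Size>
class TmrStore
{
public:
    // Write: repeat the same write on all three modules.
    void Write(std::size_t offset, std::uint8_t value)
    {
        for (auto& chip : _chips)
            chip[offset] = value;
    }

    // Read: read each module once and vote bit by bit.
    std::uint8_t Read(std::size_t offset) const
    {
        return Vote(_chips[0][offset], _chips[1][offset], _chips[2][offset]);
    }

    // Illustration only: flip the masked bits in one module, as an upset would.
    void Corrupt(std::size_t chip, std::size_t offset, std::uint8_t mask)
    {
        _chips[chip][offset] ^= mask;
    }

private:
    std::array<std::array<std::uint8_t, Size>, 3> _chips{};
};
```

Because the vote is taken per bit, a read still returns the right value when two modules are damaged, as long as the flipped bits are in different positions.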
There is also a third operation called ‘scrubbing’ which corrects errors by periodically reading and writing back chunks of memory. In the case of PW-Sat2, scrubbing is performed on RAM, settings storage and program copies.
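A scrubbing pass over three copies of a memory region might look like the sketch below. It is an assumption-laden illustration rather than the flight implementation: `Region` and `Scrub` are made-up names, and the region is only 8 bytes long to keep the example small. The point of writing the voted value back is that upsets do not accumulate: if the same bit were eventually flipped in two copies, the vote would return the wrong value.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <initializer_list>

// One copy of a memory region (hypothetical size).
using Region = std::array<std::uint8_t, 8>;

// One scrubbing pass: vote on every byte and write the voted value back to
// any copy that disagrees. Returns the number of byte copies repaired.
inline std::size_t Scrub(Region& a, Region& b, Region& c)
{
    std::size_t repaired = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        const std::uint8_t voted = static_cast<std::uint8_t>(
            (a[i] & b[i]) | (a[i] & c[i]) | (b[i] & c[i]));

        for (Region* copy : {&a, &b, &c})
        {
            if ((*copy)[i] != voted)
            {
                (*copy)[i] = voted;
                ++repaired;
            }
        }
    }
    return repaired;
}
```

Running `Scrub` periodically, for example from a low-priority task, keeps the three copies identical between reads.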
Hardware
On a daily basis, we work with devices capable of performing an enormous number of operations per second and storing a lot of information. However, when the OBC hardware is compared to ‘normal’ hardware, the space version seems a bit… slow.
OBC | My laptop |
---|---|
ARM Cortex-M3, 48MHz | 4-core, x64, 2.8GHz |
1MB SRAM | 16GB RAM |
1MB program memory | 512GB SSD + 1TB HDD |
16MB data storage | |
Connectivity: I2C, SPI, UART | Connectivity: USB, WiFi, Ethernet |
It’s not getting better when it comes to communication: PW-Sat2 uses radio communication with a maximum uplink (Earth to orbit) speed of 1200 bits per second and a maximum downlink (orbit to Earth) speed of 9600 bits per second. Due to the geometry of the orbit, it will be possible to communicate with the satellite for only a few minutes per day. The manufacturer of the communication module provides only a low-level protocol (AX.25) which guarantees only that a received frame is valid, not that every frame arrives (similar to the way UDP works), so it is necessary to develop a simple protocol on top of it.
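To get a feeling for these numbers, here is a rough back-of-the-envelope calculation. The 5-minute pass length is an assumption picked only to make the arithmetic concrete, `BytesPerPass` is a hypothetical helper, and the result ignores AX.25 framing overhead, lost frames and the time needed to acquire the signal.

```cpp
#include <cstdint>

// Raw bytes that fit through a link during one pass (no protocol overhead).
constexpr std::uint32_t BytesPerPass(std::uint32_t bitsPerSecond,
                                     std::uint32_t passSeconds)
{
    return bitsPerSecond * passSeconds / 8;
}

// Link speeds taken from the text above.
constexpr std::uint32_t UplinkBps = 1200;
constexpr std::uint32_t DownlinkBps = 9600;

// ASSUMPTION: a 5-minute communication window.
constexpr std::uint32_t PassSeconds = 5 * 60;

static_assert(BytesPerPass(UplinkBps, PassSeconds) == 45000, "uplink budget");
static_assert(BytesPerPass(DownlinkBps, PassSeconds) == 360000, "downlink budget");
```

Under these assumptions a pass moves at most about 360 kB down and 45 kB up, so downloading the whole 16MB data storage would take more than 40 such passes. This is why every byte in the protocol matters.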
Tools
Although the environment in which the OBC works is pretty extreme, the tools we use for developing software are all well-known. These include:
- C++ 11,
- GCC 5.4 for ARM microcontrollers (arm-none-eabi),
- CMake,
- Google Test (unit tests),
- Python 2.7 (hardware in the loop tests).
Unit testing was a bit problematic at the beginning, when we realised that running the unit test binary on hardware was not feasible (not enough memory, problems with recording test results). On the other hand, compiling the same code a second time just to run it on a different platform (x86) was risky – there are differences that could hide problems which only appear on the target platform. To solve this problem, we used QEMU, which can emulate one hardware platform on another – in this case ARM Cortex-M3 on Windows x86. By using an emulator, we are able to run code compiled once both on target hardware as part of the OBC firmware and on a PC as part of unit tests. The solution is not 100% perfect but the emulated CPU is close enough to catch most of the bugs.
Following best practices, we use a continuous integration server (Jenkins) to automate the whole build and test process.
Apart from software, we are also using hardware components to be able to test integration between components.
On Board Computer Engineering Model (OBC EM)
Closely matches the OBC that will be used in PW-Sat2 – the same microcontroller, memories and external interfaces.
Payload Demonstration Model (PLD DM)
Additional board with external memories for data and settings storage, camera connectors and real-time clock.
Device Mock
STM32 Nucleo board with our software that allows emulation of every other device in PW-Sat2 connected via the I2C bus (all devices except cameras). The behaviour of each emulated device is described in Python code, which allows us to test integration with external components without using the flight versions.
These components are connected together as presented in the diagram below. When a test is performed, a PC sends a command to the OBC which causes some transactions on the I2C bus that are redirected to the PC by the Device Mock. Of course not all tests use the same mechanism, but the general idea stays the same (e.g. an emulated communication module is set up to ‘receive’ a frame that the OBC will handle).
Software
Operating system
Looking at the hardware specification of the OBC, it’s hard to imagine running an operating system like Windows or Linux on it. Embedded systems typically use a Real-Time Operating System (RTOS) instead; in our case it is FreeRTOS. It is composed of 6 source files (plus some header files) and is used like any other library, by linking it into the target binary. It provides basic OS-like functionality:
- Multithreading with preemption
- Synchronisation primitives: queues, semaphores, event groups
By using FreeRTOS we get basic mechanisms for working with multiple tasks, which resemble working with threads in Windows. One thing that is different is that FreeRTOS requires us to estimate how much memory each task will use. Setting the value too high results in wasted RAM; setting it too low means the task will overwrite data belonging to another task or some system structures, which leads to errors that are difficult to debug. In the case of PW-Sat2, we are in the comfortable position of having enough RAM to provide a huge margin for each task.
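One common way to check such an estimate is to fill a task’s memory with a known pattern and later count how much of it was never overwritten. FreeRTOS exposes a measurement of this kind through `uxTaskGetStackHighWaterMark`. The sketch below is a standalone PC simulation of the idea, not FreeRTOS code: `PaintedStack`, `Use` and the fill byte are all illustrative assumptions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::uint8_t FillByte = 0xA5; // illustrative pattern

// Hypothetical stand-in for a task stack that grows from the end of the
// buffer towards index 0 (stacks grow downwards on Cortex-M).
template <std::size_t Size>
class PaintedStack
{
public:
    PaintedStack() { _bytes.fill(FillByte); }

    // Simulate the task using `depth` bytes of stack.
    void Use(std::size_t depth)
    {
        for (std::size_t i = 0; i < depth && i < Size; ++i)
            _bytes[Size - 1 - i] = 0x00;
    }

    // Number of bytes that have never been touched since creation.
    std::size_t HighWaterMark() const
    {
        std::size_t untouched = 0;
        while (untouched < Size && _bytes[untouched] == FillByte)
            ++untouched;
        return untouched;
    }

private:
    std::array<std::uint8_t, Size> _bytes;
};
```

A high-water mark close to zero means the margin is gone and the task is about to write outside its own memory; a very large one means RAM is being wasted.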
Mission loop
One of the most critical tasks of the OBC is to carry out the mission plan – for PW-Sat2 its main point is opening the sail 40 days after launch. The implementation of the mission loop is shown below:
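```cpp
SystemState state;

while (true)
{
    // Refresh the shared state (mission time, antenna and sail conditions).
    for (auto& u : _updates)
        u.Update(state);

    // Run every action whose conditions hold for the current state.
    for (auto& a : _actions)
        if (a.CanExecute(state))
            a.Execute(state);

    Sleep(10s);
}
```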
It’s not hard to recognise a game loop (or a message loop in Windows applications), which is an easy way to periodically update the system state and perform actions based on what is happening. Updates invoked from the loop include determining the current mission time and evaluating complex conditions for the antennas and sail. Example actions are: opening the antennas, opening the sail and periodic restart.
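To make the pattern concrete, here is a self-contained sketch of one loop iteration that can run on a PC. Everything in it is an assumption made for illustration: the fields of `SystemState`, the `UpdateDescriptor` and `ActionDescriptor` types and the `Make...` helpers are hypothetical, and mission time is simply advanced by 10 seconds per iteration instead of being read from a clock.

```cpp
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical state; the real SystemState is not shown in this post.
struct SystemState
{
    std::chrono::seconds MissionTime{0};
    bool SailOpened = false;
};

struct UpdateDescriptor
{
    std::function<void(SystemState&)> Update;
};

struct ActionDescriptor
{
    std::function<bool(const SystemState&)> CanExecute;
    std::function<void(SystemState&)> Execute;
};

// One pass of the loop body, without the Sleep.
inline void RunIteration(SystemState& state,
                         std::vector<UpdateDescriptor>& updates,
                         std::vector<ActionDescriptor>& actions)
{
    for (auto& u : updates)
        u.Update(state);
    for (auto& a : actions)
        if (a.CanExecute(state))
            a.Execute(state);
}

// Simplified time update: 10 s pass between iterations.
inline std::vector<UpdateDescriptor> MakeUpdates()
{
    return {UpdateDescriptor{
        [](SystemState& s) { s.MissionTime += std::chrono::seconds(10); }}};
}

// The sail opens once, after 40 days of mission time.
inline std::vector<ActionDescriptor> MakeActions()
{
    const std::chrono::seconds sailDelay = std::chrono::hours(24 * 40);
    return {ActionDescriptor{
        [sailDelay](const SystemState& s) {
            return !s.SailOpened && s.MissionTime >= sailDelay;
        },
        [](SystemState& s) { s.SailOpened = true; }}};
}
```

Keeping conditions (`CanExecute`) separate from effects (`Execute`) means each action can be tested in isolation by handing it a prepared state, without waiting for real time to pass.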
Summary
I hope that I demystified some aspects of working on a space project. While PW-Sat2 is not a big spacecraft, it gives a glimpse into the technology, problems and challenges of devices that we will encounter out there.