Avoiding single-point failures is important for most space missions, especially for mission-critical services. SpaceWire provides a simple means of adding fault tolerance to a system where it is required.
SpaceWire is fairly robust owing to the good EMC properties of LVDS and the cable screening used. Errors are rarely seen on a link unless they are deliberately injected. If a transient error does occur, the SpaceWire link immediately disconnects itself electrically and goes through the re-initialisation process. Within about 20 µs the link is up and running again. The packet that was being transferred is truncated and terminated by a special Error End of Packet (EEP) character, which indicates that it was terminated prematurely. The next packet to be sent will be delivered successfully, provided that the fault was temporary. If the truncated packet was important, it is up to the user application to detect that it was not delivered properly and to resend the information. Protocols that run over SpaceWire and provide this type of service have been, and are being, developed by the SpaceWire working group.
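The application-level recovery described above can be sketched as follows. This is a minimal illustration, not part of SpaceWire or of any working-group protocol: the names `ReceivedPacket`, `make_link` and `send_with_retry` are hypothetical, the link is simulated, and the `transfer` callable is assumed to report back what the receiver saw (in a real system this would need some form of acknowledgement).

```python
from dataclasses import dataclass
from typing import Callable, Set

EOP = "EOP"  # normal end-of-packet marker
EEP = "EEP"  # error end-of-packet marker (packet was truncated)


@dataclass
class ReceivedPacket:
    data: bytes
    terminator: str  # EOP or EEP


def make_link(faulty_attempts: Set[int]) -> Callable[[bytes], ReceivedPacket]:
    """Simulated link (assumption): transfers listed in faulty_attempts
    are hit by a transient error and arrive truncated with an EEP."""
    attempt = 0

    def transfer(payload: bytes) -> ReceivedPacket:
        nonlocal attempt
        attempt += 1
        if attempt in faulty_attempts:
            # Only part of the packet arrives before the link disconnects.
            return ReceivedPacket(payload[: len(payload) // 2], EEP)
        return ReceivedPacket(payload, EOP)

    return transfer


def send_with_retry(payload: bytes,
                    transfer: Callable[[bytes], ReceivedPacket],
                    max_attempts: int = 3) -> int:
    """Resend until the packet arrives complete; return the attempt count."""
    for attempt in range(1, max_attempts + 1):
        received = transfer(payload)
        if received.terminator == EOP and received.data == payload:
            return attempt
    raise RuntimeError("packet not delivered after %d attempts" % max_attempts)
```

With a transient fault on the first transfer only, `send_with_retry(b"science-data", make_link({1}))` succeeds on the second attempt. If every attempt fails, the fault is treated as permanent and the exception is raised.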
If the fault on a SpaceWire link is permanent, for example because the wires have become disconnected or a SpaceWire interface has stopped working, then recovery requires a second, redundant SpaceWire link. This is illustrated in Figure 6.
Figure 6 Fault Tolerant Links
If the prime SpaceWire link stops working, the redundant link has to be started and the data sent over it instead. Simple logic in the instrument can provide this functionality.
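One possible form of this switch-over logic is sketched below. It is an illustration under stated assumptions: the class `RedundantLinkSelector`, its method name and the threshold of three failed start attempts are all hypothetical choices, not values taken from the SpaceWire standard. The threshold is there so that a single transient error, which the link recovers from by itself, does not trigger a switch.

```python
class RedundantLinkSelector:
    """Chooses between a prime and a redundant link (hypothetical sketch)."""

    def __init__(self, max_failed_starts: int = 3):
        self.active = "prime"
        self.failed_starts = 0
        self.max_failed_starts = max_failed_starts  # assumed threshold

    def report_start_result(self, link_running: bool) -> str:
        """Record whether the active link came up; return the link to use."""
        if link_running:
            # The link recovered, so any earlier failures were transient.
            self.failed_starts = 0
        else:
            self.failed_starts += 1
            if (self.active == "prime"
                    and self.failed_starts >= self.max_failed_starts):
                # Treat the fault as permanent and move to the redundant link.
                self.active = "redundant"
                self.failed_starts = 0
        return self.active
```

For example, three consecutive calls to `report_start_result(False)` on a new selector return `"prime"`, `"prime"` and then `"redundant"`. The sketch never switches back to the prime link; whether that is desirable is a system-level decision.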
A more robust system is illustrated in Figure 7.
Figure 7 Cross-Strapping
In this example the instrument is crucial to mission success, so two instruments are provided. Each instrument has two SpaceWire interfaces: one prime and one redundant. Similarly, there are two memory units, each with a prime and a redundant interface. During normal operation the redundant instrument and the redundant memory unit are switched off, and the prime instrument sends data to the prime memory. If the prime instrument fails, it is switched off and the redundant instrument is switched on. The prime memory then receives instrument data via its redundant SpaceWire interface. This classical cross-strapping is readily supported by SpaceWire: additional links are simply added wherever redundancy is required.
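The unit and interface selection in a cross-strapped system can be sketched as below. The function `select_path` and its dictionary layout are hypothetical. The wiring is also an assumption inferred from the description above: each unit's prime interface is taken to connect to the prime unit on the other side, and its redundant interface to the redundant unit.

```python
from typing import Dict, Optional

UNITS = ("prime", "redundant")  # order of preference


def select_path(instrument_ok: Dict[str, bool],
                memory_ok: Dict[str, bool]) -> Optional[Dict[str, str]]:
    """Pick which instrument and memory to power, and which interfaces carry data.

    Returns None when no working instrument or no working memory remains.
    """
    instrument = next((u for u in UNITS if instrument_ok[u]), None)
    memory = next((u for u in UNITS if memory_ok[u]), None)
    if instrument is None or memory is None:
        return None
    return {
        "instrument": instrument,
        "memory": memory,
        # Assumed wiring: the interface used on one unit is named after
        # the unit it connects to on the other side.
        "instrument_interface": memory,
        "memory_interface": instrument,
    }
```

With the prime instrument failed and both memories healthy, the result selects the redundant instrument and the prime memory, with the memory receiving data on its redundant interface, which matches the scenario described above.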
The advantages of this type of architecture are:
- Simplicity
- Low power per Mbit/s
- Full bandwidth of link available to application
- Fault tolerant
The disadvantages are:
- Mass penalty as several links needed for redundancy
- Inefficient if bandwidth not fully utilised
This architecture is ideal where an instrument must be connected directly to a memory or other unit and where a single-point failure is not acceptable.
It is important that a failure on the prime link does not propagate and cause a failure on the redundant link.