System Safety & Functional Safety
System safety is the backbone of aviation certification. Before any system can be approved, its potential failure conditions must be identified, classified, and mitigated. This hub covers the core concepts: Functional Hazard Assessment (FHA), Fault Tree Analysis (FTA), Failure Modes and Effects Analysis (FMEA), Design Assurance Levels (DAL), and the safety architecture principles that ensure aircraft systems meet their safety objectives.
32 terms in this topic
All Terms
The state in which risks associated with aviation activities, related to or in direct support of the operation of aircraft, are reduced and controlled to an acceptable level. In the context of aircraft certification, safety is achieved by demonstrating that the aircraft design meets quantitative and qualitative safety objectives established by the applicable airworthiness requirements. Safety is not the absence of risk but the management of risk to acceptable levels as defined by regulatory authorities.
The combination of the probability (or frequency) of occurrence of a harmful event and the severity of that event. In system safety analysis, risk is assessed by evaluating how likely a failure condition is to occur and how severe its effects would be on the aircraft, its occupants, and people on the ground. Risk assessment is the basis for determining whether a design meets safety objectives: each failure condition must have a probability of occurrence commensurate with its severity classification.
A condition, event, or circumstance that could lead to or contribute to an unplanned or undesired event resulting in harm. In aviation system safety, a hazard is typically a failure condition or combination of failure conditions at the aircraft or system level that, if not mitigated, could result in injury, death, or damage. Hazards are identified through systematic analysis processes such as Functional Hazard Assessment (FHA) and are characterized by their potential severity and likelihood.
Physical injury or damage to the health of people, or damage to property or the environment. In aviation safety, harm is the ultimate adverse outcome that safety objectives seek to prevent or minimize. The severity classification of failure conditions (catastrophic through no safety effect) is based on the degree of harm that could result: from hull loss and multiple fatalities (catastrophic) to no effect on safety (no safety effect).
A condition having an effect on the aircraft and its occupants, both direct and consequential, caused or contributed to by one or more failures considering flight phase and relevant adverse operational or environmental conditions or external events. A failure condition is not the failure itself but the effect of the failure (or combination of failures) at the aircraft level. Failure conditions are classified by severity and assigned probability objectives accordingly.
Three related but distinct concepts in system safety. A failure is the inability of a system, subsystem, or component to perform its required function within specified limits. A failure is an event — the transition from a working state to a non-working state. A fault is an abnormal condition or defect at the component, subsystem, or system level that may lead to a failure. A fault is a state — a latent or active deficiency in the system. An error is a design mistake, an incorrect action, or an unintended deviation in specification, development, or operation that may cause or contribute to a fault. Errors are causes (often human), faults are states (often latent), and failures are events (observable loss of function).
The categorization of failure conditions by their severity of effect on the aircraft and its occupants. Five classifications are defined:
(1) Catastrophic: failure conditions that would result in multiple fatalities, usually with the loss of the aircraft.
(2) Hazardous (also called Severe-Major): failure conditions that would reduce the capability of the aircraft or the ability of the crew to cope with adverse operating conditions to the extent that there would be a large reduction in safety margins or functional capabilities; physical distress or higher workload such that the crew could not be relied upon to perform their tasks accurately or completely; or serious or fatal injury to a relatively small number of occupants.
(3) Major: failure conditions that would reduce the capability of the aircraft or the ability of the crew to cope with adverse operating conditions to the extent that there would be a significant reduction in safety margins or functional capabilities; a significant increase in crew workload or in conditions impairing crew efficiency; or discomfort to occupants, possibly including injuries.
(4) Minor: failure conditions that would not significantly reduce aircraft safety and that involve crew actions well within their capabilities, including a slight reduction in safety margins, a slight increase in workload, or some physical discomfort to occupants.
(5) No Safety Effect: failure conditions that have no effect on safety.
A designation of the rigor of the development assurance process applied to a function, system, software item, or hardware item, based on the severity of the most severe failure condition to which it contributes. ARP4754 distinguishes the Function Development Assurance Level (FDAL), assigned to functions, from the Item Development Assurance Level (IDAL), assigned to software and hardware items; "DAL" is often used loosely for either. Five levels are defined: DAL A (most rigorous, associated with catastrophic failure conditions), DAL B (hazardous), DAL C (major), DAL D (minor), and DAL E (no safety effect, no development assurance objectives). The DAL drives the rigor of planning, development, verification, and configuration management activities as specified in standards like DO-178C (software), DO-254 (hardware), and ARP4754B (systems).
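The severity-to-level mapping described above can be sketched in a few lines of Python. This is an illustrative sketch only: the names `Severity`, `DAL_BY_SEVERITY`, and `assign_dal` are assumptions made for this example, and it ignores any architectural considerations that a real assignment process would take into account.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Failure condition classifications, ordered from least to most severe."""
    NO_SAFETY_EFFECT = 0
    MINOR = 1
    MAJOR = 2
    HAZARDOUS = 3
    CATASTROPHIC = 4


# Mapping taken from the five levels listed in the definition above.
DAL_BY_SEVERITY = {
    Severity.CATASTROPHIC: "A",
    Severity.HAZARDOUS: "B",
    Severity.MAJOR: "C",
    Severity.MINOR: "D",
    Severity.NO_SAFETY_EFFECT: "E",
}


def assign_dal(contributions):
    """Return the level driven by the most severe failure condition
    that the item contributes to (hypothetical helper)."""
    if not contributions:
        raise ValueError("at least one failure condition is required")
    return DAL_BY_SEVERITY[max(contributions)]


# An item contributing to a Minor and a Hazardous failure condition.
print(assign_dal([Severity.MINOR, Severity.HAZARDOUS]))
```

The `max` call is the whole point of the sketch: the level follows the worst failure condition, not the average or the most frequent one.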
The quantitative and qualitative targets that a design must meet for each failure condition classification. For transport category aircraft under 14 CFR/CS 25.1309, the quantitative probability targets are: Catastrophic failure conditions must be extremely improbable (typically interpreted as a probability of occurrence on the order of 10^-9 or less per flight hour); Hazardous failure conditions must be extremely remote (on the order of 10^-7 or less per flight hour); Major failure conditions must be remote (on the order of 10^-5 or less per flight hour); Minor failure conditions may be probable (no specific numerical threshold is given here, but the condition must be shown to be acceptable). In addition to probability targets, qualitative objectives apply: no single failure should lead to a catastrophic failure condition, and the crew must be able to detect and manage failure conditions through appropriate annunciation and procedures.
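As a rough illustration of how the quantitative targets are applied, the sketch below compares a computed probability against the order-of-magnitude limits quoted above. The function and dictionary names are assumptions for this example. It covers only the numerical part of the objectives; the qualitative requirement that no single failure leads to a catastrophic condition has to be shown separately.

```python
# Order-of-magnitude limits per flight hour, as quoted in the text above.
MAX_PROBABILITY_PER_FH = {
    "catastrophic": 1e-9,
    "hazardous": 1e-7,
    "major": 1e-5,
}

# Classifications for which the text gives no numerical threshold.
NO_NUMERICAL_THRESHOLD = {"minor", "no safety effect"}


def meets_quantitative_objective(classification, probability_per_fh):
    """Return True if the probability satisfies the numerical target
    for the given classification (hypothetical helper)."""
    key = classification.lower()
    if key in NO_NUMERICAL_THRESHOLD:
        return True
    if key not in MAX_PROBABILITY_PER_FH:
        raise ValueError(f"unknown classification: {classification!r}")
    return probability_per_fh <= MAX_PROBABILITY_PER_FH[key]


print(meets_quantitative_objective("Catastrophic", 5e-10))
print(meets_quantitative_objective("Hazardous", 3e-7))
```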
A systematic, comprehensive examination of aircraft and system functions to identify and classify failure conditions associated with the loss or malfunction of those functions. The FHA is performed at the aircraft level and at the system level. The Aircraft-level FHA (AFHA) identifies failure conditions by examining what happens when each aircraft-level function is lost, malfunctions, or is provided with erroneous information, across all relevant flight phases and environmental conditions. System-level FHAs decompose the aircraft-level functions into system functions and identify additional failure conditions. The output of the FHA is a list of failure conditions, their severity classifications, and the associated safety objectives.
A systematic evaluation of a proposed system architecture to determine how failures within the architecture could lead to the failure conditions identified in the FHA, and whether the proposed architecture can meet the safety objectives. The Preliminary System Safety Assessment (PSSA) examines the system design at an early stage using qualitative and preliminary quantitative methods, such as preliminary fault trees, dependency diagrams, and Markov models. The PSSA establishes safety requirements for the system elements (including hardware, software, and human factors) that must be met to achieve the system-level safety objectives. These derived safety requirements are then allocated to lower-level items.
A systematic, comprehensive evaluation of the implemented system design to show that the safety objectives established in the FHA are met by the final design. The System Safety Assessment (SSA) compiles and evaluates all safety analysis results, including quantitative analyses (fault trees, reliability analyses), qualitative assessments, common cause analyses, and verification evidence, to provide a complete safety argument for the system. The SSA demonstrates that each failure condition identified in the FHA has been addressed and that the applicable probability and qualitative requirements are satisfied.
A top-down, deductive analytical method used to determine the combinations of lower-level events (hardware failures, software errors, human errors, environmental conditions, and maintenance actions) that could cause a specific undesired top-level event (typically a failure condition identified in the FHA). The fault tree is a graphical model using Boolean logic gates (AND, OR, NOT, voting gates) to represent the logical relationships between events. Quantitative FTA assigns failure rates to basic events and calculates the probability of the top event using Boolean algebra or numerical methods. Qualitative FTA identifies minimal cut sets — the smallest combinations of basic events that can cause the top event.
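The two uses of a fault tree described above, quantitative evaluation and minimal cut set extraction, can be sketched for a small tree with AND and OR gates. Everything here is hypothetical: the tree, the event names, and the probabilities are invented for illustration, basic events are assumed independent, and each basic event appears only once in the tree. Real tools also handle repeated events, voting gates, and exposure times.

```python
from itertools import product

# Hypothetical tree: the function is lost if both channels fail,
# or if the shared power bus fails.
TREE = ("OR",
        ("AND", "channel_1_fails", "channel_2_fails"),
        "power_bus_fails")

# Assumed basic-event probabilities over a common interval.
PROB = {
    "channel_1_fails": 1e-4,
    "channel_2_fails": 1e-4,
    "power_bus_fails": 1e-7,
}


def probability(node, prob):
    """Top-event probability, assuming independent basic events."""
    if isinstance(node, str):
        return prob[node]
    gate, *children = node
    child_p = [probability(child, prob) for child in children]
    if gate == "AND":
        result = 1.0
        for p in child_p:
            result *= p
        return result
    if gate == "OR":
        none_occur = 1.0
        for p in child_p:
            none_occur *= 1.0 - p
        return 1.0 - none_occur
    raise ValueError(f"unsupported gate: {gate}")


def minimal_cut_sets(node):
    """Smallest sets of basic events that cause the top event."""
    if isinstance(node, str):
        return [frozenset([node])]
    gate, *children = node
    child_sets = [minimal_cut_sets(child) for child in children]
    if gate == "OR":
        sets = [s for group in child_sets for s in group]
    elif gate == "AND":
        sets = [frozenset().union(*combo) for combo in product(*child_sets)]
    else:
        raise ValueError(f"unsupported gate: {gate}")
    unique = set(sets)
    # Drop any set that strictly contains another cut set.
    return [s for s in unique if not any(other < s for other in unique)]


print(probability(TREE, PROB))
print(minimal_cut_sets(TREE))
```

In this example the single-event cut set (the power bus) is what a qualitative review would flag first, since it defeats the dual-channel redundancy on its own.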
A bottom-up, inductive analytical method that systematically examines each component or item in a system to identify its potential failure modes, the local and system-level effects of each failure mode, and the means of detection. FMEA examines each item in isolation: for each possible failure mode (e.g., open circuit, short circuit, stuck in position), the analyst determines the immediate effect on the item, the effect on the next higher assembly, and the end effect at the system or aircraft level. The analysis also identifies compensating provisions (redundancy, monitoring, crew alerts) and assesses the severity of the end effect.
A summary-level analysis that consolidates the results of detailed FMEAs to present the system-level effects of component failure modes. The Failure Modes and Effects Summary (FMES) identifies the failure modes of replaceable items, typically Line Replaceable Units (LRUs), and their effects at the system and aircraft level. It provides a higher-level view than the detailed FMEA and is used as input to the SSA and to operational and maintenance documentation.
An extension of FMEA that adds a criticality assessment to each failure mode. In a Failure Modes, Effects, and Criticality Analysis (FMECA), the criticality analysis ranks failure modes based on a combination of the severity of their end effect and their probability of occurrence. This ranking helps prioritize design mitigation efforts and focus verification activities on the most safety-critical failure modes. FMECA combines the qualitative failure mode and effects analysis with a quantitative or semi-quantitative criticality assessment.
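A criticality ranking of this kind can be sketched as a sort over FMEA rows. The record layout, the 1-to-4 severity scale, the sample items, and the failure rates below are all assumptions made for illustration; actual FMECA worksheets and criticality metrics vary by program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    item: str
    mode: str
    end_effect_severity: int  # assumed scale: 1 = minor ... 4 = catastrophic
    rate_per_fh: float        # assumed failure-mode rate per flight hour


def rank_by_criticality(modes):
    """Order failure modes by severity first, then by rate, highest first."""
    return sorted(
        modes,
        key=lambda m: (m.end_effect_severity, m.rate_per_fh),
        reverse=True,
    )


# Hypothetical worksheet rows.
MODES = [
    FailureMode("fuel_valve", "stuck closed", 3, 2e-6),
    FailureMode("cabin_fan", "stops", 1, 5e-5),
    FailureMode("pitot_heater", "open circuit", 3, 8e-6),
]

for entry in rank_by_criticality(MODES):
    print(entry.item, entry.mode, entry.end_effect_severity, entry.rate_per_fh)
```

Sorting on severity before rate is one possible design choice: it keeps a frequent but benign failure mode (the fan) from outranking a rarer but more severe one.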
A set of safety analysis methods that evaluate the susceptibility of a system to events or conditions that could simultaneously affect multiple items or functions, defeating architectural features such as redundancy and independence. Common Cause Analysis (CCA) encompasses three complementary analyses: (1) Zonal Safety Analysis (ZSA), which evaluates physical proximity and installation-related common causes; (2) Particular Risk Analysis (PRA), which evaluates external hazards such as fire, bird strike, tire burst, uncontained engine rotor failure, and lightning; (3) Common Mode Analysis (CMA), which evaluates systematic common causes such as common hardware, common software, common requirements errors, common manufacturing processes, and common maintenance errors.
A safety analysis that examines each zone of the aircraft to identify potential safety concerns arising from the physical installation of systems and equipment. ZSA evaluates whether items from different systems are installed in the same zone in a way that could create common cause failures, interference between systems, or maintenance errors. The analysis considers wire routing, fluid line proximity, equipment mounting, access for maintenance, and the potential for one system's failure to damage adjacent systems (e.g., a leaking hydraulic line damaging adjacent electrical wiring).
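One narrow part of a zonal review, spotting redundant items installed in the same zone, lends itself to a simple check. The sketch below is only an illustration: the item names, redundancy groups, and zone identifiers are hypothetical, and a real ZSA relies on installation inspection and engineering judgment rather than a table lookup.

```python
from collections import defaultdict

# Hypothetical installation data: (item, redundancy group, zone).
INSTALLATION = [
    ("hyd_pump_A", "hydraulic_supply", "zone_140"),
    ("hyd_pump_B", "hydraulic_supply", "zone_140"),
    ("adc_1", "air_data", "zone_120"),
    ("adc_2", "air_data", "zone_130"),
]


def shared_zone_findings(installation):
    """Return redundancy groups with more than one member in the same zone."""
    by_group_and_zone = defaultdict(list)
    for item, group, zone in installation:
        by_group_and_zone[(group, zone)].append(item)
    return {
        key: sorted(items)
        for key, items in by_group_and_zone.items()
        if len(items) > 1
    }


print(shared_zone_findings(INSTALLATION))
```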
A safety analysis that evaluates the effects of specific external hazards (particular risks) on the aircraft systems, to ensure that these hazards cannot defeat the safety architecture through common cause effects. Particular risks include uncontained engine rotor failure, bird strike, tire burst, wheel rim release, fire, lightning, high-intensity radiated fields (HIRF), fluid leakage, hail, and other external threats. For each particular risk, the analysis identifies which systems and components could be affected, evaluates whether the system architecture provides adequate protection (through segregation, shielding, or separation), and determines the resulting failure conditions.
A failure that is not immediately apparent to the flight crew during normal operations. Latent failures are undetected until revealed by a specific test, inspection, another failure, or a demand on the failed function. In the context of safety assessment, latent failures are significant because they can persist for a long exposure time, the period during which the system operates in a degraded state without the crew's knowledge. The longer that period, the more likely it is that a second failure occurs while the first is still present. The combination of a latent failure and a subsequent active failure can result in a more severe failure condition than either failure alone.
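The link between exposure time and probability can be made concrete with a constant-failure-rate model. This is a simplified sketch: the exponential model, the failure rate, and the check intervals are assumptions chosen for illustration, and `probability_failed_within` is a hypothetical helper name.

```python
import math


def probability_failed_within(rate_per_fh, exposure_hours):
    """Probability that an item with a constant failure rate has failed
    at some point during the exposure time: 1 - exp(-rate * time)."""
    return -math.expm1(-rate_per_fh * exposure_hours)


# Assumed rate of 1e-5 per flight hour, checked every 1000 h versus every 100 h.
for interval in (1000.0, 100.0):
    print(interval, probability_failed_within(1e-5, interval))
```

For small values of rate times exposure the result is close to their product, which is why shortening the inspection interval by a factor of ten reduces the probability of an undetected failure by roughly the same factor.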
A design characteristic ensuring that a failure, error, or external event affecting one element of a system does not propagate to or simultaneously affect another element. Independence is required when redundancy is used to meet safety objectives: two redundant channels provide safety benefit only if they are truly independent such that a single cause cannot defeat both. Independence can be achieved through physical separation (different locations), functional independence (different interfaces and data paths), electrical isolation (separate power supplies), and logical independence (different software, different design teams).
The provision of more than one means (item, function, or pathway) for accomplishing a given function, such that the failure of one means does not result in the loss of the function. Redundancy can be active (all redundant elements operating simultaneously, as in dual flight computers both processing commands) or standby (a backup element activated only upon failure of the primary, as in a standby hydraulic pump). The effectiveness of redundancy in meeting safety objectives depends on the independence of the redundant elements, the detection and switching mechanisms, and the coverage of failure modes.
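The benefit of redundancy can be estimated with a k-out-of-n calculation. The sketch below assumes identical, fully independent elements and perfect detection and switching, which is exactly what the last sentence above warns cannot be taken for granted. The function name and the sample probability are assumptions for this example.

```python
from math import comb


def probability_function_lost(p_fail, n, k_required):
    """Probability that fewer than k_required of n independent,
    identical elements are working."""
    p_ok = 1.0 - p_fail
    return sum(
        comb(n, working) * p_ok**working * p_fail**(n - working)
        for working in range(k_required)
    )


# Assumed per-element failure probability of 1e-3 over the interval of interest.
print(probability_function_lost(1e-3, n=1, k_required=1))  # single element
print(probability_function_lost(1e-3, n=2, k_required=1))  # duplex, one needed
print(probability_function_lost(1e-3, n=3, k_required=2))  # triplex, two needed
```

Any common cause that can fail several elements together adds a term that does not shrink as more elements are added, so the figures from a calculation like this are an upper bound on the benefit.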
A design strategy in which redundant elements are implemented using different technologies, different design approaches, different hardware components, different software implementations, or different development teams, to reduce the likelihood that a common design error, manufacturing defect, or systematic failure affects all redundant elements simultaneously. Dissimilarity specifically targets systematic common causes that cannot be addressed by physical separation alone.
The physical or functional separation of system elements to prevent a failure, external event, or environmental condition affecting one element from propagating to another. Physical segregation involves routing, mounting, or locating redundant elements in different zones, on different sides of the aircraft, or behind different barriers. Functional segregation involves using different interfaces, different power sources, different buses, or different signal paths. Segregation is a key means of achieving independence between redundant elements.
A design philosophy in which the occurrence of any single failure, or likely combination of failures, results in a safe condition or allows continued safe flight and landing. In a fail-safe design, failures are accommodated through a combination of redundancy, designed failure paths, detectability, and crew procedures. The fail-safe concept was the original safety philosophy for transport aircraft structure (fail-safe structure permits damage or partial failure without catastrophic structural failure) and has been extended to systems design. Under 14 CFR/CS 25.1309, the fail-safe design concept requires that no single failure results in a catastrophic failure condition.
A system design approach in which the system continues to perform its intended function without degradation after the occurrence of a failure. In a fail-operational system, redundancy and automatic reconfiguration allow the function to continue operating normally even when one element has failed. Fail-operational capability is typically required for flight-critical functions where any interruption would be unacceptable, such as autopilot systems during automatic landing (Cat III operations) or fly-by-wire flight control systems.
A system design approach in which the system, upon detecting a failure, transitions to a safe, neutral state that does not adversely affect the aircraft's flight path or controllability. In a fail-passive design, the system ceases to provide its function but does so in a way that does not produce a hazardous output. The crew is expected to take over the function manually. Fail-passive is commonly used for autopilot systems in Cat I and Cat II approach operations: upon failure, the autopilot disengages cleanly without introducing a transient upset.
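The difference between fail-passive and fail-operational behavior can be illustrated with two small voting functions. This is a conceptual sketch under simplifying assumptions: the function names, the threshold, and the sample commands are hypothetical, and real monitors add persistence filtering, channel health logic, and many other details.

```python
from statistics import median


def fail_passive_output(cmd_a, cmd_b, threshold):
    """Dual-channel comparison monitor. On disagreement the faulty channel
    cannot be identified, so the function returns None (disengage)."""
    if abs(cmd_a - cmd_b) > threshold:
        return None
    return (cmd_a + cmd_b) / 2.0


def fail_operational_output(cmd_a, cmd_b, cmd_c):
    """Triplex mid-value select. A single erroneous channel is outvoted
    and the function continues to provide an output."""
    return median([cmd_a, cmd_b, cmd_c])


# Channel B produces an erroneous command of 9.0 in both cases.
print(fail_passive_output(2.0, 9.0, threshold=1.0))
print(fail_operational_output(2.0, 9.0, 2.5))
```

With two channels the only safe response to a disagreement is to stop providing the function; with three, the healthy majority keeps it available.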
A defined boundary within a system architecture beyond which the effects of a fault cannot propagate. A fault containment region is designed so that any fault originating within the region is either contained within that region (preventing it from affecting other regions) or is detected before it can propagate. Fault containment regions are established through hardware isolation, software partitioning, interface monitoring, and architectural boundaries. The concept is particularly important in integrated modular avionics (IMA), where multiple functions of different DALs share computing resources.
The set of design features, architectural decisions, and implementation strategies that collectively provide the system's ability to meet safety objectives. A safety architecture encompasses redundancy schemes, independence provisions, fault detection and monitoring mechanisms, reconfiguration strategies, crew alerting, reversionary modes, and the overall allocation of safety requirements to hardware, software, and operational procedures. The safety architecture is defined during the system development process (per ARP4754B) and is evaluated through the safety assessment process (per ARP4761A).
Requirements that are generated through the safety assessment process (PSSA, SSA) rather than being directly traceable to a higher-level requirement or regulation. Derived safety requirements emerge from the architecture and implementation decisions made to achieve safety objectives. Examples include requirements for failure monitoring (to detect latent failures), requirements for dissimilarity between redundant channels, independence requirements for power supplies to redundant systems, exposure time limits for maintenance intervals, and requirements for crew annunciation of degraded states.
Failure effects that propagate from one system or function to other systems or functions through physical, electrical, logical, or functional interfaces. A cascading effect occurs when a failure in one system causes degradation or failure in another system that is not directly related, through shared resources (power, cooling, data buses), physical proximity, or functional dependencies. Cascading effects can amplify the severity of a failure condition beyond what would be expected from the initial failure alone.
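Propagation through shared resources can be explored with a simple traversal of a dependency map. The map below is entirely hypothetical, and the sketch pessimistically assumes that losing any one dependency causes the loss of the dependent element, ignoring redundancy and degraded modes.

```python
from collections import deque

# Hypothetical map: element -> resources it depends on.
DEPENDS_ON = {
    "flight_display": ["dc_bus_1", "avionics_cooling"],
    "avionics_cooling": ["ac_bus_1"],
    "dc_bus_1": ["ac_bus_1"],
    "cabin_lighting": ["ac_bus_2"],
}


def affected_by(initial_failure, depends_on):
    """Return every element that directly or indirectly depends
    on the element that failed first."""
    affected = set()
    queue = deque([initial_failure])
    while queue:
        failed = queue.popleft()
        for element, needs in depends_on.items():
            if failed in needs and element not in affected:
                affected.add(element)
                queue.append(element)
    return affected


print(sorted(affected_by("ac_bus_1", DEPENDS_ON)))
```

Here the display is reached through two separate paths (power and cooling), the kind of shared-resource dependency that a cascading effects review is meant to surface.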
A structured argument, supported by a body of evidence, that provides a compelling, comprehensible, and valid case that a system is acceptably safe for a given application in a given operating environment. The safety case integrates all safety-related evidence — including safety analyses (FHA, PSSA, SSA), design data, test results, process evidence (development assurance), and operational considerations — into a coherent narrative demonstrating that safety objectives are met. The safety case concept is used explicitly in some regulatory frameworks and implicitly in others where the certification evidence package serves the same function.
Related Topics
The Big Standards Map
The core standards that form the spine of aviation certification — ARP4754B, ARP4761A, DO-178C, DO-254, DO-160G, and their European equivalents.
Safety Assessment Process
The complete safety assessment process for aviation — from Functional Hazard Assessment through System Safety Assessment, using ARP4761A methods.
Software Certification (DO-178C)
Airborne software certification under DO-178C — planning, requirements, verification, structural coverage, and certification liaison.