Image (Marc Frank)
Flying is safer than ever but ‘one-off’ failures of complex systems still happen. Can resilience engineering help aircrews (and others) prepare for the unexpected? Captain DAVID MORIARTY (Chair of Human Factors (Ops) Group and author of Practical Human Factors for Pilots), GUNNAR STEINHARDT (Aviation psychologist), Captain MARC FRANK (CRM instructor at Luxair) and Captain ARTHUR DIJKSTRA (Consultant ADMC and Safety Investigator at KLM) provide an overview.
Take a moment and consider the number of individuals, machines, rules and lines of software code that allow your organisation to function. There may be thousands, tens of thousands or maybe millions of these individual components linked together in a startlingly complex system. Now ask, where does your organisational success come from and where might failure arise? Is it a single component? Is it many? Can a failure in one part of your system be contained successfully or could a disturbance cascade and cause widespread failure? The founder of Chaos Theory observed that a tornado in Texas can be triggered by the beat of a butterfly’s wing in Brazil; a catastrophic event emerges from the unforeseeable consequences of a minor one.
The title of this article is coincidentally similar to the excellent piece written by Captain Richard de Crespigny in AEROSPACE in June 2015. A single engine component on his aircraft was manufactured with a wall that was too thin by a fraction of a millimetre, less than the thickness of a butterfly wing. The uncontained engine failure that followed was handled successfully as a result of the years of training and breadth of knowledge of the crew on board. It was also a successful outcome because of support from outside the aircraft and from the integrity of the aircraft itself. Who knew that that part would fail and who knew that that support structure, that knowledge, that training, would one day contribute to the successful outcome of this event? We need some way to manage this kind of unpredictability.
Resilience in theory
Every once in a while, we take a step forward in our understanding of safety in complex systems. In the 1930s, accidents were described using the metaphor of a line of dominoes; one negative event causes another, and then another until the accident occurs (Figure 1). In the 1990s, James Reason moved beyond this active description to a more passive model, one that describes the evolution of failure in a system as the unanticipated alignment of weaknesses across the organisation (Figure 2). While this was a major advance, the reasoning behind it is still linear; the dominoes fall, the holes line up and the path is open to failure.
The drawback of both of these models is that they consider a failure event and then try to describe its origins and its evolution. This snapshot of an organisation is like peeking through the keyhole of a ship’s engine room. By shifting our gaze, we may be able to trace the fuel lines from the tanks to the motor and the drive shaft from the motor to the propeller but we can’t see all the associated machinery, engineering support, training and procedures that are needed to allow the ship to move forward. To manage safety better, we need to start with a broader view of how things really work day-to-day. Only then can we begin to engineer our systems to become more resilient.
Resilience engineering (RE) emerged as a scientific discipline about ten years ago when researchers realised that many accidents couldn’t be explained by single-point failures. They were actually system failures and to understand them, we needed to better understand the systems themselves. Some of the key RE principles are:
- Complex systems are made up of many linked components (Figure 3).
- Complex systems behave in unpredictable ways.
- How the managers of the system expect things to be done (‘work-as-imagined’) is often very different from how things are actually done (‘work-as-done’).
- You cannot make a system safer by trying to come up with rules for every eventuality.
- The resilience of your system is most likely to come from those components that have the greatest capacity to adapt to changing conditions i.e. the people.
- Rather than focusing on when things go wrong, we should be looking at when things go right i.e. when components within the system successfully manage disturbances to come out with a successful result.
Simply put, resilience is the ability of a system to continue to function by managing disturbances. RE principles are not limited to individuals but can be adopted across the organisation. As with so much in the world of aeronautical safety, the process starts in the classroom. In October, the rules governing how airlines teach Crew Resource Management (CRM) are changing and RE now forms part of the new syllabus. This article gives two examples of how airlines have implemented these principles in training and in practice.
Resilience in training
Luxair decided to implement RE into its CRM training in 2013. (Marc Frank)
In 2013, the Luxair Human Factors Training Team decided to implement RE into its CRM training programme. Because of the importance of the topic, the team decided to introduce and develop the training over several years. It had to start with theory. The book Resilience Engineering in Practice is a seminal work in the field of RE. Implementation of new theory into practice can be a challenge and so the first step was to derive some easy-to-use, intuitive principles. The team designed a CRM course that explained RE with special focus on ‘noticing change in risk profile’ and ‘mitigating threats’. The instructors felt that it was important to link resilience to the notion of change, such as changes in the environment or changes in the required plan of action due to expected or unexpected events. To achieve this, the team developed a definition of resilience that would be the starting point to explain it as a principle: the ability of a system to adjust its functioning prior to, during, or following changes and disturbances, so that it can sustain required operations under both expected and unexpected conditions.
A useful training aid was a video demonstrating the flexibility of memory metal eyeglass frames: twist and bend the frames and they will bounce back and still hold the lenses in place. This was related to the role of air crew. Crew are trained to accommodate changes and absorb disturbances without overload or catastrophic failure. This idea of a system being prepared for disturbance, of looking for changes, of sustaining operations as long as the event persists and then of restructuring based on the experience embodies the four cornerstones of resilience: the ability to anticipate, to monitor, to respond and to learn.
To make these steps more tangible the Luxair CRM team created a tool, specifically an acronym, ‘CABL3’:
A – Trigger for the tool – Notice risk profile change!
C – Communicate – Is your team aware of the change?
A – Anticipate – How could this change influence the task?
B – Create buffers – Can you consider any alternative plan of action?
L3 – Look for indicators which could make your system fail (from Resilience Engineering in Practice)
2. Working at cross purposes?
3. Getting stuck in outdated behaviours?
The training was part of a full-day CRM course. As well as introducing the theory, the team used the Air India Express Flight 812 accident at Mangalore Airport in 2010 to show how application of RE principles could have changed the outcome of the situation. Initially, pilots and cabin crew were very enthusiastic about the concept of RE but, as time passed, motivation slowly diminished. To mitigate this effect, the team reinforced the idea of resilience during subsequent annual recurrent courses by linking it to other CRM topics, e.g. situation awareness, leadership and decision making. Luxair also integrated resilience into its pilot competencies framework by highlighting the behavioural markers of resilience.
As a second step the Luxair team decided in this year’s course to look at the case of an unstabilised approach from an unorthodox perspective. Rather than looking at the outcome and working backwards, the instructors aim to get course participants to use the RE principles that they have learned during their previous courses to identify the subtle precursors that contributed to the negative outcome.
The third step of the training plan is the integration of resilience training into line-oriented flight training sessions in the simulator. The aim is to generate appropriate situations which will sensitise pilots to the idea that a subtle change should prompt them to look for and mitigate new threats. The full training material is available on the HFG (Ops) and REA websites (see Resources).
Resilience in practice
FlightStory acts as a social media version of traditional ‘I Learnt About Flying From That’ style columns to share experiences, spot patterns and build up data.
Another airline has started the process of adopting RE principles across its organisation by introducing a new tool into its safety management system with the aim of developing resilience. The starting point is the fact that an effective Safety Management System requires high-quality feedback from flight operations. If RE tells us that we should be interested in what goes right as well as what goes wrong, traditional data gathering channels used by an airline SMS will miss out on a wealth of information. Information about how crews successfully mitigated operational risks can provide greater insights than information about where these strategies weren’t successful. FlightStory is a computer-based system that allows an organisation to learn how actual day-to-day crew performance helps deliver safe outcomes even when crew are faced with complex, dynamic situations.
The pilots have access to a FlightStory app on their iPads to submit their stories. Behavioural markers associated with resilience are used to find patterns in effective handling of all types of events, not only safety incidents. The stories show how uncertainty, ambiguity and complexity can play a role in normal pilot event handling and how resilience can be developed. FlightStories can be shared between pilots and sent to line and safety management, taking into account confidentiality and privacy. This supports learning across different organisational layers.
The FlightStory form consists of three parts. The first part starts by asking the pilot an open question such as ‘please describe your experience in a way other pilots can learn from your event’. Here the pilot provides their narrative of the event to which they can add keywords, a title or a summary of what lesson was learned. Titles have included ‘50 shades of grey’, ‘Insufficient wingtip clearance during taxi-out?’, ‘Always Be Prepared’, ‘Acceptability vs accountability’. The first part also asks about the emotional impact of the event on the pilot, this being an important indicator for the potential utility of the event as a learning tool. The pilot is also asked to assign a personal judgement to the risk level of this event. This allows comparison of SMS risk assessment done by the safety office and that of the pilots.
The form has ten scales covering a mix of relevant resilience and CRM concepts, which give the pilot a way to express their view on the event. Now it is not a safety officer but the pilot who rates and classifies the event. As an example, there is a question about the operational support used to handle the situation. It can be answered with any combination of: Standard Operating Procedures, advice from others such as flight dispatch, maintenance support or improvisation. Most of the stories received showed that a considerable amount of improvisation was used to deal with the situation. The resulting discussion with management can reveal a gap between work-as-done in actual flight operations and work-as-imagined by managers. Finally, there is a scale for assessing the support given by Operational Performance Conditions, factors that are managed by the airline organisation through the SMS and that shape the performance of flight operations.
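For readers who think in data structures, the form described above might be sketched roughly as follows. This is only an illustration and not the actual FlightStory software: the class and field names, the numeric rating scales and the grouping of the support options into three categories are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterable, List, Set


class SupportSource(Enum):
    """Operational support options (this grouping is an assumption)."""
    SOP = "Standard Operating Procedures"
    ADVICE = "Advice from others"
    IMPROVISATION = "Improvisation"


@dataclass
class FlightStory:
    """Hypothetical record for one submitted story."""
    title: str
    narrative: str
    keywords: List[str] = field(default_factory=list)
    lesson_learned: str = ""
    emotional_impact: int = 0  # assumed scale, e.g. 0 (none) to 5 (strong)
    pilot_risk: int = 0        # assumed scale, e.g. 1 (low) to 5 (high)
    # The pilot may tick any combination of support sources.
    support_used: Set[SupportSource] = field(default_factory=set)


def support_counts(stories: Iterable[FlightStory]) -> Counter:
    """Count how often each support source was ticked across stories."""
    counts: Counter = Counter()
    for story in stories:
        counts.update(story.support_used)
    return counts


def risk_gap(story: FlightStory, safety_office_risk: int) -> int:
    """Pilot's risk rating minus the safety office's rating for the event."""
    return story.pilot_risk - safety_office_risk
```

With records like these, a safety office could see at a glance how often improvisation is reported compared with Standard Operating Procedures, and where pilots rate an event as riskier (a positive gap) or less risky (a negative gap) than the SMS assessment.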
The experience of this airline shows that FlightStory delivers richer data than standard reporting forms and thereby generates strategies for creating resilience in practice.
Resilience gives pilots the confidence to tackle the unexpected in highly automated, ultra-reliable aircraft. (Airbus)
We are the first generation of crew to be able to integrate RE principles into our operational behaviour. However, the concepts can be difficult to grasp at first and adopting them requires commitment on the part of operators, dedication on the part of instructors and open-mindedness on the part of crew. Such significant changes in how we view safety don’t come along very often and, as it was during the advent of human factors, aviation should be in the vanguard of putting resilience into practice to make our industry safer and more efficient.
- RAeS Human Factors in Flight Operations and Training (HFG[Ops]) website: http://www.raes-hfg.com/hfg-ops-membership/