White Paper

The Pitfalls of Alarm Design and Benchmark Analysis

Les Jensen

Abstract

The noble goal of reducing the overload of process control system alarms has led to many approaches, some more worthy of the task than others.  The pressure to do 'something' about alarms is becoming so great that the theme has reached the level of corporate edict.  Yet additional pressures of time, budget and resource constraints can lead to poor decisions with unexpected results.  The search for a quick, prepackaged solution can even lead to a dangerous false sense of security.  The dangers can be further masked by faulty analysis of the results.  The basis of the alarm management approach taken and the predefined limitations to the approach are critical in determining success.  The pitfalls and illusory benefits of some alarm management approaches are highlighted.  Suggestions are offered on how to formulate an approach to analysis and solution development that is not limited by static preconceptions and artificial restrictions.

Introduction

This paper approaches the process of alarm management from the unique perspective of a dynamic optimization problem.  The premise is that no single static alarm configuration is appropriate for all process operating states or state transitions.  Alarms should be configured to notify the operator of significant events.  At an intuitive level, an event notification should evoke a necessary and appropriate action from the operator.  At a deeper level, the relevance of an event notification depends on the current process operating state or state transition.  Therefore, every alarm is fundamentally dependent on the circumstances at the time.  Logically, this approach calls for both an alarm management process and a real-time configuration management capability.

Background

The topic of alarm management has, after many painful years of experience, finally become mainstream.  But in fact, alarms have been an increasing problem since the advent of digital distributed control systems (DCSs) in the 1970s.  The conversion from pneumatic and analog systems with limited panel alarms opened up theretofore unimaginable freedom to configure alarms.  The DCSs have since undergone an impressive evolution in the number and types of point alarms they can efficiently generate.

The integration of programmable logic controllers, safety instrumented systems, and packaged equipment controllers has been accompanied by an overwhelming increase in associated alarms.  To cite actual examples, an operator can rarely ascertain quickly the meaning, consequence, and required reaction for an alarm point described as “BCD TO INT CONVERSION” or “BYPASS VLV CLOSED TO DCS” with an alarm state description of “ON.”

The exploitation of alarm capability, while done with the best intentions, has produced the worst possible results.  The operator’s normal job description has, quite unfairly and without formal notice, been extended to the role of instrument system diagnostician.  This poor design and neglect have greatly increased the consternation of the operator and severely reduced the relevance of process alarms to the operator.  The consequence has been an increasing number of undesirable and potentially avoidable initiations or escalations of operating incidents.

Alarm Management Pitfalls & Solutions

Efforts to combat the cumulative effects of decades of neglect have greatly increased interest in the subject of alarm management.  The problems are large and complex, and the solutions are not simple.  The issues range from human capabilities to safety.  Intuitively, alarms are safety-related.  Yet the true role of the operator as an element in any safety system is often glossed over.  For most alarm management practitioners, strapped for time and money, the process is reduced to an effort to ‘clean up’ the alarms.  However, the underlying philosophy of the approach, the tools and experience brought to bear, and the capabilities of the applied solution can all hide pitfalls.

A few alarm management process pitfalls and solutions are discussed below.  Every one of the topics is much larger than the space available in this paper.  The topics are in fact integral parts of a larger, consistent approach to the process of alarm management.  Readers are encouraged to accept the information as only one input into their own thought processes and to apply it as appropriate to their situation.  However, they should not lose sight of the inherent relationships and interdependencies among the individual topics presented.

#1 – Using words without clear meaning

Every alarm management process, from beginning to end, is highly dependent on the ability of the proponents and beneficiaries to communicate.  Without definitions, communications will be hampered at best and ineffective at worst.  It is not suggested that the definitions to follow are the only ones that make sense.  Rather, they are offered so that the reader can better understand what the author is communicating.  They are an attempt to reduce complex terms to simpler, less ambiguous ones.  The terms may not at first seem to require definition, but experience has demonstrated that they absolutely do.

An increasing body of literature uses these terms with variable, intuitive meanings.  At least it appears that way, but there is no way to be sure.  That deficit will not be repeated here.  Some of the definitions will doubtless not match the reader’s initial intuitive definitions.  However, taken as a whole, they represent a well-seasoned set of definitions that are particularly useful for the stated purpose.  The definitions are not organized in any particular order.  Terms in italics are also defined.

  • Alarm - An annunciated process condition to which the operator must and can take corrective action in order to return the process to normal operations.
  • Status - An unannunciated process condition which provides the operator with information on the operational state of the process.
  • Normal - That which is both planned and expected.
  • Abnormal - That which is either unplanned or unexpected, i.e., not normal.
  • Alarm Design - The process of establishing alarm criteria that are dependent upon the possible process states and state transitions.
  • Optimal Configuration - One wherein the performance characteristics of the process controls, alarms, and messages meet the currently required process operation objectives without the potential for interference with, or distraction of, the operator.
  • Alarm Quality - A measure of the relevance, including the timeliness, of an alarm.
  • Impossible - That which would violate a law of nature.

The value of these definitions applied to the alarm management process is easily demonstrated by example.  Consistent with the above definitions, a planned process unit startup or shutdown that begins and proceeds to completion on schedule and according to the plan should not begin with, generate, or end with a single alarm.  This is neither an undesirable nor an inherently impossible objective.
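The definitions can be read as a decision rule.  The Python sketch below is one literal encoding of them, not a prescribed tool: the `Condition` fields, the `classify` function, and the tag names are hypothetical, and it assumes that an abnormal condition the operator cannot act on is presented as status rather than annunciated.  Under that reading, a startup that proceeds according to plan produces only status information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """A process condition at one moment (hypothetical model)."""
    tag: str
    planned: bool     # part of the current operating plan
    expected: bool    # what the process should be doing at this moment
    actionable: bool  # the operator can take corrective action


def is_normal(c: Condition) -> bool:
    # Normal: that which is both planned and expected.
    return c.planned and c.expected


def classify(c: Condition) -> str:
    """Return 'alarm' or 'status' following the definitions above."""
    if not is_normal(c) and c.actionable:
        # Abnormal, and the operator can act to restore normal operation.
        return "alarm"
    # Normal conditions are information only.  Assumption: abnormal conditions
    # the operator cannot act on are also presented as status, not annunciated.
    return "status"


# A startup that proceeds according to plan: every step is planned and expected.
startup = [
    Condition("FEED-PUMP.RUNNING", planned=True, expected=True, actionable=True),
    Condition("REACTOR.TEMP_RISING", planned=True, expected=True, actionable=True),
    Condition("FLARE.VALVE_OPEN", planned=True, expected=True, actionable=True),
]
assert [classify(c) for c in startup] == ["status", "status", "status"]
```

The same step becomes an alarm only when it stops matching the plan, for example when a temperature that should be rising is not and the operator can intervene.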

#2 – Using a suboptimal ‘static analysis’ strategy

One of the most fundamental pitfalls, along with its solution, has already been revealed in the Introduction.  Alarms cannot be effectively managed from a static perspective.  Further, dynamic management cannot be effectively added after a static management process without seriously compromising any optimization goal.

There are innumerable thought experiments that one can conduct to demonstrate the validity of this conjecture.  If the operator pushes a ‘stop’ button on a piece of equipment, is it appropriate for him to receive a feedback ‘stop’ alarm?  Or should he get an alarm if it doesn’t stop?  Should he get an alarm if the equipment stops unexpectedly?  Should he get an alarm if the equipment starts unexpectedly?  Should any ‘stop’ activate other potentially relevant alarms and inactivate other meaningless alarms?  When the above definitions are applied, the answers are immediately obvious. 
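Answering the stop-button questions with the definitions amounts to comparing the commanded state with the actual state.  The sketch below assumes a hypothetical feedback model with two flags, `reached_command` (the equipment reached the commanded state after the last command) and `timed_out` (the allowed transition time has expired); neither name comes from any particular DCS.

```python
from enum import Enum
from typing import Optional


class Run(Enum):
    STOPPED = "stopped"
    RUNNING = "running"


def equipment_alarm(commanded: Run, actual: Run,
                    reached_command: bool, timed_out: bool) -> Optional[str]:
    """Return an alarm text, or None when the equipment is doing what was commanded.

    reached_command: the equipment reached the commanded state after the last command.
    timed_out: the allowed transition time since the last command has expired.
    """
    if actual == commanded:
        # Planned and expected: a 'stop' after pressing 'stop' is not an alarm.
        return None
    if not reached_command:
        if not timed_out:
            return None  # still in transition; not yet abnormal
        return "FAILED TO STOP" if commanded is Run.STOPPED else "FAILED TO START"
    # It reached the commanded state and then left it without a new command.
    return "UNEXPECTED START" if actual is Run.RUNNING else "UNEXPECTED STOP"
```

The same state change can also serve as the trigger for activating alarms that become relevant, and inactivating those that become meaningless, once the equipment stops.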

The ideal alarm management solution must extend itself beyond the normally static configuration capabilities of the DCS.  Pressure to compromise the ideal because of apparent limitations of the DCS is to be resisted.  The results of giving in will at best degrade the quality of the alarms and at worst mask valuable information.  Objections may be raised that it’s impossible to achieve the ideal.  It is not inherently impossible.  Limitations of the DCS with regard to any dynamic objectives must be regarded as challenges to be overcome rather than inevitable and insurmountable restrictions.  To the extent that they can be overcome on a generic and efficient basis, the alarm management process can capture the full optimization benefits.
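One way to move beyond a single static configuration is to hold alarm limits per operating state outside the DCS and apply the set that matches the current state.  The Python sketch below uses hypothetical tags, states, and limits (`P101.DISCH_PRESS` in barg, `T7.LEVEL` in percent); it illustrates the idea rather than any vendor's dynamic alarm management product.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Limit:
    value: float
    high: bool  # True: alarm above value; False: alarm below value


# Hypothetical state-dependent configuration: operating state -> tag -> limit,
# where None means the alarm is not relevant (inactive) in that state.
CONFIG: Dict[str, Dict[str, Optional[Limit]]] = {
    "RUNNING": {"P101.DISCH_PRESS": Limit(3.0, high=False),
                "T7.LEVEL": Limit(85.0, high=True)},
    # With the pump stopped, low discharge pressure is expected, not abnormal.
    "STOPPED": {"P101.DISCH_PRESS": None,
                "T7.LEVEL": Limit(85.0, high=True)},
}


def active_alarms(state: str, readings: Dict[str, float]) -> List[str]:
    """Tags in alarm under the configuration for the current operating state."""
    alarms = []
    for tag, limit in CONFIG[state].items():
        if limit is None or tag not in readings:
            continue
        value = readings[tag]
        if (limit.high and value > limit.value) or (not limit.high and value < limit.value):
            alarms.append(tag)
    return alarms
```

With readings of 0.5 barg discharge pressure and 50% level, `active_alarms("RUNNING", ...)` returns `["P101.DISCH_PRESS"]`, while `active_alarms("STOPPED", ...)` returns an empty list: the same reading is an alarm in one state and irrelevant in the other.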

#3 – Using a ‘minimal alarm’ strategy

There is an increasingly popular notion that the most practical and acceptable solution to the problem of alarms is to minimize their number.  This approach most likely has its genesis in the recognition of undesirable alarm floods during process upsets.  The full implementation of this solution is indeed simple – zero alarms.  This may just be a poor choice of words, since no one could believe that such a solution is valid.  However, the mindset is potentially damaging.

Realistically, the optimization of alarms on most existing DCSs will indeed result in fewer configured alarms.  But there is a significant difference between a result and an objective.  Alarm management is not a question of how many alarms are configured.  It is a question of the quality of the active alarms.  The quality of an alarm is a measure of its relevance.  If an alarm is not relevant, its quality is negative.  It may be possible to make the alarm more relevant through dynamic management techniques.  Otherwise, the alarm could be a candidate for elimination, at least for the process state or state transition being analyzed.  But a relevant alarm must never be eliminated merely to reduce the number of configured alarms.
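The elimination rule can be stated per state: an alarm stays active wherever it is relevant, and it becomes a candidate for elimination only when it is irrelevant in every analyzed state or state transition.  In this minimal sketch the relevance judgments are assumed inputs from the rationalization team, and the function name `rationalize` is hypothetical.

```python
from typing import Dict, List, Union


def rationalize(relevance_by_state: Dict[str, bool]) -> Dict[str, Union[List[str], bool]]:
    """Per-state enablement for one alarm from per-state relevance judgments."""
    if not relevance_by_state:
        raise ValueError("analyze at least one process state or state transition")
    enabled = sorted(state for state, relevant in relevance_by_state.items() if relevant)
    return {
        # Keep the alarm active wherever it is relevant.
        "enabled_in": enabled,
        # Irrelevant everywhere: only then a candidate for elimination.
        # A count-reduction target is never a reason to drop a relevant alarm.
        "elimination_candidate": not enabled,
    }
```

Note that the inputs contain no target count: the number of configured alarms falls out as a result, not an objective.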

An optimal alarm management solution requires an acknowledgement of the dynamic nature of the problem and an understanding of what an alarm should be.  A proper alarm management process must lead to alarms that are relevant to the current situation.

#4 – Using a ‘rigid rules’ strategy

An alarm management process built on inflexible predefined rules betrays an attempt to simplify and codify an inherently complex and necessarily painstaking process, and the result is consequently defective.  Popular practices include broad general rules for configuring alarms on characteristic types of instrumentation.  These rules may have some flexibility for exceptional circumstances.  However, the preconceived exceptions usually fall far short of the actual needs of a particular process.  This is most evident when the needs of a minimally instrumented vintage process are compared with those of a modern process.  Most damagingly, the rules seldom recognize dynamic process states and state transitions.

Alarm management is not a question of what types of alarms are configured, or what their priorities are.  Again, it is ultimately the quality of the active alarms that should be at issue.  This cannot be easily codified.  Older process units may have few, if any, advanced safety systems, while newer units will be fully outfitted.  In the former case, alarms may have to perform multiple duties and may give the operator only a clue to the source of a problem.  In the latter case, the alarms may be very specific to the underlying problem.

That which is optimal for any type of instrument service in one case may not be so in another.  The analysis must take into account the number and interrelationships of control and electronic safety systems.

#5 – Using a ‘rigid metrics’ strategy

An alarm management process with predefined metric limits betrays an attempt to restrict an inherently complex process, and the result is consequently defective.  Popular metrics include the number of alarms of particular priorities and maximum alarm rates.  Even when offered as ‘guidelines,’ such generalizations can be recklessly applied.  Because multiple factors can influence a dynamically optimal solution, it is highly doubtful that the results of any supporting study can be reliably extended outside their specific study domain.

However, the most damaging practice is the misapplication of metrics.  By way of thought experiment, how should a limit on the maximum number of emergency priority alarms be treated?  Should the restriction recognize the number of such alarms that are likely to appear simultaneously?  Should the restriction apply as an absolute configured limit, even though only one is likely to appear at any time?  How should any violation of these metrics be resolved?  The greatest potential ‘metric’ pitfall lies in the answers to these questions.

Metrics must never be used as a basis to compromise an alarm management process.  If, at the end of a well-designed, rational process, the results violate some pre-established metric limits, then it is more appropriate to consider reallocating operator responsibilities.  In the above thought experiment, the erroneous reaction would be to sweep the violations under the rug by adjusting some alarm priorities.  There is an important distinction between a standard of measure used to assess the final results of a process and a justification to compromise that process.
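The thought experiment above turns on the difference between how many emergency alarms are configured and how many are likely to be active at once.  A minimal sketch of the second measure, assuming activation and clear times are available from an event journal or a simulation run (the data source and the function name are hypothetical), is a sweep over the sorted activation and clear events:

```python
from typing import Iterable, List, Tuple


def peak_simultaneous(intervals: Iterable[Tuple[float, float]]) -> int:
    """Maximum number of alarms active at the same time.

    intervals: (activated_at, cleared_at) pairs for alarms of one priority.
    An alarm that clears at the same instant another activates is not
    counted as overlapping it.
    """
    events: List[Tuple[float, int]] = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # Sort by time; at equal times, clears (-1) sort before activations (+1).
    events.sort()
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak
```

If the configured emergency alarms never overlap in any analyzed scenario, the peak is 1 however many are configured; the figure informs the assessment and is not a reason to change priorities.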

#6 – Using a ‘status alarm’ strategy

One of the greatest errors of point-based alarm management is failing to recognize the distinction between alarms and status information.  An alarm, by definition, requires a possible reaction.  Status information is the antithesis of an alarm.  This is true whether the information is analog or digital in nature.  Issuing status information as alarms clutters the operator’s alarm interface, distracts his attention, and predisposes him to ignore alarms.  This criticism is also valid for many messages.

The alarm interface should be restricted to alarms.  The objective is to use the alarm interface as an action item list.  An alternative mechanism must be found to present status information.  Since status information is characteristically something that is viewed on a demand basis, a schematic display is probably the most appropriate mechanism.

#7 – Using an ‘alarm configuration enforcement’ strategy

Whether under the guise of regulatory compliance or any other excuse, one of the greatest errors that can be made is the blind enforcement of any alarm configuration.  If the operator feels compelled to change the alarm configuration, it should not be immediately viewed as a violation of safety or edict.  Rather, the reason behind his action should be investigated.  The reason may indicate that the instrumentation requires maintenance, that the operator requires training on the relevance of the alarm and the appropriate response, or, lastly, that the alarm management process has either generated an irrelevant alarm or failed to generate a relevant one.  The notion that the originally rationalized configuration should be enforced and that the operator must adapt to that enforcement is the best indication that the alarm management process has failed.

The implemented dynamic alarm management system should allow the operator some degree of control over alarms so he can eliminate the failed instrument and nuisance alarms.  However, the design of the system should not allow those alarms to be forgotten.  They must either be returned to service when appropriate or subjected to maintenance review.  Permanently dysfunctional alarm points need to be recycled through the alarm management process for elimination or possible substitution. 
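A shelving register is one way to let the operator remove failed-instrument and nuisance alarms without allowing them to be forgotten.  In this hypothetical sketch each shelved alarm carries a maximum duration after which it is listed for return to service or maintenance review, and alarms shelved repeatedly are listed for recycling through the alarm management process; the class names, the duration policy, and the repeat threshold are all assumptions.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class ShelvedAlarm:
    tag: str
    reason: str
    shelved_at: datetime
    max_duration: timedelta


@dataclass
class ShelfRegister:
    """Alarms removed from service by the operator, tracked so none are forgotten."""
    entries: Dict[str, ShelvedAlarm] = field(default_factory=dict)
    times_shelved: Counter = field(default_factory=Counter)

    def shelve(self, tag: str, reason: str, now: datetime, max_duration: timedelta) -> None:
        self.entries[tag] = ShelvedAlarm(tag, reason, now, max_duration)
        self.times_shelved[tag] += 1

    def unshelve(self, tag: str) -> None:
        # Returned to service; the shelving history is kept.
        self.entries.pop(tag, None)

    def due_for_review(self, now: datetime) -> List[str]:
        """Shelved alarms past their allowed duration: return to service or review."""
        return sorted(tag for tag, e in self.entries.items()
                      if now - e.shelved_at >= e.max_duration)

    def recycle_candidates(self, threshold: int = 3) -> List[str]:
        """Alarms shelved repeatedly: send back through the alarm management process."""
        return sorted(tag for tag, n in self.times_shelved.items() if n >= threshold)
```

Keeping the history after an alarm is unshelved is the design choice that lets a permanently dysfunctional point surface instead of quietly cycling on and off the shelf.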

In short, alarm management must itself be a living, dynamic process.  Any comparison of the expected state of the alarm configuration to the actual must respect day-to-day dynamic contingencies.

#8 – Using an ‘illusory assessment’ strategy

All too often the results of the alarm management process are reported and evaluated in terms of valueless statistics.  There are innumerable examples similar to “50% of the alarms were eliminated through the alarm management design process.”  But what is one to infer from such a statistic?  Is it appropriate to conclude that the next process incident, whatever it may be, will result in an alarm flood only half as large as would otherwise be expected?  Even if true, would it really be a significant improvement for an incident-related alarm flood to be reduced from 500 to 250 alarms?

Other statistics summarize the number of alarms of this or that type or priority, and what percentage of the configured alarms they represent.  Does the distribution represent the statistical distribution that the operator can expect to see during every future incident?  Do these statistics reveal the one alarm that was eliminated that shouldn’t have been?  The one that will result in compounded losses during the next related incident?

An effective and meaningful assessment technique is to measure the impact of the results on ‘real-world’ performance.  This is best accomplished by comparing the alarms actually experienced during historical events under the pre-management configuration against a simulation of the post-management configuration for the same events, or vice versa.  Where there is no historical experience, dual simulations with both the pre- and post-management configurations can be used.  However, it is generally impractical to compare apparently similar historical events from the pre- and post-management periods, because the differences between them can be deceptively significant and render the comparison valueless.
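The replay comparison can be sketched as follows, assuming a recorded event is reduced to the time, the operating state, and the tag that entered its abnormal condition, and a configuration is reduced to the set of tags annunciated in each state (all names here are hypothetical).  Besides the two counts, it lists the alarms the new configuration no longer annunciates for that event, which is where an eliminated but relevant alarm would surface.

```python
from typing import Dict, List, Set, Tuple, Union

# (time in seconds, operating state, tag that entered its abnormal condition)
Event = Tuple[float, str, str]
# operating state -> tags annunciated in that state
Config = Dict[str, Set[str]]


def replay(events: List[Event], config: Config) -> List[str]:
    """Tags the configuration would annunciate for the recorded events, in time order."""
    return [tag for _, state, tag in sorted(events) if tag in config.get(state, set())]


def compare(events: List[Event], pre: Config, post: Config) -> Dict[str, Union[int, List[str]]]:
    pre_alarms = replay(events, pre)
    post_alarms = replay(events, post)
    return {
        "pre_count": len(pre_alarms),
        "post_count": len(post_alarms),
        # Raised before but not after: review each one for relevance to this event.
        "no_longer_annunciated": sorted(set(pre_alarms) - set(post_alarms)),
    }
```

Replaying recorded conditions assumes they would have occurred regardless of the configuration.  Because operator responses would differ, this is only an approximation, which is one reason a fully capable process simulator is the better technique.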

The best performance assessment technique would be to use a fully capable process simulator to conduct controlled tests with both the pre- and post-management configurations, using multiple operators under multiple scenarios.  However, the greatest value of such an exercise would come from recycling the results through the alarm management process to look for additional optimization opportunities.  The ultimate objective of the entire process is to improve the relevance, speed, and accuracy of the operator’s response.  The performance statistics that result from the proposed methods would be very valuable in this regard.

Summary

The common pitfalls to avoid, and the most valuable solutions to apply, in an alarm management process include every one of the examples above, as well as others.  All of them are resolved through integrated concepts and practices that will benefit any effort to improve the quality of process alarms for the operators.

The best solution to avoid pitfalls is to develop an alarm management process that allows the optimization of the alarms uniquely for each process operating state or state transition.  When coupled with the ability to achieve dynamic management of the alarm configuration, the optimum solution can become a reality.


Acknowledgements

The author thanks Dr. S. J. Kassarjian, P.E., and D. S. Beebe, P.E., for their contributions and thought-provoking questions.
