Reliability Engineers


Reliability engineers, equipment engineers, area engineers, process engineers, and others with plant engineering titles are responsible for solving complex problems that are beyond the scope of plant operators, mechanics, or electricians. Tango™ provides equipment lifecycle and root cause failure analysis tools to assist these engineers in identifying, troubleshooting, and solving expensive and complex reliability problems.



The Reliability Engineer

The reliability engineer is the single point of authority and responsibility for assuring the reliability and maintenance action for assigned components or areas.

The reliability engineer must be one of the most knowledgeable people in the plant for his assigned responsibility, usually with many years of experience in the installation, repair, and operation of the assigned component. Because the reliability engineer’s position requires a strong ability to coordinate action within the plant and with the repair facilities, it is critical that the reliability engineer have strong "people skills" as well as the technical skills.

From the viewpoint of an outside observer watching the reliability engineer in action, the amount of authority and responsibility given to this position is very striking. The reliability engineer must act as reliability coordinator, purchasing agent, repair coordinator, and liaison between maintenance, production, PdM, and engineering. Typical reliability engineering tasks include:

  • Repair and spares coordination by working with repair facilities. Assure that the plant has the correct component replacements such that any impact of a component replacement causes minimum production loss.
  • Reliability improvement by evaluating root cause issues and improving countermeasures found to be inadequate. Maintenance protocols are determined by lowest life cycle cost.
  • Coordination of predictive maintenance action by reviewing PdM data and assuring that countermeasure actions are appropriate and on track. Make sure condition information is available to the entire plant and action is underway such that the impact on production is minimized.
  • Inspect and review preventative and scheduled maintenance work. Inspect critical components during the operation.
The responsibilities of the reliability engineer include:
  • Primary contact and relationship with the repair facilities.
  • Specification of new or replacement components.
  • Managing the component spares inventory.
  • Supervising the routine maintenance and testing programs.
  • Reviewing all failures, seeking the root cause of each, and specifying repairs, modifications, or replacements that eliminate or minimize that root cause.
  • Specifying and purchasing components (possibly working very closely with purchasing) with the goal of lowest life cycle cost.
  • Maintaining a close working relationship with PdM staff and techs.
  • Visiting repair facilities regularly and knowing their capabilities and latest repair techniques.
  • Visiting manufacturers regularly, knowing their capabilities and practices, and giving feedback on suspect practices and procedures.
Using all available reliability and asset information, the reliability engineer must balance the equipment out for repair, the spares inventory, and all new equipment purchases to assure minimum impact on plant production, enhanced reliability, and the lowest life cycle cost. To accomplish this, the reliability engineer must know what component models and what spare parts will be required by the plant over the next few months; he then must make sure the appropriate units are on hand with the proper tools, parts, and procedures for a routinely managed replacement. The tools used by the reliability engineer to accomplish this Herculean effort are predictive condition, historical location failure patterns, and an optimized spares inventory. A good working relationship with component manufacturers and repair shops is required to ensure the highest quality of reworked and new components are available and delivered on time.

When a component begins showing signs of condition degradation, the reliability engineer must go to work well ahead of the failure. He must explore the following questions:

1. Does a pattern of failure in this component or location indicate an underlying root cause of failure?
If the answer to this question is "yes", the reliability engineer must seek a new solution to replace the component with a new or overhauled unit that is modified to defeat the suspected root cause.

2. Should overhaul or replacement be selected for this component?
How many times has the unit been overhauled? What have previous life cycles been like? Is the amortized cost of a new component, with its longer life and higher purchase cost, better than that of an overhaul, with its lower cost but shorter life? These decisions must be made on a case-by-case basis using hard historical evidence: new and overhaul costs, new and overhaul life cycles, and component history.

3. Does a spare exist on-site?
If the answer is yes, the spare must be checked to assure it is ready for service. If the answer is no, then the spares inventory of other company facilities, used component vendors, and new component vendors must be searched to find a suitable replacement. The reliability engineer must assure that an optimal inventory of critical spares is available, for example by confirming that spares of one general specification (such as 1800 RPM, 200 HP, 440 V, 445T frame) will meet the needs of the plant.

4. Are there any installation or maintenance issues to address at replacement to make the component easier to install/remove or maintain over its life?
Items such as access, hostile conditions, and poor design may make the installation, removal, or maintenance task very difficult. In these situations, a review of ways to change or minimize the situation should be conducted.
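
The overhaul-versus-replacement comparison in question 2 can be sketched as a simple cost-per-year calculation. This is a minimal illustration, not a complete life cycle cost model: the `Option` class, the prices, and the expected lives are all hypothetical, and the straight-line amortization ignores downtime cost, energy efficiency, and the time value of money.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """One way to return a component to service (all figures hypothetical)."""
    name: str
    cost: float                 # purchase or overhaul price
    expected_life_years: float  # taken from the component's own history

    def cost_per_year(self) -> float:
        # Straight-line amortization: price spread evenly over expected life.
        return self.cost / self.expected_life_years


def cheaper_option(a: Option, b: Option) -> Option:
    """Return the option with the lower amortized cost per year of service."""
    return min(a, b, key=lambda o: o.cost_per_year())


# Hypothetical figures for one motor.
new_unit = Option("new", cost=12000.0, expected_life_years=8.0)
overhaul = Option("overhaul", cost=5000.0, expected_life_years=2.5)

best = cheaper_option(new_unit, overhaul)
print(best.name, best.cost_per_year())
```

With these assumed numbers the new unit costs less per year of service even though its purchase price is higher, which is why the expected lives should come from recorded history rather than estimates.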



Identifying Root Cause of Failure from Historical Analysis of Machine Failures

There is extensive information existing in many plants that, if analyzed for repetitive patterns, would yield significant insight into machines or applications with a chronic history of poor reliability.

Tracking failure data for a component in the plant along with the manufacturer and overhaul shop, the overhaul details and the location where the component was operating will yield a wealth of reliability information when sorted and analyzed. The reliability of components can be studied by manufacturers to determine which brand component is most reliable by size or application. It can be determined which rebuild shop should do which rebuilds. One can also find out where redesign of the component or application is necessary to improve the reliability. Sometimes replacement is better than repair and vice versa.

Machine and location history must be maintained such that the history can be searched and sorted to reveal basic failure patterns such as:

1. Failures by location
Often a plant has multiple identical equipment trains. It is important to understand if there is a common primary failure mode across these locations and seek to eliminate the root cause of failure if the life cycle of a component is less than acceptable.

2. Failures under warranty
Often plants do not process equipment warranty claims because they do not have readily available information on purchase and installation dates.

3. Failures with less than one year of service
There exists a significant number of unreliable machines or locations within the plant that must be identified and aggressively analyzed for root cause of failure.



4. Failures by manufacturer
Some manufacturers build a better machine for an application than others. An analysis of the installed machines by manufacturer can focus your replacement towards higher reliability manufacturers.

5. In service life
Sometimes a specific asset has a reliability problem which needs to be identified and eliminated. Analysis of the in-service life by asset ID can identify these weak components. Another value of in-service life analysis is to examine the life available from each overhaul cycle turn. It is often said that a motor should not be overhauled more than four times. By viewing the in-service life of the motor for each overhaul, a decision on additional overhaul or scrapping the motor can be made.
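
As a rough sketch of how such a history might be searched and sorted, the example below groups a handful of failure records by location, manufacturer, and asset ID, and flags failures with less than one year of service. The `FailureRecord` fields, the asset IDs, the manufacturer names, and the dates are all hypothetical; a real plant history would carry more detail, such as the overhaul shop and overhaul count.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class FailureRecord:
    """One failure event (hypothetical fields)."""
    asset_id: str
    location: str
    manufacturer: str
    installed: date
    failed: date

    @property
    def service_days(self) -> int:
        # In-service life for this installation, in days.
        return (self.failed - self.installed).days


# Hypothetical sample history.
history = [
    FailureRecord("M-101", "Pump 3A", "Acme", date(2020, 1, 10), date(2020, 9, 1)),
    FailureRecord("M-102", "Pump 3A", "Acme", date(2020, 9, 5), date(2021, 3, 20)),
    FailureRecord("M-103", "Fan 7", "Borealis", date(2018, 5, 1), date(2022, 6, 15)),
    FailureRecord("M-101", "Pump 3B", "Acme", date(2021, 1, 15), date(2023, 2, 1)),
]

# Patterns 1 and 4: failure counts by location and by manufacturer.
failures_by_location = Counter(r.location for r in history)
failures_by_manufacturer = Counter(r.manufacturer for r in history)

# Pattern 3: failures with less than one year of service.
short_lived = [r for r in history if r.service_days < 365]

# Pattern 5: in-service life per asset ID, one entry per life cycle.
life_by_asset = defaultdict(list)
for r in history:
    life_by_asset[r.asset_id].append(r.service_days)
mean_life_by_asset = {asset: mean(days) for asset, days in life_by_asset.items()}

print(failures_by_location.most_common(1))
print([r.asset_id for r in short_lived])
```

In this made-up data both short-lived failures occurred at the same location on two different assets, which points toward the location rather than the individual motor. Warranty failures could be found the same way if the warranty period were recorded with each purchase.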







It is possible, in many cases, to go back into paper records (purchase orders, machine logs, maintenance logs) to put together the historical facts needed to support the correct replacement/repair action today. The history gathered will assist in root cause of failure analysis, and the redesign and/or purchase and/or maintenance specifications that result from this will produce the greatest payback from the program.



Integrating Condition-Based Maintenance into the Plant Maintenance Organization

Condition-based maintenance, or predictive maintenance, is a very technical science with data acquisition devices, databases, and analysis graphs and charts, all oriented toward predicting one aspect of a machine’s condition. This technology has proven to be very valuable in identifying failing machines and allowing replacement before catastrophic failure. Most predictive maintenance programs fall short in three areas: integrating the predictive technologies into one concise condition view of the machine, providing integrated recommendations for machine maintenance to the entire plant, and tying the maintenance action back to the original PdM calls.



In many plants, predictive maintenance has not been well integrated into the overall plant maintenance organization. Strategic Component Management utilizes the reliability engineer to provide the link between the technical data oriented results of PdM and the action based recommendations that are needed by plant maintenance and operations people.

Strategic Component Management requires that the results of the predictive technologies be integrated and an executive recommendation be issued in clear maintenance language regarding the machine’s problem severity, expected availability, and corrective action. The reliability engineer should be the person responsible for issuing the component’s recommended repair action.



Integrating predictive recommendations from multiple technologies

Let’s say that each technology analyst is making great recommendations for action based on their area of expertise. This is still not adequate for good maintenance decisions. Plant maintenance and operations people cannot easily work with data, actions, and recommendations issued by each independent technology. Rather, they need a single integrated recommendation that provides information about the machine and its condition in a clear, concise manner.

Converting PdM technology data into action-based recommendations is not an exact science. The common question "When will it fail?" cannot be accurately answered because of the many unknown aspects of machine failures. Strategic Component Management utilizes a color-coded severity index in which the action based on severity is clear. The codes are:

Red - Machine is not reliable and could fail at any time.
Yellow - Machine is definitely degraded, but should reliably operate for over a month longer.
Blue - An early stage fault is suspected. Corrective action may be taken to extend the machine’s life. The machine should reliably operate for over three months.
Green - The machine is fully reliable and shows no sign of degradation.
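
One way to picture the integration step is to treat the four colors as an ordered scale and roll the calls from each technology up into a single machine status. The sketch below assumes a simple "worst call wins" rule, which is an illustrative assumption and not a rule stated by Strategic Component Management; the technology names and the `integrate` function are likewise hypothetical. In practice the reliability engineer's judgment decides the final call.

```python
from enum import IntEnum
from typing import Mapping


class Severity(IntEnum):
    """Color codes from the severity index, ordered from best to worst."""
    GREEN = 0   # fully reliable, no sign of degradation
    BLUE = 1    # early stage fault suspected, reliable for over three months
    YELLOW = 2  # definitely degraded, reliable for over a month
    RED = 3     # not reliable, could fail at any time


def integrate(calls: Mapping[str, Severity]) -> Severity:
    """Combine per-technology calls into one status (assumed rule: worst wins)."""
    # With no calls on record, the machine is treated as green.
    return max(calls.values(), default=Severity.GREEN)


# Hypothetical calls from three PdM technologies for one machine.
calls = {
    "vibration": Severity.YELLOW,
    "oil analysis": Severity.BLUE,
    "thermography": Severity.GREEN,
}

status = integrate(calls)
print(status.name)
```

Ordering the colors as integers keeps the comparison explicit, so a different roll-up rule could replace `max` without changing the rest of the report.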


Integrated Condition Status Report




Benefits & Payback

The payback from reliability engineering can be very significant. Typically, payback is obtained from:

1. Reduced machinery life cycle cost

The most significant cost savings from this technology come from lowering a machine’s life cycle cost by increasing its usable life between replacements or overhauls. The life of many machines with poor reliability can be extended several-fold through root cause failure analysis and precision corrective maintenance technologies.

2. Lower repair/replacement cost

The reduction in overhaul and replacement cost is accomplished by building a partnership with the overhaul shops or component manufacturers, which allows volume cost reductions and the imposition of stringent quality requirements. Standardizing spares to allow more widespread use of a single class of motor also provides significant savings.

One large facility reported a 40% reduction in repair and replacement cost, comparing its annual cost before implementing Strategic Component Management with its annual cost three years after implementation.

3. Increased production from higher availability

Increasing machinery reliability has a direct effect on increasing production. Having more reliable machines means less maintenance downtime and more production availability.

One plant, several years after implementing Strategic Component Management, reported a gain of up to 8 hours of production per week.

4. Providing integrated recommendations to the plant

In many manufacturing facilities, predictive condition monitoring is performed on thousands of machine components. Typically an exception report of machines ranked by severity is issued by each technology. Usually this report is issued on hard copy or in e-mail and followed up with a phone call for the most critical and severe cases.

In most plants, maintenance responsibility is broken down by area or component type. Area maintenance only wants to see the condition of "their" machines. Strategic Component Management recommends that an integrated condition report be broadly available over the plant computer network and that it allow each maintenance person to view only the machines they are interested in or responsible for. Figure 1 shows an example integrated condition report.

5. Completing the loop - tying maintenance action back to the PdM call

It is typical in most plants that the predictive maintenance technologists do not get very good feedback on repairs and findings from the maintenance repair organization. PdM needs this information to:

- Close out the fault case.
- Verify the quality of predictive calls.
- Trigger the collection of certification measurements to assure a good repair.
- Establish a new machine baseline.

6. Component Replacement & In-Place Spares

Major component replacement and in-place repairs are led by the reliability engineer. This reduces the stress and workload of area maintenance management, planning, and parts expeditors. This ensures a top quality job completed on-schedule.