Reliability Engineers
Reliability engineers, equipment engineers, area engineers, process engineers, and
engineers with other plant titles are responsible for solving complex problems that are
beyond the scope of plant operators, mechanics, or electricians. Tango™ provides equipment
lifecycle and root cause failure analysis tools to assist these engineers in identifying,
troubleshooting, and solving expensive and complex reliability problems.
The Reliability Engineer
The reliability engineer is the single point of authority and responsibility for
assuring the reliability of, and the maintenance actions on, assigned components or areas.
The reliability engineer must be one of the most knowledgeable people in the plant
for his assigned responsibility, usually with many years of experience in the installation,
repair, and operation of the assigned component. Because the position requires a strong
ability to coordinate action within the plant and with the repair facilities, it is
critical that the reliability engineer have strong "people skills" as well as technical
skills.
From the viewpoint of an outside observer watching the reliability engineer in action,
the amount of authority and responsibility given to this position is very striking.
The reliability engineer must act as reliability coordinator, purchasing agent,
repair coordinator, and liaison between maintenance, production, PdM, and engineering.
Typical reliability engineering tasks include:
- Repair and spares coordination: working with repair facilities to assure that the plant has the correct replacement components, so that any component replacement causes minimal production loss.
- Reliability improvement: evaluating root causes of failure and improving countermeasures found to be inadequate, with maintenance protocols chosen for lowest life cycle cost.
- Predictive maintenance coordination: reviewing PdM data and assuring that countermeasure actions are appropriate and on track, that condition information is available to the entire plant, and that action is underway to minimize the impact on production.
- Inspection: reviewing preventive and scheduled maintenance work, and inspecting critical components during operation.
The responsibilities of the reliability engineer include:
- Primary contact and relationship with the repair facilities.
- Specification of new or replacement components.
- Managing the component spares inventory.
- Supervising the routine maintenance and testing programs.
- Reviewing all failures, seeking their root cause, and specifying repairs, modifications, or replacements that eliminate or minimize the root cause of failure.
- Specifying and purchasing (possibly working very closely with purchasing) with the goal of lowest life cycle cost.
- Maintaining a close working relationship with PdM staff and techs.
- Visiting repair facilities regularly and knowing their capabilities and the latest repair techniques.
- Visiting manufacturers regularly, knowing their capabilities and practices, and giving feedback on suspect practices and procedures.
Using all available reliability and asset information, the reliability engineer
must balance the equipment out for repair, the spares inventory, and all new equipment
purchases to assure minimum impact on plant production, enhanced reliability, and
the lowest life cycle cost. To accomplish this, the reliability engineer must know
what component models and what spare parts will be required by the plant over the
next few months; he then must make sure the appropriate units are on hand with the
proper tools, parts, and procedures for a routinely managed replacement. The tools
used by the reliability engineer to accomplish this Herculean effort are predictive
condition data, historical failure patterns by location, and an optimized spares inventory.
A good working relationship with component manufacturers and repair shops is required
to ensure that the highest-quality reworked and new components are available and delivered
on time.
When a component begins showing signs of condition degradation, the reliability
engineer must go to work well ahead of the failure. He must explore the following
questions:
1. Does a pattern of failure in this component or location indicate an underlying
root cause of failure?
If the answer is "yes", the reliability engineer must seek a new solution: replacing
the component with a new or overhauled unit that is modified to defeat the suspected
root cause.
2. Should overhaul or replacement be selected for this component?
How many times has the unit been overhauled? What have previous life cycles been
like? Is a new component, with its higher purchase cost but longer life, cheaper on an
amortized basis than an overhaul, with its lower cost but shorter life? These decisions
must be made case by case, using hard historical evidence: new and overhaul costs,
new and overhaul life cycles, and component history.
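The overhaul-versus-new question above reduces to comparing amortized cost per year of service. The following is a minimal sketch, assuming straight-line amortization with no discounting, a fixed per-event cost for installation and lost production, and invented example figures; the `RepairOption` class and `cheapest_per_year` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RepairOption:
    """One candidate action for a component (hypothetical structure)."""
    name: str
    price: float                 # purchase price (new) or overhaul invoice
    expected_life_years: float   # expected service life, taken from plant history
    event_cost: float = 0.0      # assumed cost per change-out: labor plus lost production

    def cost_per_year(self) -> float:
        # Straight-line amortization: total cost of one life cycle spread evenly over it.
        return (self.price + self.event_cost) / self.expected_life_years


def cheapest_per_year(options: list[RepairOption]) -> RepairOption:
    """Return the option with the lowest amortized cost per year of service."""
    return min(options, key=lambda o: o.cost_per_year())


if __name__ == "__main__":
    # Illustrative figures only, not data from any plant.
    new_unit = RepairOption("new", price=18000.0, expected_life_years=9.0, event_cost=3000.0)
    overhaul = RepairOption("overhaul", price=6000.0, expected_life_years=2.5, event_cost=3000.0)
    for option in (new_unit, overhaul):
        print(f"{option.name}: {option.cost_per_year():.0f} per year")
    print("lowest amortized cost:", cheapest_per_year([new_unit, overhaul]).name)
```

With these made-up figures the new unit costs about 2333 per year against 3600 for the overhaul, so the longer-lived new unit wins despite its higher price. A different life history can flip the answer, which is why the decision is made case by case.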
3. Does a spare exist on-site?
If the answer is yes, it must be checked to assure it is ready for service. If the
answer is no, then the spares inventory of other company facilities, used component
vendors, and new component vendors must be searched to find a suitable replacement.
The reliability engineer must assure that an inventory of critical spares is available
and optimized, for example by standardizing on a general specification (such as 1800 RPM,
200 HP, 440 V, 445T frame) so that a single spare can meet the needs of several locations in the plant.
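The search described above can be sketched as a lookup against a standardized specification. This is a minimal sketch, assuming spares are matched on an exact specification (speed, horsepower, voltage, frame) and that each spare record carries its source and a readiness flag; `MotorSpec`, `Spare`, `find_spares`, the source labels, and the asset IDs are all hypothetical.

```python
from dataclasses import dataclass

# Search order from the text: on-site first, then other company facilities,
# used component vendors, and new component vendors.
SOURCE_ORDER = {"on-site": 0, "company facility": 1, "used vendor": 2, "new vendor": 3}


@dataclass(frozen=True)
class MotorSpec:
    rpm: int
    hp: int
    volts: int
    frame: str


@dataclass
class Spare:
    asset_id: str
    source: str               # one of the SOURCE_ORDER keys
    spec: MotorSpec
    ready_for_service: bool   # result of the readiness check


def find_spares(required: MotorSpec, inventory: list[Spare]) -> list[Spare]:
    """List spares whose spec matches exactly, nearest source first.

    Within a source, units already checked as ready come before those that are not.
    Exact matching is a simplifying assumption; real interchangeability rules are broader.
    """
    matches = [s for s in inventory if s.spec == required]
    return sorted(matches, key=lambda s: (SOURCE_ORDER[s.source], not s.ready_for_service))


if __name__ == "__main__":
    standard = MotorSpec(rpm=1800, hp=200, volts=440, frame="445T")
    inventory = [
        Spare("S-01", "new vendor", standard, True),
        Spare("S-02", "on-site", standard, False),
        Spare("S-03", "on-site", MotorSpec(3600, 200, 440, "445TS"), True),
        Spare("S-04", "company facility", standard, True),
    ]
    for spare in find_spares(standard, inventory):
        print(spare.asset_id, spare.source, "ready" if spare.ready_for_service else "check before use")
```

The on-site unit is listed first but flagged for a readiness check, matching the step above; if it fails that check, the next candidate is the spare held at another company facility.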
4. Are there any installation or maintenance issues to address at replacement
to make the component easier to install/remove or maintain over its life?
Items such as access, hostile conditions, and poor design may make the installation,
removal, or maintenance task very difficult. In these situations, ways to change the
situation or minimize its effects should be reviewed.
Identifying Root Cause of Failure from Historical Analysis of Machine Failures
Many plants hold extensive information that, if analyzed for repetitive patterns,
would yield significant insight into machines or applications with a chronic history
of poor reliability.
Tracking failure data for a component in the plant along with the manufacturer and
overhaul shop, the overhaul details and the location where the component was operating
will yield a wealth of reliability information when sorted and analyzed. Component
reliability can be compared by manufacturer to determine which brand is most reliable
for a given size or application. The data can also show which rebuild shop should do
which rebuilds, where redesign of the component or application is necessary to improve
reliability, and when replacement is better than repair, or vice versa.
Machine and location history must be maintained such that the history can be searched
and sorted to reveal basic failure patterns such as:
1. Failures by location
Often a plant has multiple identical equipment trains. It is important to understand
whether there is a common primary failure mode across these locations and, if a component's
life cycle is unacceptably short, to seek to eliminate the root cause of failure.
2. Failures under warranty
Often plants do not process equipment warranty claims because they do not have readily
available information on purchase and installation dates.
3. Failures with less than one year of service
There is often a significant number of unreliable machines or locations within the
plant that fail within a year of service; these must be identified and aggressively
analyzed for root cause of failure.
4. Failures by manufacturer
Some manufacturers build a better machine for an application than others. An analysis
of the installed machines by manufacturer can focus your replacement purchases on
more reliable manufacturers.
5. In service life
Sometimes a specific asset has a reliability problem which needs to be identified
and eliminated. Analysis of the in-service life by asset ID can identify these weak
components. Another use of in-service life analysis is to examine the life obtained
from each overhaul cycle. It is often said that a motor should not be overhauled
more than four times. By viewing the in-service life of the motor after each overhaul,
a decision on additional overhaul or scrapping the motor can be made.
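Once the history is kept as structured records, each pattern above becomes a simple grouping. The following is a minimal sketch, assuming one record per failure with installation, failure, and warranty-expiry dates plus an overhaul count; the field names, `summarize`, and the sample records are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class FailureRecord:
    asset_id: str
    location: str
    manufacturer: str
    installed: date            # when the unit went into service at this location
    failed: date
    warranty_expires: date
    overhaul_number: int       # 0 = new unit, 1 = after first overhaul, and so on

    @property
    def service_days(self) -> int:
        return (self.failed - self.installed).days


def _mean_life_by(records, key):
    """Average in-service days, grouped by the given key function."""
    groups = defaultdict(list)
    for r in records:
        groups[key(r)].append(r.service_days)
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


def summarize(records: list[FailureRecord]) -> dict:
    """Sort failure history into the basic patterns listed above."""
    return {
        "by_location": Counter(r.location for r in records),
        "by_manufacturer": Counter(r.manufacturer for r in records),
        "under_warranty": [r.asset_id for r in records if r.failed <= r.warranty_expires],
        "under_one_year": [r.asset_id for r in records if r.service_days < 365],
        "mean_life_by_asset": _mean_life_by(records, lambda r: r.asset_id),
        "mean_life_by_overhaul": _mean_life_by(records, lambda r: r.overhaul_number),
    }


if __name__ == "__main__":
    # Invented sample history for illustration.
    history = [
        FailureRecord("M-17", "Line A pump", "Maker X", date(2020, 1, 1), date(2020, 6, 1), date(2021, 1, 1), 0),
        FailureRecord("M-17", "Line A pump", "Maker X", date(2020, 7, 1), date(2023, 7, 1), date(2021, 7, 1), 1),
        FailureRecord("M-42", "Line B pump", "Maker Y", date(2019, 1, 1), date(2022, 1, 1), date(2020, 1, 1), 0),
    ]
    for pattern, result in summarize(history).items():
        print(pattern, result)
```

Keeping the warranty expiry as a date, rather than a period, sidesteps the question of whether a given warranty runs from purchase or from installation.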
It is possible, in many cases, to go back into paper records (purchase orders, machine
logs, maintenance logs) to put together the historical facts needed to support the
correct replacement/repair action today. The history gathered will assist in root
cause of failure analysis, and the resulting redesign, purchase, and maintenance
specifications will produce the greatest payback from the program.
Integrating Condition-Based Maintenance into the Plant Maintenance Organization
Condition-based maintenance, or predictive maintenance, is a very technical science
with data acquisition devices, databases, and analysis graphs and charts, each oriented
toward predicting one aspect of a machine’s condition. This technology has proven to
be very valuable in identifying failing machines and allowing replacement before
catastrophic failure. Where most predictive maintenance programs fall short is in
integrating the predictive technologies into one concise condition view of the machine,
providing integrated recommendations for machine maintenance to the entire plant,
and tying the maintenance action back to the original PdM calls.
In many plants, predictive maintenance has not been well integrated into the overall
plant maintenance organization. Strategic Component Management uses the reliability
engineer to provide the link between the technical, data-oriented results of PdM
and the action-based recommendations needed by plant maintenance and operations
people.
Strategic Component Management requires that the results of the predictive technologies
be integrated and an executive recommendation be issued in clear maintenance language
regarding the machine’s problem severity, expected availability, and corrective action.
The reliability engineer should be the person responsible for issuing the component’s
recommended repair action.
Integrating predictive recommendations from multiple technologies
Let’s say that each technology analyst is making great recommendations for action
based on their area of expertise. This is still not adequate for good maintenance
decisions. Plant maintenance and operations people cannot easily work with data,
actions and recommendations issued by each independent technology. Rather, they
need a single integrated recommendation that provides information about the machine
and its condition in a clear, concise manner.
Converting PdM technology data into action-based recommendations is not an exact
science. The common question "When will it fail?" cannot be accurately answered because
of the many unknown aspects of machine failures. Strategic Component Management
uses a color-coded severity index in which the action required at each severity level
is clear. The levels are:
Red - Machine is not reliable and could fail at any time.
Yellow - Machine is definitely degraded, but should reliably operate
for over a month longer.
Blue - An early-stage fault is suspected. Corrective action may
be taken to extend the machine’s life. The machine should reliably operate for over
three months.
Green - The machine is fully reliable and shows no sign of degradation.
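The color code can be captured as an ordered severity scale. This is a minimal sketch, assuming the analyst supplies whether a fault is suspected and an estimate of how many more months the machine will operate reliably. It also assumes the integrated status is simply the most severe call from any technology; the text does not specify a combination rule. `Severity`, `classify`, and `integrate` are hypothetical names.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Higher value = more severe, so max() picks the worst case.
    GREEN = 0   # fully reliable, no sign of degradation
    BLUE = 1    # early-stage fault suspected; should operate reliably for over three months
    YELLOW = 2  # definitely degraded; should operate reliably for over a month
    RED = 3     # not reliable; could fail at any time


def classify(fault_suspected: bool, reliable_months: float) -> Severity:
    """Map an analyst's estimate to a color code (hypothetical helper)."""
    if not fault_suspected:
        return Severity.GREEN
    if reliable_months > 3:
        return Severity.BLUE
    if reliable_months > 1:
        return Severity.YELLOW
    return Severity.RED


def integrate(by_technology: dict[str, Severity]) -> Severity:
    """Combine per-technology calls into one status (assumed worst-case rule)."""
    return max(by_technology.values(), default=Severity.GREEN)


if __name__ == "__main__":
    calls = {
        "vibration": classify(fault_suspected=True, reliable_months=6),
        "oil analysis": classify(fault_suspected=True, reliable_months=2),
        "thermography": classify(fault_suspected=False, reliable_months=12),
    }
    print(integrate(calls).name)  # YELLOW
```

Taking the worst case keeps the single recommendation conservative: here the oil analysis call drives the machine to Yellow even though thermography sees nothing.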
Figure 1. Integrated Condition Status Report
Benefits & Payback
The payback from reliability engineering can be very significant. Typically, payback
is obtained from:
1. Reduced machinery life cycle cost
The most significant cost saving from this technology is the lowering of a machine’s
life cycle cost by increasing its usable life between replacements or overhauls.
The life of many poor-reliability machines can be increased several-fold with the application
of root cause failure analysis and precision corrective maintenance technologies.
2. Lower repair/replacement cost
The reduction in overhaul and replacement cost is accomplished by building
a partnership with the overhaul shops or component manufacturers, which allows volume
cost reductions and the imposition of stringent quality requirements. Standardizing
spares so that a single class of motor can be used more widely also provides significant
savings.
One large facility reported a 40% reduction in repair and replacement cost when
comparing its annual cost before implementing Strategic Component Management
with its annual cost three years after implementation.
3. Increased production from higher availability
Increasing machinery reliability has a direct effect on increasing production. Having
more reliable machines means less maintenance downtime and more production availability.
One plant, after implementing Strategic Component Management for several years,
notes up to 8 hours a week of increased production.
4. Providing integrated recommendations to the plant
In many manufacturing facilities, predictive condition monitoring is performed on
thousands of machine components. Typically an exception report of machines ranked
by severity is issued by each technology. Usually this report is issued as hard
copy or by e-mail and followed up with a phone call for the most critical and severe
cases.
Typically in most plants the maintenance responsibility is broken down into areas
or component types. Area maintenance only wants to see the condition of "their"
machines. Strategic Component Management recommends that an integrated condition
report be available broadly over the plant computer network, and allow each maintenance
person to view only the machines they are interested in or responsible for. Figure
1 shows an example integrated condition report.
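Per-person views of the integrated report amount to a filter and sort over one shared table. This is a minimal sketch, assuming each report row carries an area, a component type, and an integrated color status; the row fields, `view_for`, and the sample machines are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Sorting rank: most severe first.
SEVERITY_RANK = {"Red": 3, "Yellow": 2, "Blue": 1, "Green": 0}


@dataclass
class ConditionRow:
    machine_id: str
    area: str
    component_type: str
    status: str            # integrated color code: Red, Yellow, Blue, or Green
    recommendation: str


def view_for(
    rows: list[ConditionRow],
    areas: Optional[set[str]] = None,
    component_types: Optional[set[str]] = None,
    include_green: bool = False,
) -> list[ConditionRow]:
    """Filter the plant-wide report to one person's machines, worst condition first.

    None for areas or component_types means "no filter on that field".
    """
    selected = [
        r for r in rows
        if (areas is None or r.area in areas)
        and (component_types is None or r.component_type in component_types)
        and (include_green or r.status != "Green")
    ]
    return sorted(selected, key=lambda r: (-SEVERITY_RANK[r.status], r.machine_id))


if __name__ == "__main__":
    report = [
        ConditionRow("P-101", "Pulp mill", "pump", "Yellow", "Schedule repair"),
        ConditionRow("F-220", "Pulp mill", "fan", "Red", "Replace at first opportunity"),
        ConditionRow("M-310", "Finishing", "motor", "Blue", "Monitor"),
        ConditionRow("P-102", "Pulp mill", "pump", "Green", "None"),
    ]
    for row in view_for(report, areas={"Pulp mill"}):
        print(row.status, row.machine_id, row.recommendation)
```

Because every view is derived from the same table, an area planner and a component-type specialist see consistent statuses for any machine they share.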
5. Completing the loop - tying maintenance action back to the PdM call.
In most plants, the predictive maintenance technologists do not get very good feedback
on repairs and findings from the maintenance repair organization.
PdM needs this information to:
- Close out the fault case.
- Verify the quality of predictive calls.
- Trigger the collection of certification measurements to assure a good repair.
- Establish a new machine baseline.
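The feedback loop can be sketched as a fault-case record that maintenance completes after the repair. This is a minimal sketch, assuming each PdM call is tracked as a case with a technology label, and that maintenance records its finding and whether that finding confirmed the call. `FaultCase`, `record_repair`, `call_accuracy`, and `pending_certification` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class FaultCase:
    case_id: str
    machine_id: str
    technology: str                       # e.g. "vibration", "oil analysis"
    predicted_fault: str
    repair_finding: Optional[str] = None  # maintenance's as-found description
    confirmed: Optional[bool] = None      # did the finding confirm the PdM call?
    certified: bool = False               # post-repair certification measurement taken?

    @property
    def closed(self) -> bool:
        # A case closes only when maintenance has reported back on the repair.
        return self.repair_finding is not None and self.confirmed is not None


def record_repair(case: FaultCase, finding: str, confirmed: bool) -> None:
    """Feed the repair finding back to the PdM case."""
    case.repair_finding = finding
    case.confirmed = confirmed


def call_accuracy(cases: list[FaultCase]) -> dict[str, float]:
    """Share of closed cases, per technology, where the repair confirmed the call."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for case in cases:
        if case.closed:
            totals[case.technology] += 1
            hits[case.technology] += int(bool(case.confirmed))
    return {tech: hits[tech] / totals[tech] for tech in totals}


def pending_certification(cases: list[FaultCase]) -> list[str]:
    """Machines repaired but not yet measured to certify the repair and set a new baseline."""
    return [c.machine_id for c in cases if c.closed and not c.certified]


if __name__ == "__main__":
    cases = [
        FaultCase("C-1", "M-17", "vibration", "outboard bearing defect"),
        FaultCase("C-2", "M-42", "vibration", "misalignment"),
        FaultCase("C-3", "M-58", "oil analysis", "gear wear"),
    ]
    record_repair(cases[0], "outboard bearing spalled", confirmed=True)
    record_repair(cases[1], "alignment within tolerance", confirmed=False)
    print(call_accuracy(cases))          # C-3 is still open, so only vibration appears
    print(pending_certification(cases))
```

Open cases are excluded from the accuracy figure, so a technology is only scored on calls that maintenance has actually checked.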
6. Component Replacement & In-Place Spares
Major component replacement and in-place repairs are led by the reliability engineer.
This reduces the stress and workload on area maintenance management, planners, and
parts expeditors, and ensures a top-quality job completed on schedule.