Aerospace Electronics Reliability must be Quantified to be Assured: Application of the Probabilistic Design for Reliability Concept
E. Suhir*
Bell Laboratories, Murray Hill, NJ, USA (ret); Portland State University, Portland, OR, USA.
Technical University, Vienna, Austria; James Cook University, Queensland, Australia.
ERS Co., 727 Alvina Ct., Los Altos, CA 94024, USA.
*Corresponding Author
E. Suhir,
Bell Laboratories, Murray Hill, NJ, USA (ret); Portland State University, Portland, OR, USA,
Technical University, Vienna, Austria; James Cook University, Queensland, Australia; and
ERS Co., 727 Alvina Ct., Los Altos, CA 94024, USA
Tel: 650.969.1530
Email: suhire@aol.com/e.suhir@ieee.org
Received: September 25, 2020; Accepted: October 08, 2020; Published: November 30, 2020
Citation: E. Suhir. Aerospace Electronics Reliability must be Quantified to be Assured: Application of the Probabilistic Design for Reliability Concept. Int J Aeronautics Aerospace Res. 2020;7(3):235-243. doi: dx.doi.org/10.19070/2470-4415-2000029
Copyright: E. Suhir© 2020. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.
Abstract
The recently suggested probabilistic design for reliability (PDfR) concept can be effectively used for making a viable electronic,
optoelectronic, photonic, or MEMS device (EOPD) into a reliable product. Understanding the physics of failure is critical
to the assurance of EOPD reliability, and the PDfR concept therefore has its experimental basis in highly focused
and highly cost-effective failure-oriented accelerated testing (FOAT) geared to a physically meaningful and trustworthy predictive
model that can be used to predict the remaining useful lifetime (RUL) in actual operation conditions from
the FOAT data. The multi-parametric Boltzmann-Arrhenius-Zhurkov (BAZ) equation suggested about a decade ago could
be employed in this capacity. This analysis focuses on some major features of the PDfR concept and its possible interactions with
statistical approaches, when the BAZ model is sandwiched between two well-known statistical approaches, the Bayes formula and
the beta-distribution. It is concluded that the application of the PDfR concept, FOAT and the multi-parametric BAZ model
can dramatically improve the state of the art in aerospace EOPD reliability prediction and assurance.
The general concepts are illustrated by numerical examples.
Keywords
Aerospace Electronics; Reliability; Probabilistic Modeling; Accelerated-Testing.
Introduction
Here are some current problems envisioned and questions asked
in connection with assuring aerospace EOPDs reliability:
• EOPD products that have passed the existing qualification tests
(QT) nevertheless often exhibit premature operational failures.
Are the existing QT standards, methodologies, procedures and
practices adequate [1]?
• If they are not, what could and should be done differently in the
next generation of QT [2]?
• While the failure of a commercial electronic product is usually
not viewed as a catastrophe, as long as the percentage of failed
devices is low and the product is still sellable, in aerospace EOPDs the
consequences of device failure might be dramatic, sometimes even
more severe than in some other areas of EOPD engineering,
such as military, long-haul communications, medical, etc. Should
the likelihood of failure of the EOPDs whose operational reliability
is critical be necessarily quantified to be assured [3-5]?
• And because nothing is perfect, and the difference between a
highly reliable and an insufficiently reliable EOPD is “merely” in the
level of the never-zero probability of failure, what probabilistic
means should be employed to quantify and assure EOPD reliability
[6-11]?
• Every five to seven years or so a new generation of EOPDs is
developed. Old ED products become obsolete, although they
are still physically reliable. Should manufacturers of aerospace EOPDs,
after the acceptable probability of the never-zero likelihood of
failure is established and agreed upon, consider, at the design and
production stages, relatively short, but realistic and predictable,
lifetimes of their products [12]? As a friend of mine has put it, “I
do not need an expensive everlasting pen, because I do not intend
to live forever”.
• The reliability of an EOPD product should be different for different
products and applications [13]. Should this circumstance be
considered when planning and evaluating the product’s lifetime
and the adequate probability of failure?
• How to establish the appropriate list of the crucial (AT), the
physically meaningful and particular-application oriented stressors
(stimuli), and, having in mind that the principle of superposition
is not applicable in reliability engineering, their relevant combinations
[14]?
• And should these combinations necessarily reflect those that the
device will encounter in actual operation [15, 16]?
• There is currently a lot of criticism about the adequacy of the
widespread temperature cycling as the preferred AT approach
in EOPD reliability engineering. Temperature cycling tests
are not only costly and time consuming, requiring expensive and
sophisticated equipment, but, most importantly,
their results might be misleading, since the temperature range
in these ATs has to be much broader than what an EOPD might
encounter in actual operation conditions, and the behavior of EOPD materials
is, as is known, very temperature sensitive and
might therefore be quite different at very high and very low testing
temperatures than at moderate temperatures in actual operation
conditions. Should the temperature cycling tests for EOPDs be
replaced by, say, a low-temperature/random-vibration bias, or by
other more physically meaningful and, perhaps, less expensive
and less labor- and time-consuming tests [17]?
• Although mechanical pre-stressing of the accelerated life test
(ALT) specimens [18, 19] could minimize the above shortcoming,
such pre-stressing, acceptable as a research effort, could
hardly be recommended in actual industrial practice. Could, e.g.,
the above-indicated combination of low temperature conditions
(because the thermally induced stresses in an ED fabricated at
an elevated temperature and subsequently cooled down to a low,
room or testing, temperature are the highest at low temperature
conditions and also because fatigue and brittle cracks propagate
more rapidly at low temperature conditions) and random vibrations
be employed as a suitable combination of loadings and as
an appropriate QT technique?
• Could such testing be employed also as a suitable burn-in test
(BIT) that, in addition, would be able to weed out infant mortality
failures [20-23]?
• The experimental bathtub curve (BTC) is the “reliability passport”
of an aerospace EOPD. It is well known that there are two
major irreversible random processes that form such a curve for
mass-produced devices: the statistics-related-failure process that
results in the decreased failure rate with time (the BTC’s infant
mortality portion reflects the statistical nature of such a process)
and reliability-physics-related-failure process that results in the increased
failure rate with time (the wear-out portion of the bathtub
curve explicitly reflects the ultimate physics of this process). The
decreased and the increased failure rates caused by these two processes
result in a more or less constant failure rate at the steady-state
portion of the BTC. If one sets out to improve the reliability
and increase the RUL of the EOPDs, it is clear that he/she should
focus on the physics-of-failure process, especially at the wear-out
portion of the BTC. But how to separate this process from the
statistical process? It has been shown [24, 25] that the statistical
process can be predicted theoretically and, assuming that these
two processes are statistically independent, it has been suggested
that the ordinates of the physical process could be determined by
simply subtracting the statistics-related-failure process ordinates
from the BTC ordinates. Are there guidelines for doing that? It
has been shown also that the ordinates of the above statistical
process depend on the probability density distribution function
for the “instantaneous” random failure rate. The examples were
carried out for the normal and Rayleigh distributions. But is this
a legitimate approach? And if it is, how should it be implemented
into (reduced to) the engineering practice?
• It has been recently predicted [26-34] that there is a possibility
in many cases to avoid inelastic strains in solder joint interconnections,
which are the most vulnerable structural elements in today’s
EOPDs technologies. Significant stress relief, even to an extent
that no inelastic strains could possibly occur, can be achieved by
considering joints with elevated stand-offs, such as column-grid arrays,
and/or by employing inhomogeneous bonds, when low
modulus solders or even epoxies are used at the assembly ends,
and/or by using low expansion (such as, e.g., ceramic or silicon)
substrates. Should one try first to design an inelastic-strain-free
assembly before trying to predict its lifetime assuming, in accordance
with today’s practice, that the peripheral joints always
experience inelastic deformations and that the length of the expected
size of inelastic strain peripheral areas of the bond could
be predicted in advance?
• Real time degradation of electronic materials is a slow process.
Could physically meaningful and cost-effective methodologies for
measuring and predicting the degradation (aging) rates and consequences
be developed, at least for the most important or the most
typical devices? Could the appropriately modified BAZ model be
employed for doing that?
• An attempt has been recently made [35] to show how, provided
that the physics of failure is reasonably well understood, the total
cost of reliability could be minimized by quantifying the best
compromise between the initial cost of the product and cost of
its restoration during operation, if failure occurs. It has been
shown particularly that such an optimization is closely connected
with the optimization of the operational availability of the product.
Is this a promising approach?
• Could the approach aimed at the evaluation of the maximum
acceptable restoration time [36, 37] be helpful, when developing
an effective cost-optimization model?
• Predictive modeling (PM), and especially analytical (mathematical)
modeling [38-43], has proven to be a highly useful and highly
cost-effective means for understanding the physics of failure and
designing the most practical ATs in EOPDs engineering for a variety
of applications. Which models have been and might be the most
needed and most practical for future applications in aerospace engineering?
Is numerical, such as, say, FEA, modeling (simulation)
sufficient? Should physically meaningful and easy-to-use analytical
modeling be employed in addition to or even instead of numerical
modeling?
• It is widely recognized that it is absolutely critical to understand
the physics of failure to be able to design and operate a reliable
device. It goes without saying that such an understanding should
be based on a physically meaningful failure oriented accelerated
test (FOAT) model [44-48]. Which model can be used for this
purpose? Will a FOAT methodology using the BAZ model [49-53]
do the job?
In the analysis that follows some of the above problems are addressed.
The emphasis is on the opportunities associated with the
use of the recently suggested novel, flexible and fruitful PDfR
concept [10] in electronics reliability, including the aerospace field.
The concept enables making a viable EOPD product into a reliable
product with the predicted probability of non-failure in the field: when reliability is imperative, the ability to quantify it is a must. It
is shown that the recently suggested BAZ model [49-53], and
particularly its multi-parametric modification, can be successfully
employed for this purpose. It is shown also how this physics-of-failure-based
probabilistic model can be sandwiched between two
statistical models, one based on the Bayes theorem and the other on the beta-distribution,
when there is a need to diagnose the detected malfunction
of the device and update its reliability, if failures still occur
despite the predicted low probability of their occurrence [52]. The
substance of the BAZ model is briefly addressed in Appendix A.
PDfR Concept and Its Applications
The PDfR concept is based upon 1) FOAT, aimed at understanding
the physics of the anticipated or the observed failures and
at quantifying, on the probabilistic basis, the outcome of FOAT
conducted for the most vulnerable element(s) of the product of
interest for its most likely applications and the most meaningful
combination of possible stressors (stimuli); 2) simple, easy-to-use
and physically meaningful predictive modeling, both analytical
and computer-aided; and, if needed, subsequent; 3) sensitivity
analyses (SA) using the methodologies and algorithms developed
as by-products at the two previous steps. The PDfR concept proceeds
from the recognition that nothing is perfect and that the
difference between a highly reliable and an insufficiently reliable
product is “merely” in the level of the never-zero probability of
its failure. If this probability, predicted at the design stage for the
anticipated loading conditions and the given time in operation, is
not acceptable, then SA can be effectively employed to determine
what could/should be changed to improve the situation. No extra
FOAT effort will be required. The PDfR analysis enables one also
to check if the product of interest is not over-engineered, i.e.,
is not superfluously robust. If it is, it might be more costly than
necessary. The operational reliability cannot be low and need not
be higher than necessary; it has to be adequate
for the given product and application. Central to the PDfR concept
is the calculation of the probability of failure and/or the remaining useful
life (RUL) of an electronic material or a product, and the use of this
probability (and/or the probabilistic safety factor, defined as the
ratio of the mean value of the safety margin to its standard deviation)
as a suitable and physically most meaningful criterion of the
product’s performance. Although several advanced PDfR predictive
modeling techniques have been recently developed, mostly
for aerospace applications, the analysis in this paper uses more
or less elementary analytical probabilistic models. We elaborate
on the role and attributes of the recently suggested powerful and
flexible BAZ model [49-53] and particularly its multi-parametric
extension. This model can be successfully employed to predict,
quantify and assure operational reliability, as well as to analyze
and design electronic products with the predicted, quantified, assured,
and, if appropriate and cost-effective, even maintained and
specified probability of the operational (field) failure. It has been
shown [51] that the BAZ equation can be obtained as the final
steady-state part of the Markovian process of failure events and
that this part is the most conservative. In other words, one does
not have to address the transitional Markovian process in practical
engineering applications. The model can be used in the technical
diagnostics effort [54].
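As a rough illustration of the probabilistic safety factor defined above, the sketch below assumes that the capacity (strength) C and the demand (load) D of a product are independent normal random variables, so that the safety margin ψ = C - D is also normal. The normality assumption, the function names and the input numbers are hypothetical and are not taken from the paper; they only show how a safety factor and the corresponding never-zero probability of failure are related.

```python
import math

def safety_factor(mean_c, std_c, mean_d, std_d):
    """Ratio of the mean of the safety margin psi = C - D to its standard deviation."""
    mean_psi = mean_c - mean_d
    # Variances add for independent capacity and demand (assumption).
    std_psi = math.sqrt(std_c ** 2 + std_d ** 2)
    return mean_psi / std_psi

def failure_probability(sf):
    """P(psi < 0) for a normally distributed safety margin, i.e. Phi(-SF)."""
    return 0.5 * math.erfc(sf / math.sqrt(2.0))

# Hypothetical capacity and demand statistics (arbitrary units).
sf = safety_factor(mean_c=100.0, std_c=8.0, mean_d=70.0, std_d=6.0)
pf = failure_probability(sf)
print(f"SF = {sf:.2f}, P(failure) = {pf:.2e}")
```

With these made-up inputs the safety factor is 3 and the probability of failure is on the order of 10^-3; a sensitivity analysis would vary the means and standard deviations to see which change brings the probability to the required level.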
The following ten major (governing) principles (“commandments”)
reflect the rationale behind the PDfR concept:
1) The PDfR concept is an effective means for improving the state-of-the-art
in the field of microelectronic reliability engineering by
quantifying, on the probabilistic basis, the operational reliability
of the product;
2) The probability of failure of an electronic product is never
zero, but could and should be assessed (quantified) and brought
to an acceptable (adequate) level;
3) The best electronic product should be considered, designed
and fabricated as the best compromise between the needs for its
reliability, cost effectiveness and time-to-market;
4) An electronic product’s reliability cannot be low, need not be higher
than necessary, but has to be adequate for the given application,
considering the projected lifetime, environmental conditions and
consequences of failure;
5) Redundancy, trouble-shooting and maintenance are important
factors to be considered, when adequate reliability level has to be
maintained, especially if the “genetic health” of the product is not
high, even when the appropriate burn-in procedure is carried out;
6) When reliability for whatever application is imperative, the ability
to quantify it is a must, especially if one intends to optimize
and to assure reliability;
7) One cannot design a product with quantified, optimized and
assured reliability by limiting the effort to the widely used today
highly accelerated life testing (HALT): a cost-effective and highly focused
FOAT is always a must;
8) Reliability is conceived at the design stage and should be taken
care of, first of all, at this stage. It is at the design stage that an
attempt should be made to create a “genetically healthy” product;
9) Highly cost-effective and highly focused FOAT geared to a
limited number of pre-determined simple, easy-to-use and physically
meaningful predictive reliability models and aimed at understanding
the physics of failure that is anticipated and quantified by
these models is an important constituent part of the PDfR effort;
10) PM, not necessarily using the well known FOAT models, is
another important constituent of the PDfR approach. PM, in
combination with well-established FOAT models (Arrhenius, Eyring,
etc.), is a powerful means to carry out, if necessary, SA, with
an objective to quantify and practically nearly eliminate failures
(“the principle of practical confidence”).
The next generation of ED QT could be viewed as a “quasi-FOAT” or
“mini-FOAT”, a sort of “initial stage of FOAT” that
more or less adequately replicates the initial non-destructive, yet
full-scale, stage of FOAT. The duration and conditions of such a
“mini-FOAT” QT could and should be established based on the
observed and recorded results of the actual FOAT, and should
be limited to the stage when no failures, or a predetermined and
acceptable small number of failures in the actual full-scale FOAT,
were observed. PHM technologies (“canaries”) could and should
be concurrently tested to make sure that the safe limit is established
correctly and is not exceeded. Such an approach to qualifying
devices into products will enable the industry to specify, and the
manufacturers, including those in the biomedical field, to assure, a predicted
and adequate probability of non-failure (safety factor) for a product
that passed the QT and is expected to be operated in the field
under the given conditions for the given time. FOAT should be
thoroughly designed, implemented, and analyzed, so that the QT is based on trustworthy experimental data. Since FOAT cannot
do without simple, easy-to-use and physically meaningful PM,
the role of such modeling, both computer-aided and analytical
(mathematical), is critical in making the suggested new approach to QT
practical and successful. It is imperative that the reliability physics
that underlies the mechanisms and modes of failure is well understood.
Such an understanding can be achieved only provided that
flexible, powerful and effective PDfR efforts are implemented.
When encountering a particular reliability problem at the design,
fabrication, testing, or an operation stage of a product’s life, and
considering the use of predictive modeling to assess the seriousness
and the likely consequences of a detected failure, one has
to choose whether a statistical, or a physics-of-failure-based, or a
suitable combination of these two major modeling tools should
be employed to address the problem of interest and to decide on
how to proceed. A three-step concept (TSC) [52] is suggested
as a possible way to go in such a situation. The classical statistical
Bayes formula can be used at the first step in this concept
as a technical diagnostics tool. Its objective is to identify, on the
probabilistic basis, the faulty (malfunctioning) device(s) from the
obtained signals (“symptoms of faults”). The physics-of-failure-based
BAZ model and particularly its multi-parametric extension
can be employed at the second step to assess the RUL of the
faulty device(s). If the RUL is still long enough, no action might
be needed; if it is not, corrective restoration action becomes necessary.
In any event, after the first two steps are carried out, the
device is put back into operation (testing), provided that the assessed
probability of its continuing failure-free operation is found
to be satisfactory. If an operational failure nonetheless occurs, the
third step should be undertaken to update reliability. Statistical
beta-distribution, in which the probability of failure is treated as a
random variable, is suggested to be used at this step. While various
statistical methods and approaches, including the Bayes formula and
the beta-distribution, have been well known and widely used in numerous
applications for many decades, the BAZ model was introduced
in the microelectronics reliability area only several years ago. Its
attributes and use are therefore addressed and discussed in some
detail. The suggested concept is illustrated by a numerical example
geared to the use of the currently highly popular prognostics-and-health-monitoring
(PHM) effort in actual operation, such as, e.g., an
en-route flight mission.
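The third step of the TSC treats the probability of failure as a beta-distributed random variable. The sketch below shows only the textbook conjugate update of a beta distribution with pass/fail observations; the paper's own third-step procedure is not reproduced in this section, so the prior parameters, the observed counts and the function names are hypothetical.

```python
def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution for the probability of failure."""
    return a / (a + b)

def beta_update(a, b, failures, trials):
    """Conjugate update of a Beta(a, b) prior with binomial (fail/no-fail) data."""
    return a + failures, b + (trials - failures)

# Hypothetical prior: mean probability of failure of 0.01.
a0, b0 = 1.0, 99.0
# Hypothetical field evidence: 1 failure observed in 20 missions.
a1, b1 = beta_update(a0, b0, failures=1, trials=20)

print(f"prior mean     = {beta_mean(a0, b0):.4f}")
print(f"posterior mean = {beta_mean(a1, b1):.4f}")
```

In this illustration a single observed failure raises the expected probability of failure from 0.01 to about 0.017, which is the kind of reliability update the third step is meant to deliver.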
Let, e.g., the following FOAT-based input information be obtained:
1) After t1 = 35h of testing at the temperature T1 = 60°C
= 333 K, the voltage V = 600V and the relative humidity H = 0.85,
10% of the tested modules exceeded the allowable (critical) level
of the leakage current of I* = 3.5μA and, hence, failed, so that the
probability of non-failure is P1 = 0.9; 2) After t2 = 70h of testing
at the temperature T2 = 85°C = 358 K at the same voltage and
the same relative humidity, 20% of the tested samples reached
or exceeded the critical level of the leakage current and, hence,
failed, so that the probability of non-failure is P2 = 0.8. Then the
equation (A-4) results in the following equation for the leakage
current sensitivity factor γI:
This equation has the solution $\gamma_I = 4893.2\ \mathrm{h}^{-1}(\mu\mathrm{A})^{-1}$. Thus,
$\gamma_I I_* = 17126.2\ \mathrm{h}^{-1}$. A more accurate solution can always be obtained
by using the Newton iterative method for solving transcendental
equations. This concludes the first step of testing. At the
second step, tests at two relative humidity levels, H1 and H2, were
conducted for the same temperature and voltage levels. This leads
to the relationship:
Let, e.g., after t1 = 40h of testing at the relative humidity of H1
= 0.5 at the given voltage (say, V = 600V) and temperature (say, T2
= 85°C = 358 K), 5% of the tested modules failed, so that P1 =
0.95, and after t2 = 55h of testing at the same temperature and
at the relative humidity of H2 = 0.85, 10% of the tested modules
failed, so that P2 = 0.9. Then the above equation for the γH value,
with the Boltzmann constant $k = 8.61733 \times 10^{-5}$ eV/K, yields
$\gamma_H = 0.03292$ eV. At the third step, FOAT was conducted at two different voltage
levels, $V_1 = 600$ V and $V_2 = 1000$ V, at $T = 85\,^{\circ}\mathrm{C} = 358$ K and $H = 0.85$,
and it has been determined that 10% of the tested devices failed
after t1 = 40h of testing (P1 = 0.9) and 20% of the devices failed after
t2 = 80h of testing (P2 = 0.8). The voltage sensitivity factor $\gamma_V$ is then determined from these data in the same way.
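The second and third steps can be sketched numerically. The sketch assumes that (A-4), which is not reproduced in this section, has the multi-parametric BAZ form P = exp[-γI I* t exp(-(U0 - γH H - γV V)/(kT))], so that for two tests differing in a single stressor the common factor γI I* cancels. The function name, and the assignment of the first voltage test to V1 and the second to V2, are assumptions made for illustration.

```python
import math

K_BOLTZMANN = 8.61733e-5  # eV/K, as quoted in the text

def sensitivity_factor(temp_k, x1, x2, t1, t2, p1, p2):
    """Sensitivity factor for one stressor x (humidity or voltage).

    Uses two tests at the same temperature that differ only in x, under the
    assumed BAZ form; ln(-ln P / t) then changes by gamma * (x2 - x1) / (k T).
    """
    n1 = -math.log(p1) / t1
    n2 = -math.log(p2) / t2
    return K_BOLTZMANN * temp_k * math.log(n2 / n1) / (x2 - x1)

# Humidity step: H1 = 0.5 (40 h, P = 0.95), H2 = 0.85 (55 h, P = 0.9).
gamma_h_358 = sensitivity_factor(358.0, 0.5, 0.85, 40.0, 55.0, 0.95, 0.9)
gamma_h_333 = sensitivity_factor(333.0, 0.5, 0.85, 40.0, 55.0, 0.95, 0.9)

# Voltage step: V1 = 600 V (40 h, P = 0.9), V2 = 1000 V (80 h, P = 0.8).
gamma_v = sensitivity_factor(358.0, 600.0, 1000.0, 40.0, 80.0, 0.9, 0.8)

print(f"gamma_H at 358 K = {gamma_h_358:.4f} eV")
print(f"gamma_H at 333 K = {gamma_h_333:.4f} eV")
print(f"gamma_V at 358 K = {gamma_v:.3e} eV/V")
```

Under these assumptions the humidity data give roughly 0.035 eV at the stated T = 358 K, while the value of about 0.0329 eV quoted in the text is reproduced when T = 333 K is used instead.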
After the sensitivity factors of the leakage current, the humidity
and the voltage are found, the stress free activation energy U0 can
be determined on the basis of the last equation in Appendix A
for the given temperature and for any combination of loadings
(stimuli):
The third term in this result (the last term in the last equation
in Appendix A) plays the dominant role, so that, in approximate
evaluations, only this term could be considered. Calculations indicate
that the loading free activation energy in the above numerical
example (even with the rather tentative, but still realistic, input
data) is about $U_0 = 0.5$ eV. This result is consistent with the existing
experimental data. Indeed, for semiconductor device failure
mechanisms the activation energy ranges from 0.3eV to 0.6eV,
for metallization defects and electro-migration in Al it is about
0.5eV, for charge loss it is on the order of 0.6eV, for Si junction
defects it is 0.8eV. The distribution (A-4) yields:
The objective of the numerical example below is to illustrate how
the suggested TSC can be used to assess and to maintain a high
probability of non-failure in actual MED operation conditions.
Step 1. Application of the Bayes formula as a suitable technical diagnostics tool
The application of the Bayes formula enables one to assess the
reliability of a particular malfunctioning device from the available
general information for similar devices. It has been established,
e.g., from experience with the given type of devices subjected in
actual operation conditions to elevated temperature and vibrations
that 90% of the devices do not typically fail during operation.
It has been established also that the diagnostic symptom -
an increase in temperature by 20ºC above the normal (specified)
level - is encountered in 5% of the devices. The PHM technical
diagnostics instrumentation has detected in a particular device the
following two deviations (“symptoms of failure”) from normal
operation conditions: 1) an increase in temperature by 20ºC at the
heat sink location (symptom S1) and 2) an increase in the vibration
power spectrum (symptom S2). These symptoms might be due
to the malfunction of the heat sink (state D1) and/or of the vibration
protection equipment (state D2). From the previous experience
with similar devices at similar operation conditions it has
been established that the symptom S1 (increase in temperature) is
not observed at normal operation conditions (state D3), and the
symptom S2 (increase in the power of the vibration spectrum) is
observed in 5% of the cases (devices). It has been established also,
based on the accumulated experience with this type of devices,
that 80% of them do not fail during the specified time of operation,
5% of the devices experience the malfunction of the heat sink (state
D1), and 15% of the devices are characterized by the state D2 (malfunction
of the vibration protection equipment). Finally, it has
been established that the symptom S1 (increase in temperature) is encountered
in the state D1 (because of the malfunction of the heat
sink) in 20% of the devices, and in the state D2 (because of the malfunction
of the vibration protection system) in 40% of the devices;
and that the symptom S2 (increase in the power of the vibration
spectrum) is encountered in the state D1 (malfunction of the heat
sink) in 30% of the devices and in the state D2 (malfunction of the vibration
protection system) in 50% of the devices. The above information can
be summarized in the form of the diagnostics matrix shown in
Table 2. Thus, this matrix indicates that 1) the symptom S1 (increase
in temperature) is encountered in 20% of the cases because of the
malfunctioning heat sink (state D1), in 40% of the cases because
of the malfunctioning vibration protection system (state D2), and
is never observed in normal operation conditions (state D3); 2)
the symptom S2 (increase in the power of the vibration spectrum)
is encountered in 30% of the cases because of the malfunctioning
heat sink (state D1), in 50% of the cases because of the malfunctioning
vibration protection system (state D2), and in 5% of
the cases in normal operation conditions (state D3); and 3) the
symptom S3 (both heat transfer and vibration protection hardware
work normally) is encountered in 5% of the cases because of the
malfunctioning heat sink (state D1), in 15% of the cases because
of the malfunctioning vibration protection system (state D2), and
in 80% of the cases in normal operation conditions (state D3).
Let us determine first the probability that the device, in which
the 20ºC increase in temperature has been detected, is still sound.
This can be done using the information that 90% of the devices
of the type of interest do not typically fail during the designated
time of operation and that the symptom S1, which is an increase
in temperature by 20ºC above the normal level, is encountered in
5% of these devices. The first message tells that the probabilities
of the sound condition D1 and the faulty condition D2 in the general
population of the devices under operation are P (D1) = 0.9
and P (D2) = 0.1, respectively. The second message tells that the
conditional probabilities reflecting the actual situation with the
given device are P (S/D1) = 0.05 and P (S/D2) = 0.95: only 5%
of the devices function adequately, and 95% of them do not. The
question asked is as follows: with this new information about a
particular device, how has the expected probability P (D1) = 0.9
that the device of interest is still sound changed? In other
words, how could one use the accumulated experience about the
operational performance of the large population of this type of
devices, considering the results of the actual field information for
a particular device?
The Bayes formula yields:
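A minimal sketch of this update, using the textbook two-hypothesis Bayes formula P(D1/S) = P(D1)P(S/D1) / [P(D1)P(S/D1) + P(D2)P(S/D2)], is given below. The function name is hypothetical, and the sketch does not use the factor χ of formula (A-5), which is not reproduced in this section.

```python
def posterior_sound(p_sound, p_symptom_given_sound, p_symptom_given_faulty):
    """Posterior probability that the device is sound, given the observed symptom."""
    p_faulty = 1.0 - p_sound
    numerator = p_sound * p_symptom_given_sound
    evidence = numerator + p_faulty * p_symptom_given_faulty
    return numerator / evidence

# Inputs from the text: P(D1) = 0.9, P(S/D1) = 0.05, P(S/D2) = 0.95.
p_updated = posterior_sound(0.9, 0.05, 0.95)
print(f"P(D1/S) = {p_updated:.4f}")
```

With the inputs quoted above, the textbook formula gives a posterior of about 0.32, well below the prior value of 0.9.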
Thus, the probability that the device is still sound has decreased
dramatically from its value of 0.9 for the typical (expected) situation
because of the detected 20ºC increase in the observed temperature
and because such an increase is viewed as a failure of the
device. The Bayes formula indicates, particularly, that the factor χ
defined by the formula (A-5) accounts for the change, based on
the updated reliability information, in the initial probability that
the device is still sound and, hence, its use could be continued
with a high level of confidence. The decrease in the probability
of non-failure would be quite different if only a slight decrease
in the probability of non-failure for the given device, based on
the obtained symptom, were detected. Indeed, with P(S/D1) = 0.85
(instead of 0.05) and P(S/D2) = 0.15 (instead of 0.95), the factor
χ would be as high as χ = 0.9808, and the updated probability
of non-failure would be also high: P(D1/S) = 0.8827. Let us address
now, using the information provided by Table 1, the
performance of a device with a possible malfunction of
the heat sink and/or the vibration protection system. The probabilities
of the device states, when both symptoms, S1 (faulty heat
sink) and S2 (inadequate vibration protection), have been detected,
can be found using Bayes formula as follows:
This is the probability that the device, for which both symptoms,
malfunctioning heat sink and malfunctioning vibration protection
system, have been detected, is in the state D1, i.e., failed because of the malfunctioning heat sink. Similarly, one could find the probability
P (D2/S1S2) = 0.91 that the device is in the state D2, i.e.,
failed because of the malfunctioning vibration protection system.
Since the device has failed, it cannot be in the non-failure state
D3, and therefore the probability that the device is still sound, despite
the detected malfunctions of the heat sink and the vibration
protection system, is zero: P(D3/S1S2) = 0. Let us determine the
probability of the device’s state if the PHM measurements have
indicated that there was no increase in temperature (the symptom
S1 did not take place), but the symptom S2 (increase in the power
spectrum of the induced vibrations) was detected. The absence
of the symptom $S_1$ means that the symptom $\bar{S}_1$ of the opposite
event took place, so that $P(\bar{S}_1/D_i) = 1 - P(S_1/D_i)$. Replacing the probability
$P(S_1/D_i)$ in the above
diagnostics matrix with $P(\bar{S}_1/D_i)$, we find the following probability
of the state $D_1$ of the device (the device failed because of the
malfunctioning heat sink):
Similarly, we obtain: $P(D_2/\bar{S}_1 S_2) = 0.46$; $P(D_3/\bar{S}_1 S_2) = 0.41$. Determine
now the probabilities of the device states when none of the
symptoms took place. We find:
Similarly, we have: $P(D_2/\bar{S}_1\bar{S}_2) = 0.05$; $P(D_3/\bar{S}_1\bar{S}_2) = 0.92$. Thus, when
both symptoms, S1 and S2, are observed, the state D1 (failure
occurred because the heat sink is malfunctioning) has the probability
of occurrence of 0.91. When none of these symptoms is
observed, the normal state, D3, is characterized by the probability of 0.92
and, hence, is somewhat more likely to occur than the failure state identified when
both symptoms, S1 and S2, are observed. When the symptom S1
(elevated temperature) is not observed, while the symptom S2
(elevated vibrations) is, the probabilities of the states D2 (vibration
protection system is not working properly) and D3 (both heat
transfer and vibration protection hardware work normally) are
0.46 and 0.41, respectively. One could either accept this information
and act accordingly, i.e., go ahead with a conclusion that it
is the elevated temperature and not the elevated vibration that
should be taken care of, or, since these probabilities are close, one
might decide to seek additional information. Such information
could be based on additional observations and/or
on other sources of more accurate and more
convincing diagnostics information (e.g., modeling or additional
measurements). Thus, the first step of the TSC enables one to
identify, on a probabilistic basis, the malfunctioning device(s)
and the most likely cause(s) of the device failure.
The objective of the next step is to assess, using the BAZ equation,
the RUL of the detected malfunctioning device(s).
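The Bayes-formula bookkeeping of this first step can be sketched in a few lines of Python. Table 1 is not reproduced in this part of the text, so the prior probabilities and the diagnostics matrix below are illustrative assumptions, chosen to be consistent with the posterior values quoted above; the function name `posterior` and the dictionary layout are likewise hypothetical. The symptoms are assumed to be conditionally independent for a given device state, and an absent symptom contributes the factor 1 - P(Sj/Di).

```python
from math import prod


def posterior(priors, likelihoods, observed):
    """Bayes posterior over device states for a set of symptom observations.

    priors:      {state: P(D_i)}
    likelihoods: {state: {symptom: P(S_j / D_i)}}
    observed:    {symptom: True if detected, False if absent}
    Symptoms are treated as conditionally independent given the state.
    """
    weights = {}
    for state, prior in priors.items():
        terms = [
            likelihoods[state][s] if seen else 1.0 - likelihoods[state][s]
            for s, seen in observed.items()
        ]
        weights[state] = prior * prod(terms)
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


# ASSUMED inputs (illustrative only, not copied from Table 1).
PRIORS = {"D1": 0.05, "D2": 0.15, "D3": 0.80}
LIKELIHOODS = {
    "D1": {"S1": 0.20, "S2": 0.30},
    "D2": {"S1": 0.40, "S2": 0.50},
    "D3": {"S1": 0.00, "S2": 0.05},
}

both = posterior(PRIORS, LIKELIHOODS, {"S1": True, "S2": True})
only_s2 = posterior(PRIORS, LIKELIHOODS, {"S1": False, "S2": True})
neither = posterior(PRIORS, LIKELIHOODS, {"S1": False, "S2": False})

for label, result in (("S1 and S2", both), ("S2 only", only_s2), ("none", neither)):
    print(label, {state: round(p, 2) for state, p in result.items()})
```

With these assumed inputs the sketch gives about 0.91 for D2 and zero for D3 when both symptoms are detected, and about 0.46 and 0.41 for D2 and D3 when only S2 is detected. When neither symptom is detected, D3 comes out at about 0.91, close to the 0.92 quoted above.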
Step 2: Application of BAZ Equation to Predict the RUL and the Corresponding Probability
Assume that FOAT has been conducted at the design stage with
an objective of determining the process parameters anticipated
by the BAZ model, and that the first stage tests have been carried
out at two temperature levels T1 and T2, with the temperature ratio
of T1/T2 = 0.95 and the recorded time-to-failure ratio t1/t2 = 1.5,
until, say, half of the population of the devices has failed: Q1 = Q2 = 0.5. Then equation (A-5) results in the following equation for
the sought dimensionless time τ1 = τ0/t1:
This equation has the following solution:
obtained by the trial-and-error (interpolation) technique. If Newton’s
method is used, by putting, e.g., $\tau_1^{(0)} = 10^{-4}$ as the initial
(zero) approximation and using the well-known Newton’s recurrent
formula to compute higher approximations, we obtain:
The latter result agrees well with the result obtained using the trial-and-error
technique. Let the FOAT be conducted at the
temperature of T = 450 K at two stress levels with the stress ratio
of, say, $\sigma_2/\sigma_1 = 1.2$. Testing is run until half of the population has failed,
Q1 = Q2 = 0.5, and the recorded ratio of the times to failure
is t1/t2 = 1.5. In this example it is assumed that the time
constant τ0 in the BAZ equation is known from the previous
FOAT. With this constant known, we calculate the τ0/t1 ratio for
the new time t1. Let this ratio be, say, $\tau_0/t_1 = 4.0 \times 10^{-4}$. Then the loading σ1 related energy is as follows:
and the temperature T related energy kT is
The ratio of the loading related energy to the temperature-related
energy is therefore
This ratio will be larger for larger loadings and lower temperatures.
The ratio of the stress-free activation energy to the thermal
energy can be determined as
Hence, the stress-free activation energy is
Let us assume that the FOAT-based and BAZ-based calculations
carried out at the operation temperature of T = 90 ºC = 363 K
have indicated that the time factor is $\tau_0 = 10^{-4}$ sec; the ratio of the
stress-free activation energy to the temperature-related energy is $U_0/kT = 30.0$;
and the ratio of the stress-related energy to the thermal energy is $\gamma\sigma/kT = 1.0$. Then the BAZ formula (A-1) results in the following projected lifetime:
This time decreases to
for the 20% increase in the power of the vibration spectrum and
is only
in the case of the 20 ºC increase in temperature. Thus, in this example, the increase
in temperature should be of greater concern
than the increase in the vibration response (in the output vibration
spectrum). Also, based on the Bayes formula prediction, the
malfunction of the device due to the increased temperature is
more likely than that due to the faulty vibration protection system.
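The Step 2 arithmetic can be explored with a short Python sketch. Equations (A-1) and (A-5) are not reproduced in this part of the text, so the sketch assumes the commonly used BAZ forms τ = τ0 exp[(U0 − γσ)/kT] for the lifetime and P = exp(−t/τ) for the probability of non-failure. Two further assumptions are made: the 20% increase in the vibration power is assumed to raise γσ/kT by the same 20%, and the 20 ºC temperature increase is assumed to rescale both energy ratios by 363/383. All function names are hypothetical.

```python
import math


def energy_ratios(stress_ratio, time_ratio, tau0_over_t1, q1=0.5, q2=0.5):
    """Return (gamma*sigma1/kT, U0/kT) from FOAT at two stress levels.

    ASSUMED model: P = exp(-(t/tau0) * exp(-(U0 - gamma*sigma)/kT)),
    so that -ln(-(tau0/t) * ln P) = (U0 - gamma*sigma)/kT.
    stress_ratio = sigma2/sigma1, time_ratio = t1/t2.
    """
    tau0_over_t2 = tau0_over_t1 * time_ratio
    n1 = -math.log(-tau0_over_t1 * math.log(1.0 - q1))  # (U0 - gamma*sigma1)/kT
    n2 = -math.log(-tau0_over_t2 * math.log(1.0 - q2))  # (U0 - gamma*sigma2)/kT
    stress_over_kt = (n1 - n2) / (stress_ratio - 1.0)
    u0_over_kt = n1 + stress_over_kt
    return stress_over_kt, u0_over_kt


def projected_lifetime(tau0, u0_over_kt, stress_over_kt):
    """ASSUMED BAZ lifetime: tau = tau0 * exp((U0 - gamma*sigma)/kT)."""
    return tau0 * math.exp(u0_over_kt - stress_over_kt)


def at_temperature(ratio, t_old, t_new):
    """Rescale an energy/kT ratio to a new absolute temperature."""
    return ratio * t_old / t_new


# FOAT at two stress levels: sigma2/sigma1 = 1.2, t1/t2 = 1.5, tau0/t1 = 4.0e-4.
stress_over_kt, u0_over_kt = energy_ratios(1.2, 1.5, 4.0e-4)

# Projected lifetimes (sec) for tau0 = 1e-4 sec, U0/kT = 30, gamma*sigma/kT = 1.
base = projected_lifetime(1e-4, 30.0, 1.0)
more_vibration = projected_lifetime(1e-4, 30.0, 1.2)
hotter = projected_lifetime(
    1e-4, at_temperature(30.0, 363.0, 383.0), at_temperature(1.0, 363.0, 383.0)
)

print(f"gamma*sigma1/kT = {stress_over_kt:.3f}, U0/kT = {u0_over_kt:.3f}")
print(f"lifetimes, sec: {base:.3e}, {more_vibration:.3e}, {hotter:.3e}")
```

Under these assumptions the lifetime drops only moderately for the higher vibration level and much more strongly for the higher temperature, which is the ordering discussed above.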
Thus, the output of this TSC stage is the RUL of the device(s), assessed on a probabilistic
basis. As has been indicated in
the abstract, if the assessed RUL is still long enough, no
action might be needed; if not, corrective restoration action becomes
necessary. In any event, after the first two TSC steps have
been carried out, the devices are put back into operation, provided
that the assessed probability of their continuing failure-free
operation is found to be satisfactory. If failure nonetheless occurs,
the third step should be undertaken to update the predicted
reliability. The statistical beta-distribution, in which the probability of
failure is treated as a random variable, is suggested for use at
the third step.
Step 3. Application of Beta-distribution (BD) to Update Reliability Information
Let the performance of five “suspicious” (malfunctioning) devices
be monitored, and let one of them fail. Let us determine the
beta-distribution characteristics for four successes and one failure
($\bar{\alpha} = 4$, $\bar{\beta} = 1$). We have: $\alpha = \bar{\alpha} + 1 = 5$, $\beta = \bar{\beta} + 1 = 2$, and the characteristics of the beta-distribution are:
With $\alpha > \beta$ (there are more successes than failures), the distribution
skews to the higher probabilities of non-failure, and the
mode (the maximum of the probability density function) is
higher than the mean value and the median. Suppose now that no failures have
been observed after the first two TSC steps have been carried out.
Let us determine the expected number of successes (non-failures)
as a function of the probability of non-failure. Assuming zero failures ($\bar{\beta} = 0$, $\beta = 1$), we have: $\bar{\alpha} = \frac{2p-1}{1-p}$. If the mean value of the probability of non-failure is p = 0.7143, then $\bar{\alpha} = 1.5002$.
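The expression for the number of successes can be traced back to the mean of the beta-distribution. The short derivation below is a sketch that assumes the standard mean formula α/(α+β), which is not restated in this section. It writes $\bar{\alpha}$ for the number of observed successes, so that the distribution parameters are $\alpha = \bar{\alpha} + 1$ and, for zero failures, $\beta = 1$:

```latex
\begin{align}
p &= \frac{\alpha}{\alpha+\beta} = \frac{\bar{\alpha}+1}{\bar{\alpha}+2}, \\
p\,(\bar{\alpha}+2) &= \bar{\alpha}+1
  \quad\Longrightarrow\quad
  \bar{\alpha} = \frac{2p-1}{1-p}, \\
p = 0.7143 &\quad\Longrightarrow\quad
  \bar{\alpha} = \frac{0.4286}{0.2857} \approx 1.50 .
\end{align}
```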
Since the value of $\bar{\alpha}$ has to be an integer, one should assume
either $\bar{\alpha} = 1$ ($\alpha = 2$), or $\bar{\alpha} = 2$ ($\alpha = 3$). Then we obtain the
following characteristics of the distribution:
when $\alpha = 3$, $\beta = 1$. In both cases, it is a triangular distribution: the
mode remains the same, and is at p = 1. The mean and the median
increase in the case $\alpha = 3$, $\beta = 1$ in comparison with the case
$\alpha = 2$, $\beta = 1$ because of a larger number of successes. The variance
reduces, because of the improved information, and the skewness
(shift in the direction of higher probabilities of non-failure) and
the kurtosis (“peakedness”) of the distribution increase. Assume
now that the predicted (anticipated) probability of non-failure is
as high as p = 0.95. Despite such a high probability of non-failure,
the product nonetheless exhibited a field failure. Let us
determine, based on this additional information, the revised (updated)
estimate of the actual operational probability of non-failure.
Assuming that the anticipated (projected) number of failures
was zero ($\bar{\beta} = 0$, $\beta = 1$)
prior to putting the device(s) into operation,
and using the formula for the number $\bar{\alpha}$ of anticipated non-failures
from the previous example, we obtain, with p = 0.95,
that $\bar{\alpha} = \frac{2p-1}{1-p} = \frac{0.90}{0.05} = 18$.
For the new (posterior) situation with one failure, with
$\bar{\alpha} = 18$ ($\alpha = 19$) and $\bar{\beta} = 1$
($\beta = 2$), the characteristics of the probability
of non-failure are
Thus, because of the occurrence of the unexpected failure, the
actual probability of non-failure of the product is only 90.45%,
and not 95%. Note that this result, obtained assuming a 95% non-failure
level, indicates that after the first failure has occurred, as
many as nineteen additional consecutive non-failures (successes),
i.e., 18+19=37 successes and 1 failure, would have to be recorded
(observed) in order to return the device’s dependability (probability
of non-failure) to its originally specified estimate of 95%. We
addressed above a situation where one failure has occurred. Let us
examine a situation with two failures. In this case one should put
$\bar{\alpha} = 18$ ($\alpha = 19$) and $\bar{\beta} = 2$ ($\beta = 3$), and the characteristics of the BD for the probability of non-failure become as follows:
Thus, the operational probability of non-failure is reduced by about
9.12% compared to the projected probability of 95%, and by an
additional 4.52% with respect to the situation with a single failure.
The mean, the median and the mode have also decreased, and, because
of the higher number of failures, the variance has increased,
and the skewness and the kurtosis have decreased.
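The beta-distribution updates of this third step can be sketched in Python. The sketch assumes the textbook formulas for the mean, mode and variance of a beta-distribution whose parameters are the numbers of successes and failures increased by one, as in the examples above; the function names `beta_summary` and `anticipated_successes` are hypothetical.

```python
def beta_summary(successes, failures):
    """Mean, mode and variance of Beta(alpha, beta), where
    alpha = successes + 1 and beta = failures + 1 (textbook formulas)."""
    a, b = successes + 1, failures + 1
    mean = a / (a + b)
    # The mode formula is valid unless a = b = 1 (uniform density, no mode).
    mode = (a - 1) / (a + b - 2) if a + b > 2 else None
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return {"alpha": a, "beta": b, "mean": mean, "mode": mode, "variance": variance}


def anticipated_successes(p):
    """Number of successes n giving mean (n + 1)/(n + 2) = p with zero failures."""
    return (2.0 * p - 1.0) / (1.0 - p)


five_devices = beta_summary(4, 1)       # four successes, one failure
n_prior = anticipated_successes(0.95)   # anticipated successes for p = 0.95
one_failure = beta_summary(18, 1)       # 18 successes, one field failure
two_failures = beta_summary(18, 2)      # 18 successes, two field failures
recovered = beta_summary(37, 1)         # 18 + 19 successes, one failure

for name, s in (("4/1", five_devices), ("18/1", one_failure),
                ("18/2", two_failures), ("37/1", recovered)):
    print(name, round(s["mean"], 4), s["mode"], round(s["variance"], 5))
```

In this sketch the mean for four successes and one failure is 5/7 ≈ 0.7143, the mean drops to roughly 0.90 after one failure and to roughly 0.86 after two, and 37 successes with one failure bring it back to 0.95.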
Acronyms
ALT=Accelerated Life Testing
AT=Accelerated testing
BAZ=Boltzmann-Arrhenius-Zhurkov (model)
BD=Beta Distribution
DfR=Design for Reliability
FEA=Finite Element Analysis
FOAT=Failure Oriented Accelerated Testing
HALT=Highly Accelerated Life Testing
QT=Qualification Testing
MED=Medical Electronic Device
MTTF=Mean Time to Failure
PDfR=Probabilistic Design for Reliability
PHM=Prognostics and Health Management
PM=Predictive Modeling
PPM=Probabilistic Predictive Modeling
RUL=Remaining Useful Lifetime
SA=Sensitivity Analysis
SoF=Symptoms of Faults
TSC=Three-Step Concept
Conclusion
The application of the PDfR concept, FOAT and the multi-parametric
BAZ model makes it possible to dramatically improve the state of
the art in aerospace ED reliability prediction and
assurance.