International Journal of Aeronautical Science & Aerospace Research (IJASAR) / IJASAR-2470-4415-07-401

Aerospace Electronics Reliability must be Quantified to be Assured: Application of the Probabilistic Design for Reliability Concept

E. Suhir*

Bell Laboratories, Murray Hill, NJ, USA (ret); Portland State University, Portland, OR, USA.
Technical University, Vienna, Austria; James Cook University, Queensland, Australia.
ERS Co., 727 Alvina Ct., Los Altos, CA 94024, USA.

*Corresponding Author

E. Suhir,
Bell Laboratories, Murray Hill, NJ, USA (ret); Portland State University, Portland, OR, USA,
Technical University, Vienna, Austria; James Cook University, Queensland, Australia; and
ERS Co., 727 Alvina Ct., Los Altos, CA 94024, USA
Tel: 650.969.1530
Email: suhire@aol.com/e.suhir@ieee.org

Received: September 25, 2020; Accepted: October 08, 2020; Published: November 30, 2020

Citation: E. Suhir. Aerospace Electronics Reliability must be Quantified to be Assured: Application of the Probabilistic Design for Reliability Concept. Int J Aeronautics Aerospace Res. 2020;7(3):235-243. doi: dx.doi.org/10.19070/2470-4415-2000029

Copyright: E. Suhir© 2020. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.

Abstract

The recently suggested probabilistic design for reliability (PDfR) concept can be effectively used for making a viable electronic, optoelectronic, photonic, or MEMS device (EOPD) into a reliable product. Understanding the physics of failure is critical to the assurance of EOPD reliability, and the PDfR concept therefore has its experimental basis in highly focused and highly cost-effective failure oriented accelerated testing (FOAT), geared to a physically meaningful and trustworthy predictive model that can be used to predict, from the FOAT data, the remaining useful lifetime (RUL) in actual operation conditions. The multi-parametric Boltzmann-Arrhenius-Zhurkov (BAZ) equation suggested about a decade ago could be employed in this capacity. We focus in this analysis on some major features of the PDfR and its possible interactions with statistical approaches, when the BAZ model is sandwiched between two well known statistical approaches, the Bayes formula and the beta-distribution. It is concluded that the application of the PDfR concept, FOAT and the multi-parametric BAZ model enables a dramatic improvement of the state of the art in aerospace EOPD reliability prediction and assurance. The general concepts are illustrated by numerical examples.

Keywords

Aerospace Electronics; Reliability; Probabilistic Modeling; Accelerated-Testing.

Introduction

Here are some current problems envisioned and questions asked in connection with assuring aerospace EOPDs reliability:

• EOPD products that have passed the existing qualification tests (QT) nevertheless often exhibit premature operational failures. Are the existing QT standards, methodologies, procedures and practices adequate [1]?
• If they are not, what could and should be done differently in the next generation of QT [2]?
• While the failure of a commercial electronic product is usually not viewed as a catastrophe, as long as the percentage of failed devices is low and the product is still sellable, in aerospace EOPDs the consequences of device failure might be dramatic, sometimes even more severe than in some other areas of EOPD engineering, such as military, long-haul communications, medical, etc. Should the likelihood of failure of the EOPDs, whose operational reliability is critical, be necessarily quantified to be assured [3-5]?
• And because nothing is perfect, and the difference between a highly reliable and an insufficiently reliable EOPD is “merely” in the level of the never-zero probability of failure, what probabilistic means should be employed to quantify and assure EOPD reliability [6-11]?
• Every five to seven years or so a new generation of EOPDs is developed. Old ED products become obsolete, although they are still physically reliable. Should aerospace EOPD manufacturers, after the acceptable probability of the never-zero likelihood of failure is established and agreed upon, consider, at the design and production stages, relatively short, but realistic and predictable, lifetimes of their products [12]? As a friend of mine has put it, “I do not need an expensive everlasting pen, because I do not intend to live forever”.
• The reliability of an EOPD product should be different for different products and applications [13]. Should this circumstance be considered when planning and evaluating the product’s lifetime and the adequate probability of failure?
• How should one establish the appropriate list of the crucial accelerated tests (AT), of the physically meaningful and application-specific stressors (stimuli), and, bearing in mind that the principle of superposition is not applicable in reliability engineering, of their relevant combinations [14]?
• And should these combinations necessarily reflect those that the device will encounter in actual operation [15, 16]?
• There is currently a lot of criticism of the adequacy of the widespread temperature cycling as the preferred AT approach in EOPD reliability engineering. Temperature cycling tests are not only costly and time consuming, and require expensive and sophisticated equipment, but, most importantly, their results might be misleading: the temperature range in these ATs has to be much broader than what an EOPD might encounter in actual operation conditions, and the behavior of EOPD materials is known to be very temperature sensitive and might therefore be quite different at very high and very low testing temperatures than at moderate temperatures in actual operation conditions. Should the temperature cycling tests for EOPDs be replaced by, say, a low-temperature/random-vibrations bias, or by other more physically meaningful and, perhaps, less expensive and less labor and time consuming tests [17]?
• Although mechanical pre-stressing of the accelerated life test (ALT) specimens [18, 19] could minimize the above shortcoming, such pre-stressing, acceptable as a research effort, could hardly be recommended in actual industrial practice. Could, e.g., the above-mentioned combination of low temperature conditions (because the thermally induced stresses in an ED fabricated at an elevated temperature and subsequently cooled down to a low, room or testing, temperature are the highest at low temperature conditions and also because fatigue and brittle cracks propagate more rapidly at low temperature conditions) and random vibrations be employed as an appropriate QT technique?
• Could such testing be employed also as a suitable burn-in test (BIT) that, in addition, would be able to weed out infant mortality failures [20-23]?
• The experimental bathtub curve (BTC) is the “reliability passport” of an aerospace EOPD. It is well known that two major irreversible random processes form such a curve for mass-produced devices: the statistics-related failure process that results in a failure rate decreasing with time (the BTC’s infant mortality portion reflects the statistical nature of this process) and the reliability-physics-related failure process that results in a failure rate increasing with time (the wear-out portion of the BTC explicitly reflects the ultimate physics of this process). The decreasing and the increasing failure rates caused by these two processes result in a more or less constant failure rate at the steady-state portion of the BTC. If one sets out to improve the reliability and increase the RUL of an EOPD, one should clearly focus on the physics-of-failure process, especially at the wear-out portion of the BTC. But how can this process be separated from the statistical process? It has been shown [24, 25] that the statistical process can be predicted theoretically and, assuming that these two processes are statistically independent, it has been suggested that the ordinates of the physical process could be determined by simply subtracting the statistics-related failure process ordinates from the BTC ordinates. Are there guidelines for doing that? It has also been shown that the ordinates of the above statistical process depend on the probability density distribution function for the “instantaneous” random failure rate. The examples were carried out for the normal and Rayleigh distributions. But is this a legitimate approach? And if it is, how should it be implemented in (reduced to) engineering practice?
• It has been recently predicted [26-34] that in many cases it is possible to avoid inelastic strains in solder joint interconnections, which are the most vulnerable structural elements in today’s EOPD technologies. Significant stress relief, even to an extent that no inelastic strains could possibly occur, can be achieved by considering joints with elevated stand-offs, such as column-grid arrays, and/or by employing inhomogeneous bonds, when low modulus solders or even epoxies are used at the assembly ends, and/or by using low expansion (e.g., ceramic or silicon) substrates. Should one first try to design an inelastic-strain-free assembly before trying to predict its lifetime assuming, in accordance with today’s practice, that the peripheral joints always experience inelastic deformations and that the expected size of the peripheral areas of inelastic strain in the bond could be predicted in advance?
• Real time degradation of electronic materials is a slow process. Could physically meaningful and cost-effective methodologies for measuring and predicting the degradation (aging) rates and consequences be developed, at least for the most important or the most typical devices? Could the appropriately modified BAZ model be employed for doing that?
• An attempt has been recently made [35] to show how, provided that the physics of failure is reasonably well understood, the total cost of reliability could be minimized by quantifying the best compromise between the initial cost of the product and cost of its restoration during operation, if failure occurs. It has been shown particularly that such an optimization is closely connected with the optimization of the operational availability of the product. Is this a promising approach?
• Could the approach aimed at the evaluation of the maximum acceptable restoration time [36, 37] be helpful, when developing an effective cost-optimization model?
• Predictive modeling (PM), and especially analytical (mathematical) modeling [38-43], has proven to be a highly useful and highly cost-effective means for understanding the physics of failure and designing the most practical ATs in EOPD engineering for a variety of applications. Which models have been and might be the most needed and most practical for future applications in aerospace engineering? Is numerical modeling (simulation), such as FEA, sufficient? Should physically meaningful and easy-to-use analytical modeling be employed in addition to or even instead of numerical modeling?
• It is widely recognized that it is absolutely critical to understand the physics of failure to be able to design and operate a reliable device. It goes without saying that such an understanding should be based on a physically meaningful failure oriented accelerated test (FOAT) model [44-48]. Which model can be used for this purpose? Will a FOAT methodology using BAZ model [49-53] do the job?

In the analysis that follows some of the above problems are addressed. The emphasis is on the opportunities associated with the use of the recently suggested novel, flexible and fruitful PDfR concept [10] in electronics reliability, including the aerospace field. The concept enables making a viable EOPD product into a reliable product with a predicted probability of non-failure in the field: when reliability is imperative, the ability to quantify it is a must. It is shown that the recently suggested BAZ model [49-53], and particularly its multi-parametric modification, can be successfully employed for this purpose. It is also shown how this physics-of-failure based probabilistic model can be sandwiched between two statistical models, one based on the Bayes theorem and the other on the beta-distribution, when there is a need to diagnose the detected malfunction of the device and to update its reliability if failures still occur despite their predicted low probability [52]. The substance of the BAZ model is briefly addressed in Appendix A.

PDfR Concept and Its Applications

Underlying Physics of Failure

The PDfR concept is based upon 1) FOAT, aimed at understanding the physics of the anticipated or the observed failures and at quantifying, on the probabilistic basis, the outcome of FOAT conducted for the most vulnerable element(s) of the product of interest for its most likely applications and the most meaningful combination of possible stressors (stimuli); 2) simple, easy-to-use and physically meaningful predictive modeling, both analytical and computer-aided; and, if needed, 3) subsequent sensitivity analyses (SA) using the methodologies and algorithms developed as by-products at the two previous steps. The PDfR concept proceeds from the recognition that nothing is perfect and that the difference between a highly reliable and an insufficiently reliable product is “merely” in the level of the never-zero probability of its failure. If this probability, predicted at the design stage for the anticipated loading conditions and the given time in operation, is not acceptable, then SA can be effectively employed to determine what could/should be changed to improve the situation. No extra FOAT effort will be required. The PDfR analysis also enables one to check whether the product of interest is over-engineered, i.e., superfluously robust. If it is, it might be more costly than necessary. The operational reliability cannot be low, but does not have to be higher than necessary either: it has to be adequate for the given product and application. Central to the PDfR concept is the calculation of the probability of failure and/or the remaining useful life (RUL) of an electronic material or a product, and the use of this probability (and/or the probabilistic safety factor, defined as the ratio of the mean value of the safety margin to its standard deviation) as a suitable and physically most meaningful criterion of the product’s performance.
Although several advanced PDfR predictive modeling techniques have been recently developed, mostly for aerospace applications, the analysis in this paper uses more or less elementary analytical probabilistic models. We elaborate on the role and attributes of the recently suggested powerful and flexible BAZ model [49-53] and particularly its multi-parametric extension. This model can be successfully employed to predict, quantify and assure operational reliability, as well as to analyze and design electronic products with the predicted, quantified, assured, and, if appropriate and cost-effective, even maintained and specified probability of the operational (field) failure. It has been shown [51] that the BAZ equation can be obtained as the final steady-state part of the Markovian process of failure events and that this part is the most conservative. In other words, one does not have to address the transitional Markovian process in practical engineering applications. The model can be used in the technical diagnostics effort [54].
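
The probabilistic safety factor defined above (the ratio of the mean value of the safety margin to its standard deviation) can be illustrated with a minimal sketch. The sketch assumes, purely for illustration, that the safety margin is the difference between a normally distributed "capacity" and an independent, normally distributed "demand"; the function names and all numerical inputs are hypothetical and are not taken from the paper.

```python
import math


def safety_factor(mean_margin: float, std_margin: float) -> float:
    """Probabilistic safety factor: mean of the safety margin over its standard deviation."""
    if std_margin <= 0.0:
        raise ValueError("standard deviation of the safety margin must be positive")
    return mean_margin / std_margin


def failure_probability_normal(sf: float) -> float:
    """P(margin < 0) under the ASSUMPTION of a normally distributed margin: Phi(-SF)."""
    return 0.5 * math.erfc(sf / math.sqrt(2.0))


# Hypothetical inputs (not from the paper): capacity and demand in the same units.
capacity_mean, capacity_std = 100.0, 8.0
demand_mean, demand_std = 70.0, 6.0

# For independent normal variables the margin is normal, and the variances add.
margin_mean = capacity_mean - demand_mean
margin_std = math.hypot(capacity_std, demand_std)

sf = safety_factor(margin_mean, margin_std)
pf = failure_probability_normal(sf)
print(f"safety factor = {sf:.2f}, probability of failure = {pf:.2e}")
```

Under the normality assumption the safety factor and the probability of failure carry the same information, which is why either can serve as the performance criterion; for other distributions of the margin the mapping between the two would have to be re-derived.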

The following ten major (governing) principles (“commandments”) reflect the rationale behind the PDfR concept:

1) The PDfR concept is an effective means for improving the state-of-the-art in the field of microelectronic reliability engineering by quantifying, on the probabilistic basis, the operational reliability of the product;
2) The probability of failure of an electronic product is never zero, but could and should be assessed (quantified) and brought to an acceptable (adequate) level;
3) The best electronic product should be considered, designed and fabricated as the best compromise between the needs for its reliability, cost effectiveness and time-to-market;
4) Electronic product’s reliability cannot be low, need not be higher than necessary, but has to be adequate for the given application, considering the projected lifetime, environmental conditions and consequences of failure;
5) Redundancy, trouble-shooting and maintenance are important factors to be considered, when adequate reliability level has to be maintained, especially if the “genetic health” of the product is not high, even when the appropriate burn-in procedure is carried out;
6) When reliability for whatever application is imperative, the ability to quantify it is a must, especially if one intends to optimize and to assure reliability;
7) One cannot design a product with quantified, optimized and assured reliability by limiting the effort to today’s widely used highly accelerated life testing (HALT): a cost-effective and highly focused FOAT is always a must;
8) Reliability is conceived at the design stage and should be taken care of, first of all, at this stage. It is at the design stage that an attempt should be made to create a “genetically healthy” product;
9) Highly cost-effective and highly focused FOAT, geared to a limited number of pre-determined simple, easy-to-use and physically meaningful predictive reliability models and aimed at understanding the physics of failure that is anticipated and quantified by these models, is an important constituent part of the PDfR effort;
10) PM, not necessarily using the well known FOAT models, is another important constituent of the PDfR approach. PM, in combination with well-established FOAT models (Arrhenius, Eyring, etc.), is a powerful means to carry out, if necessary, SA, with an objective to quantify and practically nearly eliminate failures (“the principle of practical confidence”).

Possible Next QT Generation

The next generation of ED QT could be viewed as a “quasi-FOAT,” “mini-FOAT”, a sort of “initial stage of FOAT” that more or less adequately replicates the initial non-destructive, yet full-scale, stage of FOAT. The duration and conditions of such a “mini-FOAT” QT could and should be established based on the observed and recorded results of the actual FOAT, and should be limited to the stage when no failures, or a predetermined and acceptably small number of failures in the actual full-scale FOAT, were observed. PHM technologies (“canaries”) could and should be concurrently tested to make sure that the safe limit is established correctly and is not exceeded. Such an approach to qualifying devices into products will enable the industry to specify, and the manufacturers, including those in the biomedical field, to assure, a predicted and adequate probability of non-failure (safety factor) for a product that passed the QT and is expected to be operated in the field under the given conditions for the given time. FOAT should be thoroughly designed, implemented, and analyzed, so that the QT is based on trustworthy experimental data. Since FOAT cannot do without simple, easy-to-use and physically meaningful PM, such modeling, both computer-aided and analytical (mathematical), plays an important role in making the suggested new approach to QT practical and successful. It is imperative that the reliability physics that underlies the mechanisms and modes of failure is well understood. Such an understanding can be achieved only if flexible, powerful and effective PDfR efforts are implemented.

Three-Step Concept (TSC)

When encountering a particular reliability problem at the design, fabrication, testing, or operation stage of a product’s life, and considering the use of predictive modeling to assess the seriousness and the likely consequences of a detected failure, one has to choose whether a statistical model, a physics-of-failure-based model, or a suitable combination of these two major modeling tools should be employed to address the problem of interest and to decide on how to proceed. A three-step concept (TSC) [52] is suggested as a possible way to go in such a situation. The classical statistical Bayes formula can be used at the first step in this concept as a technical diagnostics tool. Its objective is to identify, on the probabilistic basis, the faulty (malfunctioning) device(s) from the obtained signals (“symptoms of faults”). The physics-of-failure-based BAZ model, and particularly its multi-parametric extension, can be employed at the second step to assess the RUL of the faulty device(s). If the RUL is still long enough, no action might be needed; if it is not, a corrective restoration action becomes necessary. In any event, after the first two steps are carried out, the device is put back into operation (testing), provided that the assessed probability of its continuing failure-free operation is found to be satisfactory. If an operational failure nonetheless occurs, the third step should be undertaken to update reliability. The statistical beta-distribution, in which the probability of failure is treated as a random variable, is suggested for use at this step. While various statistical methods and approaches, including the Bayes formula and the beta-distribution, are well known and have been widely used in numerous applications for many decades, the BAZ model was introduced in the microelectronics reliability area only several years ago. Its attributes and use are therefore addressed and discussed in some detail.
The suggested concept is illustrated by a numerical example geared to the use of today’s highly popular prognostics-and-health-monitoring (PHM) effort in actual operation, such as, e.g., an en-route flight mission.
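
To make the third step more tangible, the following is a minimal sketch of how a probability of failure treated as a beta-distributed random variable could be updated after new observations. It uses the standard conjugate beta-binomial update, which is a textbook technique and an assumption here, not necessarily the procedure used in [52]; the function names, the prior parameters and the observed counts are all hypothetical.

```python
def beta_update(a: float, b: float, failures: int, trials: int) -> tuple[float, float]:
    """Conjugate update of a Beta(a, b) belief about the probability of failure.

    `failures` out of `trials` newly observed devices (or missions) failed.
    """
    if not 0 <= failures <= trials:
        raise ValueError("failures must lie between 0 and trials")
    return a + failures, b + (trials - failures)


def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def beta_variance(a: float, b: float) -> float:
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1.0))


# Hypothetical prior: mean probability of failure of 0.01.
prior_a, prior_b = 1.0, 99.0

# Hypothetical field evidence: one failure observed in twenty operations.
post_a, post_b = beta_update(prior_a, prior_b, failures=1, trials=20)

print(f"prior mean     = {beta_mean(prior_a, prior_b):.4f}")
print(f"posterior mean = {beta_mean(post_a, post_b):.4f}")
print(f"posterior std  = {beta_variance(post_a, post_b) ** 0.5:.4f}")
```

The appeal of this form is that an unexpected operational failure shifts the estimated probability of failure upward by an amount that depends on how much prior evidence (the sum of the two prior parameters) stands behind the original estimate.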

Numerical Examples

FOAT and BAZ Model

Suppose, e.g., that the following FOAT-based input information has been obtained: 1) After t1 = 35 h of testing at the temperature T1 = 60 °C = 333 K, the voltage V = 600 V and the relative humidity H = 0.85, 10% of the tested modules exceeded the allowable (critical) level of the leakage current of I* = 3.5 μA and, hence, failed, so that the probability of non-failure is P1 = 0.9; 2) After t2 = 70 h of testing at the temperature T2 = 85 °C = 358 K at the same voltage and the same relative humidity, 20% of the tested samples reached or exceeded the critical level of the leakage current and, hence, failed, so that the probability of non-failure is P2 = 0.8. Then the equation (A-4) results in the following equation for the leakage current sensitivity factor γI:

This equation has the solution $\gamma_I = 4893.2\ \mathrm{h}^{-1}(\mu\mathrm{A})^{-1}$. Thus, $\gamma_I I^* = 17126.2\ \mathrm{h}^{-1}$. A more accurate solution can always be obtained by using the Newton iterative method for solving transcendental equations. This concludes the first step of testing. At the second step, tests at two relative humidity levels, H1 and H2, were conducted for the same temperature and voltage levels. This leads to the relationship:

Suppose, e.g., that after t1 = 40 h of testing at the relative humidity H1 = 0.5, at the given voltage (say, V = 600 V) and temperature (say, T2 = 85 °C = 358 K), 5% of the tested modules failed, so that P1 = 0.95, and that after t2 = 55 h of testing at the same temperature and at the relative humidity H2 = 0.85, 10% of the tested modules failed, so that P2 = 0.9. Then the above equation for the γH value, with the Boltzmann constant $k = 8.61733 \times 10^{-5}$ eV/K, yields $\gamma_H = 0.03292$ eV. At the third step, FOAT was conducted at two different voltage levels, V1 = 600 V and V2 = 1000 V, at T = 85 °C = 358 K and H = 0.85, and it has been determined that 10% of the tested devices failed after t1 = 40 h of testing (P1 = 0.9) and 20% of the devices failed after t2 = 80 h of testing (P2 = 0.8). The voltage sensitivity factor γV is then found from these data.

After the sensitivity factors of the leakage current, the humidity and the voltage are found, the stress free activation energy U0 can be determined on the basis of the last equation in Appendix A for the given temperature and for any combination of loadings (stimuli):

The third term in this result (the last term in the last equation in Appendix A) plays the dominant role, so that, in approximate evaluations, only this term could be considered. Calculations indicate that the loading-free activation energy in the above numerical example (even with the rather tentative, but still realistic, input data) is about $U_0 = 0.5$ eV. This result is consistent with the existing experimental data. Indeed, for semiconductor device failure mechanisms the activation energy ranges from 0.3 eV to 0.6 eV, for metallization defects and electro-migration in Al it is about 0.5 eV, for charge loss it is on the order of 0.6 eV, and for Si junction defects it is 0.8 eV. The distribution (A-4) yields:

TSC: BAZ Model Sandwiched Between Bayes Formula and Beta-Distribution

The objective of the numerical example below is to illustrate how the suggested TSC can be used to assess and to maintain high probability of non-failure in actual MED operation conditions.

Step 1. Application of Bayes formula as a suitable technical diagnostics tool

The application of the Bayes formula enables one to assess the reliability of a particular malfunctioning device from the available general information for similar devices. It has been established, e.g., from experience with the given type of devices subjected in actual operation conditions to elevated temperature and vibrations, that 90% of the devices do not typically fail during operation. It has also been established that the diagnostic symptom - an increase in temperature by 20ºC above the normal (specified) level - is encountered in 5% of the devices. The PHM technical diagnostics instrumentation has detected in a particular device the following two deviations (“symptoms of failure”) from normal operation conditions: 1) an increase in temperature by 20ºC at the heat sink location (symptom S1) and 2) an increase in the vibration power spectrum (symptom S2). These symptoms might be due to the malfunction of the heat sink (state D1) and/or of the vibration protection equipment (state D2). From previous experience with similar devices at similar operation conditions it has been established that the symptom S1 (increase in temperature) is not observed at normal operation conditions (state D3), and the symptom S2 (increase in the power of the vibration spectrum) is observed in 5% of the cases (devices). It has also been established, based on the accumulated experience with this type of devices, that 80% of them do not fail during the specified time of operation, 5% of the devices experience the malfunction of the heat sink (state D1), and 15% of the devices are characterized by the state D2 (malfunction of the vibration protection equipment).
Finally, it has been established that the symptom S1 (increase in temperature) is encountered in the state D1 (because of the malfunction of the heat sink) in 20% of the devices, and in the state D2 (because of the malfunction of the vibration protection system) in 40% of the devices; and that the symptom S2 (increase in the power of the vibration spectrum) is encountered in the state D1 (malfunction of the heat sink) in 30% of the devices and in the state D2 (malfunction of the vibration protection system) in 50% of the devices. The above information can be summarized in the form of the diagnostics matrix shown in Table 2. Thus, this matrix indicates that 1) the symptom S1 (increase in temperature) is encountered in 20% of the cases because of the malfunctioning heat sink (state D1), in 40% of the cases because of the malfunctioning vibration protection system (state D2), and is never observed in normal operation conditions (state D3); 2) the symptom S2 (increase in the power of the vibration spectrum) is encountered in 30% of the cases because of the malfunctioning heat sink (state D1), in 50% of the cases because of the malfunctioning vibration protection system (state D2), and in 5% of the cases in normal operation conditions (state D3); and 3) the symptom S3 (both heat transfer and vibration protection hardware work normally) is encountered in 5% of the cases because of the malfunctioning heat sink (state D1), in 15% of the cases because of the malfunctioning vibration protection system (state D2), and in 80% of the cases in normal operation conditions (state D3).

Let us first determine the probability that the device, in which the 20ºC increase in temperature has been detected, is still sound. This can be done using the information that 90% of the devices of the type of interest do not typically fail during the designated time of operation and that the symptom S1, which is an increase in temperature by 20ºC above the normal level, is encountered in 5% of these devices. The first piece of information indicates that the probabilities of the sound condition D1 and the faulty condition D2 in the general population of the devices under operation are P(D1) = 0.9 and P(D2) = 0.1, respectively. The second indicates that the conditional probabilities reflecting the actual situation with the given device are P(S/D1) = 0.05 and P(S/D2) = 0.95: only 5% of the devices function adequately, and 95% of them do not. The question asked is as follows: with this new information about a particular device, how has the expected probability P(D1) = 0.9 that the device of interest is still sound changed? In other words, how could one use the accumulated experience about the operational performance of the large population of this type of devices, considering the results of the actual field information for a particular device?
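
The two-state update posed above can be sketched with the standard Bayes ratio for two mutually exclusive hypotheses (sound and faulty). This is a minimal illustration, not a reproduction of the paper's formula (A-5), which is not restated here; the function name is hypothetical. The second call uses the alternative inputs P(S/D1) = 0.85 and P(S/D2) = 0.15 that are also considered in this example.

```python
def posterior_sound(p_sound: float, p_sym_given_sound: float, p_sym_given_faulty: float) -> float:
    """Standard Bayes ratio P(sound | symptom) for two exclusive states (sound, faulty)."""
    p_faulty = 1.0 - p_sound
    numerator = p_sound * p_sym_given_sound
    denominator = numerator + p_faulty * p_sym_given_faulty
    if denominator == 0.0:
        raise ValueError("the symptom has zero probability under both states")
    return numerator / denominator


# Inputs quoted in the text: P(D1) = 0.9, P(S/D1) = 0.05, P(S/D2) = 0.95.
ratio_strong_symptom = posterior_sound(0.9, 0.05, 0.95)

# Alternative inputs quoted in the text: P(S/D1) = 0.85, P(S/D2) = 0.15.
ratio_weak_symptom = posterior_sound(0.9, 0.85, 0.15)

print(round(ratio_strong_symptom, 4))
print(round(ratio_weak_symptom, 4))
```

For the alternative inputs the ratio evaluates to 0.9808, which coincides with the value of the factor χ quoted in this example, and the quoted updated probability 0.8827 is numerically consistent with the product of that factor and P(D1) = 0.9. How the paper's own formula combines these quantities should be checked against Appendix A.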

The Bayes formula yields:

Thus, the probability that the device is still sound has decreased dramatically relative to the value P(D1) = 0.9 for the typical (expected) situation, because of the detected 20ºC increase in the observed temperature and because such an increase is viewed as a failure of the device. The Bayes formula indicates, particularly, that the factor χ defined by the formula (A-5) accounts for the change, based on the updated reliability information, in the initial probability that the device is still sound and, hence, that its use could be continued with a high level of confidence. The decrease in the probability of non-failure would be much smaller if the obtained symptom indicated only a slight decrease in the probability of non-failure for the given device. Indeed, with P(S/D1) = 0.85 (instead of 0.05) and P(S/D2) = 0.15 (instead of 0.95), the factor χ would be as high as χ = 0.9808, and the updated probability of non-failure would also be high: P(D1/S) = 0.8827. Let us now address, using the information provided by Table 1, the performance of a device affected by a possible malfunction of the heat sink and/or the vibration protection system. The probabilities of the device states, when both symptoms, S1 (faulty heat sink) and S2 (inadequate vibration protection), have been detected, can be found using the Bayes formula as follows:

This is the probability that the device, for which both symptoms, malfunctioning heat sink and malfunctioning vibration protection system, have been detected, is in the state D1, i.e., failed because of the malfunctioning heat sink. Similarly, one could find the probability P(D2/S1S2) = 0.91 that the device is in the state D2, i.e., failed because of the malfunctioning vibration protection system. Since the device has failed, it cannot be in the non-failure state D3, and therefore the probability that the device is still sound, despite the detected malfunctions of the heat sink and the vibration protection system, is zero: P(D3/S1S2) = 0. Let us determine the probability of the device’s state if the PHM measurements have indicated that there was no increase in temperature (the symptom S1 did not take place), but the symptom S2 (increase in the power spectrum of the induced vibrations) was detected. The absence of the symptom S1 means that the opposite event, $\bar{S}_1$, took place, so that $P(\bar{S}_1/D_i) = 1 - P(S_1/D_i)$. Replacing the probability $P(S_1/D_i)$ in the above diagnostics matrix with $P(\bar{S}_1/D_i)$, we find the following probability of the state D1 of the device (the device failed because of the malfunctioning heat sink):

Similarly, we obtain: $P(D_2/\bar{S}_1 S_2) = 0.46$; $P(D_3/\bar{S}_1 S_2) = 0.41$. Let us now determine the probabilities of the device states when none of the symptoms took place. We find:

Similarly, we have: $P(D_2/\bar{S}_1 \bar{S}_2) = 0.05$; $P(D_3/\bar{S}_1 \bar{S}_2) = 0.92$. Thus, when both symptoms, S1 and S2, are observed, the state D1 (failure occurred because the heat sink is malfunctioning) has the probability of occurrence of 0.91. When none of these symptoms is observed, the normal state, D3, is characterized by the probability 0.92 and, hence, is somewhat more likely to occur than the failure state identified when both symptoms, S1 and S2, are observed. When the symptom S1 (elevated temperature) is not observed, while the symptom S2 (elevated vibrations) is, the probabilities of the states D2 (vibration protection system is not working properly) and D3 (both heat transfer and vibration protection hardware work normally) are 0.46 and 0.41, respectively.

One could either accept this information and act accordingly, i.e., go ahead with a conclusion that it is the elevated temperature and not the elevated vibration that should be taken care of, or, since these probabilities are close, one might decide on seeking additional information. Such information could be based on additional observations and/or on other sources that yield more accurate and more convincing diagnostics (e.g., modeling or additional measurements). Thus, the first step of the TSC enables one to identify, on a probabilistic basis, the malfunctioning device(s) and the most likely cause(s) of the device failure. The objective of the next step is to assess, using the BAZ equation, the RUL of the detected malfunctioning device(s).
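The Step 1 procedure can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: it assumes that the symptoms are conditionally independent given the device state, and the prior probabilities and the diagnostics matrix below are hypothetical values chosen so that the posteriors land close to those quoted above (they are not taken from Table 1). The function name `posterior` is likewise an assumption.

```python
def posterior(priors, likelihoods, observed):
    """Posterior state probabilities P(D_i / symptoms) by the Bayes formula.

    priors      -- P(D_i) for each device state
    likelihoods -- likelihoods[i][j] = P(S_j / D_i), the diagnostics matrix
    observed    -- observed[j] is True if symptom S_j was detected
    Symptoms are assumed conditionally independent given the state.
    """
    weights = []
    for prior, row in zip(priors, likelihoods):
        weight = prior
        for p_symptom, seen in zip(row, observed):
            # An absent symptom contributes the complement 1 - P(S_j / D_i).
            weight *= p_symptom if seen else 1.0 - p_symptom
        weights.append(weight)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("observed symptoms are impossible under every state")
    return [weight / total for weight in weights]


# Hypothetical inputs for illustration only (NOT the article's Table 1).
PRIORS = [0.05, 0.15, 0.80]   # P(D1), P(D2), P(D3)
MATRIX = [
    [0.20, 0.30],             # D1: P(S1/D1), P(S2/D1)
    [0.40, 0.50],             # D2: P(S1/D2), P(S2/D2)
    [0.00, 0.05],             # D3: a sound device never shows S1
]

both = posterior(PRIORS, MATRIX, [True, True])        # S1 and S2 detected
only_s2 = posterior(PRIORS, MATRIX, [False, True])    # S1 absent, S2 detected
neither = posterior(PRIORS, MATRIX, [False, False])   # no symptoms

for label, result in [("S1 and S2", both), ("S2 only", only_s2), ("none", neither)]:
    print(label, [round(p, 2) for p in result])
```

Passing `False` for a symptom is the code equivalent of replacing P(S1/Di) with 1 − P(S1/Di) in the diagnostics matrix, so the same function covers all three cases discussed above.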

Step 2: Application of BAZ Equation to Predict the RUL and the Corresponding Probability

Assume that FOAT has been conducted at the design stage with an objective of determining the process parameters anticipated by the BAZ model, and that the first stage tests have been carried out at two temperature levels T1 and T2, with the temperature ratio of T1/T2 = 0.95 and the recorded time-to-failure ratio t1/t2 = 1.5, until, say, half of the population of the devices failed: Q1 = Q2 = 0.5. Then the equation (A-5) results in the following equation for the sought dimensionless time τ1 = τ0/t1:

This equation has the following solution:

obtained by the trial-and-error (interpolation) technique. If Newton’s formula is used, by putting, e.g., $\tau_0 = 10^{-4}$ as the initial (zero) approximation and using the well-known Newton’s recurrence formula to compute higher approximations, we obtain:

The latter result agrees well with the result obtained using the trial-and-error technique.

Let the FOAT be conducted at the temperature of T = 450 K at two stress levels with the stress ratio of, say, $\sigma_2/\sigma_1 = 1.2$. Testing is run until half of the population has failed, Q1 = Q2 = 0.5, and the recorded ratio of the times at which the failures occurred is t1/t2 = 1.5. In this example it is assumed that the time constant τ0 in the BAZ equation is known from the previous FOAT. With this constant known, we calculate the τ0/t1 ratio for the new time t1. Let this ratio be, say, $\tau_0/t_1 = 4.0 \times 10^{-4}$. Then the loading σ1 related energy is as follows:

and the temperature T related energy kT is

The ratio of the loading related energy to the temperature-related energy is therefore

This ratio will be larger for larger loadings and lower temperatures. The ratio of the stress-free activation energy to the thermal energy can be determined as

Hence, the stress-free activation energy is

Let us assume that the FOAT-based and BAZ-based calculations carried out at the operation temperature of T = 90ºC = 363 K have indicated that the time factor is $\tau_0 = 10^{-4}$ sec; the ratio of the stress-free activation energy to the temperature-related energy is $U_0/kT = 30.0$; and the ratio of the stress-related energy to the thermal energy is $\gamma\sigma/kT = 1.0$. Then the BAZ formula (A-1) results in the following projected lifetime:

This time decreases to

for the 20% increase in the power of the vibration spectrum and is only

in the case of the 20ºC increase in temperature. Thus, in this example, the increase in temperature should be of greater concern than the increase in the vibration response (in the output vibration spectrum). Also, based on the Bayes formula prediction, the malfunction of the device due to the increased temperature is more likely than that due to the faulty vibration protection system.
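The two Step 2 calculations above (extracting the energy ratios from a two-stress-level FOAT, then projecting the lifetime) can be reproduced with the short Python sketch below. It rests on assumptions that are not spelled out in this section: the BAZ lifetime is taken in the form τ = τ0·exp[(U0 − γσ)/kT], the probability of non-failure as P = exp(−t/τ), and a 20% increase in the power of the vibration spectrum is modeled as a 20% increase in the ratio γσ/kT. The function names are illustrative.

```python
import math

K_BOLTZMANN_EV = 8.617333e-5  # Boltzmann constant, eV/K


def energy_ratios(tau0_over_t1, t1_over_t2, stress_ratio, q1=0.5, q2=0.5):
    """Return (gamma*sigma1/kT, U0/kT) from two FOAT runs at one temperature.

    Assumed model: P = exp(-t/tau), tau = tau0 * exp((U0 - gamma*sigma)/kT),
    so that n_i = -(tau0/t_i) * ln(1 - Q_i) = exp(-(U0 - gamma*sigma_i)/kT).
    """
    n1 = -tau0_over_t1 * math.log(1.0 - q1)
    n2 = -tau0_over_t1 * t1_over_t2 * math.log(1.0 - q2)
    # ln(n2/n1) = (gamma*sigma1/kT) * (sigma2/sigma1 - 1)
    load_ratio = math.log(n2 / n1) / (stress_ratio - 1.0)
    # U0/kT = gamma*sigma1/kT - ln(n1)
    activation_ratio = load_ratio - math.log(n1)
    return load_ratio, activation_ratio


def baz_lifetime(tau0, activation_ratio, load_ratio):
    """Lifetime tau = tau0 * exp(U0/kT - gamma*sigma/kT), in the units of tau0."""
    return tau0 * math.exp(activation_ratio - load_ratio)


# Two-stress-level FOAT at T = 450 K (input numbers from the text).
load_ratio, activation_ratio = energy_ratios(4.0e-4, 1.5, 1.2)
kT = K_BOLTZMANN_EV * 450.0            # thermal energy, eV
U0 = activation_ratio * kT             # stress-free activation energy, eV

# Lifetime projection at T = 363 K with U0/kT = 30.0 and gamma*sigma/kT = 1.0.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
baseline = baz_lifetime(1.0e-4, 30.0, 1.0)
# Assumption: 20% more vibration power scales gamma*sigma/kT by 1.2.
more_vibration = baz_lifetime(1.0e-4, 30.0, 1.2)
# A 20 K rise (363 K -> 383 K) scales both ratios by 363/383,
# with U0 and gamma*sigma themselves held fixed.
scale = 363.0 / 383.0
hotter = baz_lifetime(1.0e-4, 30.0 * scale, 1.0 * scale)

print(f"gamma*sigma1/kT = {load_ratio:.4f}, U0/kT = {activation_ratio:.4f}, U0 = {U0:.4f} eV")
for name, tau in [("baseline", baseline), ("+20% vibration", more_vibration), ("+20 K", hotter)]:
    print(f"{name}: {tau:.3e} s = {tau / SECONDS_PER_YEAR:.1f} years")
```

Under these assumptions the projected lifetime comes out at roughly 12.5 years for the baseline, about 10.2 years with the stronger vibration, and about 2.7 years at the elevated temperature, which is in line with the conclusion that the temperature increase is the greater concern.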

Thus, the output of this TSC stage is the RUL of the device(s), assessed on a probabilistic basis. As has been indicated in the abstract, if the assessed RUL is still long enough, no action might be needed; if not, corrective restoration action becomes necessary. In any event, after the first two TSC steps have been carried out, the devices are put back into operation, provided that the assessed probability of their continuing failure-free operation is found to be satisfactory. If failure nonetheless occurs, the third step should be undertaken to update the predicted reliability. The statistical beta-distribution, in which the probability of failure is treated as a random variable, is suggested for use at the third step.

Step 3. Application of Beta-distribution (BD) to Update Reliability Information

Let the performance of five “suspicious” (malfunctioning) devices be monitored, and let one of them fail. Let us determine the beta-distribution characteristics for four successes and one failure (α = 4, β = 1). We have: $\tilde{\alpha} = \alpha + 1 = 5$, $\tilde{\beta} = \beta + 1 = 2$, and the characteristics of the beta-distribution are:

With α > β (there are more successes than failures), the distribution skews to the higher probabilities of non-failure, and the mode (the location of the maximum of the probability density function) is higher than the mean value and the median.

Let no failures be observed after the first two TSC steps have been carried out. Let us determine the expected number of successes (non-failures) as a function of the probability of non-failure. Assuming zero failures (β = 0, $\tilde{\beta} = 1$), we have: $\alpha = \frac{2\bar{p} - 1}{1 - \bar{p}}$. If the mean value of the probability of non-failure is $\bar{p} = 0.7143$, then α = 1.5002. Since the α value has to be expressed by an integer, one should assume either α = 1 ($\tilde{\alpha} = 2$) or α = 2 ($\tilde{\alpha} = 3$). Then we obtain the following characteristics of the distribution:

when $\tilde{\alpha} = 3$, $\tilde{\beta} = 1$. In both cases, it is a triangular distribution: the mode remains the same, and is at p = 1. The mean and the median increase in the case $\tilde{\alpha} = 3$, $\tilde{\beta} = 1$ in comparison with the case $\tilde{\alpha} = 2$, $\tilde{\beta} = 1$ because of a larger number of successes. The variance reduces, because of the improved information, and the skewness (shift in the direction of higher probabilities of non-failure) and the kurtosis (“peakedness”) of the distribution increase.

Assume now that the predicted (anticipated) probability of non-failure is as high as $\bar{p} = 0.95$. Despite such a high probability of non-failure, the product nonetheless exhibited a field failure. Let us determine, based on this additional information, the revised (updated) estimate of the actual operational probability of non-failure. Assuming that the anticipated (projected) number of failures was zero (β = 0, $\tilde{\beta} = 1$) prior to putting the device(s) into operation, and using the formula for the number α of anticipated non-failures from the previous example, we obtain, with $\bar{p} = 0.95$, that $\alpha = \frac{2\bar{p} - 1}{1 - \bar{p}} = \frac{0.90}{0.05} = 18$. For the new posterior failure, with α = 18 ($\tilde{\alpha} = 19$) and β = 1 ($\tilde{\beta} = 2$), the characteristics of the probability of non-failure are

Thus, because of the occurrence of the unexpected failure, the actual probability of non-failure of the product is only 90.45%, and not 95%. Note that this result, obtained assuming a 95% non-failure level, indicates that after the first failure has occurred, as many as nineteen additional continuous non-failures (successes), i.e., 18 + 19 = 37 successes and 1 failure, would have to be recorded (observed) in order to return the device’s dependability (probability of non-failure) to its original specified estimate of 95%.

We addressed above a situation where one failure has occurred. Let us examine a situation with two failures. In this case one should put α = 18 ($\tilde{\alpha} = 19$) and β = 2 ($\tilde{\beta} = 3$), and the characteristics of the BD for the probability of non-failure become as follows:

Thus, the operational probability of non-failure reduced by about 9.12%, compared to the projected probability of 95%, and by an additional 4.52% with respect to the situation with a single failure. The mean, the median and the mode have also reduced, and because of the higher number of failures, the variance has increased, and the skewness and the kurtosis have decreased.
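The Step 3 bookkeeping can be checked with the Python sketch below. It assumes the standard closed-form expressions for a beta-distribution with the shape parameters "successes + 1" and "failures + 1", as in the examples above; the function names are illustrative, and the median is found numerically because it has no simple closed form.

```python
import math


def beta_summary(successes, failures):
    """Mean, mode and variance of Beta(a, b), a = successes + 1, b = failures + 1."""
    a, b = successes + 1, failures + 1
    mean = a / (a + b)
    if a > 1 and b > 1:
        mode = (a - 1) / (a + b - 2)
    elif a > b:
        mode = 1.0      # no failures: the density increases up to p = 1
    elif b > a:
        mode = 0.0      # no successes: the density peaks at p = 0
    else:
        mode = None     # uniform density, no single mode
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return {"mean": mean, "mode": mode, "variance": variance}


def beta_cdf(p, a, b):
    """CDF of Beta(a, b) for integer a, b via the binomial-sum identity."""
    n = a + b - 1
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(a, n + 1))


def beta_median(successes, failures, tol=1e-10):
    """Median of Beta(successes + 1, failures + 1) by bisection on the CDF."""
    a, b = successes + 1, failures + 1
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def successes_for_mean(mean, failures=0):
    """Number of successes for which the beta mean equals the given value."""
    b = failures + 1
    return mean * b / (1.0 - mean) - 1.0


five_devices = beta_summary(4, 1)      # four successes, one failure
one_failure = beta_summary(18, 1)      # 18 anticipated successes, one field failure
two_failures = beta_summary(18, 2)     # same, with two field failures

print(five_devices, round(beta_median(4, 1), 4))
print(round(successes_for_mean(0.95), 6), round(successes_for_mean(0.95, failures=1), 6))
print(round(one_failure["mean"], 4), round(two_failures["mean"], 4))
```

With these formulas, `successes_for_mean(0.95)` returns 18 and `successes_for_mean(0.95, failures=1)` returns 37, matching the counts in the text. The mean after one failure evaluates to 19/21 ≈ 0.905, close to the 90.45% quoted above.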

Acronyms

ALT=Accelerated Life Testing
AT=Accelerated Testing
BAZ=Boltzmann-Arrhenius-Zhurkov (model)
BD=Beta Distribution
DfR=Design for Reliability
FEA=Finite Element Analysis
FOAT=Failure Oriented Accelerated Testing
HALT=Highly Accelerated Life Testing
QT=Qualification Testing
MED=Medical Electronic Device
MTTF=Mean Time to Failure
PDfR=Probabilistic Design for Reliability
PHM=Prognostics and Health Management
PM=Predictive Modeling
PPM=Probabilistic Predictive Modeling
RUL=Remaining Useful Lifetime
SA=Sensitivity Analysis
SoF=Symptoms of Faults
TSC=Three-Step Concept

Conclusion

The application of the PDfR concept, FOAT and the multi-parametric BAZ model enables improving dramatically the state of the art in the field of the aerospace EOPD reliability prediction and assurance.

References

Suhir E, Mahajan R. Are Current Qualification Practices Adequate?. Circuit Assembly, 2011.

Suhir E. Assuring Aerospace Electronics and Photonics Reliability: What Could and Should Be Done Differently. IEEE Aerospace Conference, Big Sky, Montana, March, 2013.

Suhir E. What could and should be done differently: failure-oriented-accelerated-testing (FOAT) and its role in making an aerospace electronics device into a product. Journal of Materials Science: Materials in Electronics. 2018 Feb 1; 29(4): 2939-48.

Suhir E. Could electronics reliability be predicted, quantified and assured?. Microelectronics Reliability. 2013 Jul 1; 53(7): 925-36.

Suhir E, Bensoussan A. Quantified reliability of aerospace optoelectronics. Int. J. Aerosp. 2014; 7(1).

Suhir E, Rafanelli AJ. Applied probability for engineers and scientists. McGraw-Hill, New York, 1997.

Suhir E. Thermal stress modeling in microelectronics and photonic structures and the application of the probabilistic approach: Review and extension. The International Journal of Microcircuits and Electronic Packaging. 2000; 23(2):215-23.

Suhir E. Probabilistic Design for Reliability. Chip Scale Reviews. 2010; 14(6).

Suhir E, Nicolics J, Yi S. Probabilistic predictive modeling (PPM) of aerospace electronics (AE) reliability: prognostic-and-health-monitoring (PHM) effort using Bayes formula (BF), Boltzmann-Arrhenius-Zhurkov (BAZ) equation and beta-distribution (BD). In EuroSimE Conf, Montpellier, France 2016.

Suhir E. Probabilistic Design for Reliability of Electronic Materials, Assemblies, Packages and Systems: Attributes, Challenges, Pitfalls. In MMCTSE 2017, Cambridge, UK. 2017.

Suhir E. Aerospace electronics reliability prediction: application of two advanced probabilistic techniques. ZAMM‐Journal of Applied Mathematics and Mechanics/Zeitschrift für Angewandte Mathematik und Mechanik. 2018 May; 98(5): 824-39.

Suhir E, Ghaffarian R. Constitutive Equation for the Prediction of an Aerospace Electron Device Performance-Brief Review. Aerospace. 2018; 74(4).

Suhir E. Considering electronic product’s quality specifications by application (s). Chip Scale Reviews. 2012 Jul; 16(4).

Khatibi G, Czerny B, Magnien J, Lederer M, Suhir E, Nicolics J. Towards adequate qualification testing of electronic products: Review and extension. In 2014 IEEE 16th Electronics Packaging Technology Conference (EPTC). 2014 Dec 3; 186-191.

Suhir E. Accelerated life testing (ALT) in microelectronics and photonics: its role, attributes, challenges, pitfalls, and interaction with qualification tests. J. Electron. Packag. 2002 Sep 1; 124(3): 281-91.

Suhir E. Reliability and Accelerated Life Testing. Semiconductor International, 2005.

Suhir E, Ghaffarian R. Electron Device Subjected to Temperature Cycling: Predicted Time-to-Failure. Journal of Electronic Materials. 2019; 48(2): 778-9.

Suhir E. Analysis of a pre‐stressed bi‐material accelerated‐life‐test (ALT) specimen. ZAMM‐Journal of Applied Mathematics and Mechanics/Zeitschrift für Angewandte Mathematik und Mechanik. 2011 May 3; 91(5): 371-85.

Suhir E, Nicolics J. Analysis of a Bow-Free Prestressed Test Specimen. Journal of Applied Mechanics. 2014 Nov 1; 81(11).

Suhir E. To Burn-In, or Not to Burn-In: That’s the Question. Aerospace. 2019 Mar; 6(3): 29.

Suhir E. Burn-in: When, For How Long and at What Level?. Chip Scale Reviews. 2019.

Suhir E. Is Burn-in Always Needed?. Int. J. of Advanced Research in Electrical, Electronics and Instrumentation Engineering. 2020; 9(1).

Suhir E. For How Long Should Burn-in Testing Last?. Journal of Electrical & Electronic Systems (JEES). 2019; 8(2).

Suhir E. Statistics-related and reliability-physics-related failure processes in electronics devices and products. Modern Physics Letters B. 2014 May 30; 28(13): 1450105.

Suhir E, Bensoussan A. Degradation related failure rate determined from the experimental bathtub curve. In SAE Conference, Seattle, WA. 2015; 2224.

Suhir E, Ghaffarian R, Nicolics J. Could application of column-grid-array (CGA) technology result in inelastic-strain-free state-of-stress in solder material?. Journal of Materials Science: Materials in Electronics. 2015 Dec 1; 26(12): 10062-7.

Suhir E. Analysis of a short beam with application to solder joints: could larger stand-off heights relieve stress?. The European Physical Journal Applied Physics. 2015 Aug 1; 71(3): 31301.

Suhir E, Ghaffarian R. Predicted stresses in a ball-grid-array (BGA)/column-grid-array (CGA) assembly with a low modulus solder at its ends. Journal of Materials Science: Materials in Electronics. 2015 Dec 1; 26(12): 9680-8.

Suhir E, Ghaffarian R. Predicted stresses in a ball-grid-array (BGA)/column-grid-array (CGA) assembly with an epoxy adhesive at its ends. Journal of Materials Science: Materials in Electronics. 2016 May 1; 27(5): 4399-409.

Suhir E. Avoiding low-cycle fatigue in solder material using inhomogeneous column-grid-array (CGA) design. ChipScale Rev. 2016.

Suhir E. Bi-material assembly with a low-modulus-and/or-low-fabrication-temperature bonding material at its ends: optimized stress relief. Journal of Materials Science: Materials in Electronics. 2016 May 1; 27(5):4816-25.

Suhir E. Expected stress relief in a bi-material inhomogeneously bonded assembly with a low-modulus-and/or-low-fabrication-temperature bonding material at the ends. Journal of Materials Science: Materials in Electronics. 2016 Jun 1; 27(6): 5563-74.

Suhir E, Yi S, Ghaffarian R. How Many Peripheral Solder Joints in a Surface Mounted Design Experience Inelastic Strains?. Journal of Electronic Materials. 2017 Mar 1; 46(3):1747-53.

Suhir E, Ghaffarian R, Yi S. Solder material experiencing low temperature inelastic stress and random vibration loading: predicted remaining useful lifetime. Journal of Materials Science: Materials in Electronics. 2017 Feb 1; 28(4):3585-97.

Suhir E, Bechou L. Availability index and minimized reliability cost. Circuit Assemblies. 2013 Feb.

Suhir E. How Long Could/Should be the Repair Time for High Availability?. Modern Physics Letters B (MPLB). 2013 Aug 30; 27(12).

Suhir E, Salotti JM, Nicolics J. Required Repair Time to Assure the Given/Specified Availability. Optics, Photonics & Sensors. 2020 Apr 18; 1.

Suhir E. Analytical Modeling in Structural Analysis for Electronic Packaging: Its Merits, Shortcomings and Interaction with Experimental and Numerical Techniques. ASME Journal of Electronic Packaging. 1989 Jun; 111(2).

Suhir E. Analytical stress-strain modeling in photonics engineering: its role, attributes and interaction with the finite-element method. Laser Focus World. 2002 May; 14: 611-5.

Suhir E. Analytical thermal stress modeling in physical design for reliability of micro- and opto-electronic systems: role, attributes, challenges, results. In Micro- and Opto-Electronic Materials and Structures: Physics, Mechanics, Design, Reliability, Springer, Boston, MA. 2007; B3-B21.

Suhir E. Analytical thermal stress modeling in electronics and photonics engineering: Application of the concept of interfacial compliance. Journal of Thermal Stresses. 2019 Jan 2; 42(1): 29-48.

Suhir E. Application of Analytical Modeling in the Design for Reliability of Electronic Packages and Systems, Springer. 2019.

Suhir E. Analytical modeling enables explanation of paradoxical behaviors of electronic and optical materials and assemblies. Advances in materials Research. 2017 Jun 1; 6(2):185.

Suhir E. Failure-oriented-accelerated-testing (FOAT) and its role in making a viable IC package into a reliable product. Circuits Assembly, July. 2013.

Suhir E. Failure-oriented-accelerated-testing (FOAT), Boltzmann-Arrhenius-Zhurkov equation (BAZ) and their application in microelectronics and photonics reliability engineering. Int J Aeronaut Sci Aerospace Res. 2019; 6(3): 185-91.

Suhir E, Bensoussan A, Nicolics J, Bechou L. Highly accelerated life testing (HALT), failure oriented accelerated testing (FOAT), and their role in making a viable device into a reliable product. In 2014 IEEE Aerospace Conference, Big Sky, Montana 2014.

Suhir E, Bensoussan A. Application of multi-parametric BAZ model in aerospace optoelectronics. In 2014 IEEE Aerospace Conference, Big Sky, Montana 2014.

Suhir E. Failure-Oriented-Accelerated-Testing and its Possible Application in Ergonomics. International Journal. 2019; 3(2).

Zhurkov SN. Kinetic concept of the strength of solids. International Journal of Fracture Mechanics. 1965 Dec; 1:311-23.

Suhir E, Bechou L, Bensoussan A. Technical Diagnostics of Electronics Products: Application of Bayes Formula and Boltzmann-Arrhenius-Zhurkov (BAZ) Model. 2012.

Suhir E, Kang SM. Boltzmann–Arrhenius–Zhurkov (BAZ) model in physics-of-materials problems. Modern Physics Letters B. 2013 May 30; 27(13):1330009.

Suhir E. Three-step concept in modeling reliability: Boltzmann–Arrhenius–Zhurkov physics-of-failure-based equation sandwiched between two statistical models. Microelectronics Reliability. 2014 Oct: 2594-603.

Suhir E. Boltzmann-Arrhenius-Zhurkov Equation and Its Applications In Electronic-and-Photonic Aerospace Materials Reliability-Physics Problems. Int. Journal of Aeronautical Science and Aerospace Research (IJASAR). 2020; 24.

Suhir E, Bechou L, Bensoussan A. Technical Diagnostics in Electronics: Application of Bayes Formula and Boltzmann-Arrhenius-Zhurkov (BAZ) Model. 2012.