A few years ago, we bought a new mobile router for my family. Soon after the purchase, we realised that the router had reliability problems: it stopped working properly after only a short period of use. The router worked okay after a restart, but every time it had been on for a while, it stopped working properly again.
So, we took the router back to the store, where the clerk promised that they would check what was wrong. I was pretty sure that we had a device with an intermittent failure. Such failures typically occur only under specific conditions and can cause very strange failure modes. Unfortunately, after service, the router did not work any better. Hence, we took it back to the store, where it became clear that they had done nothing, even though we had explained the problem. They had turned the router on, checked that it worked okay, and decided that the router was fine. They were sure that the problem was in our system, not in the router.
We repeated the service cycle once more. After this, however, the service clerk told me that there is no such thing as an intermittent failure and that this was not their problem. Luckily, we managed to return the router, go to another store, and buy a new one, which worked perfectly without any failures.
I must admit that I was somewhat shocked by the service clerk’s attitude. Of course, we got really bad service, but what surprised me most was that he had apparently never come across a device with such issues and declared that they were not possible. Intermittent failures are quite common in electronics. An intermittent failure basically means a failure which comes and goes rather than being permanent. Since electronic devices are very complex, there are plenty of potential reasons for such a failure to occur.
Intermittent failures may appear, for example, due to cracks, relaxation of plastics, migration of materials, corrosion, and bad connectors. In testing we commonly see these failures when a crack occurs in an interconnection. A crack in a solder joint may lead to a situation in which the joint is closed at high temperatures, but bending caused by temperatures below zero opens the joint, and the device does not function as it should. Even more commonly, during thermal cycling testing the interconnection is closed at both the high and the low temperature extremes but opens while the temperature is changing, and the device does not work. The picture below shows an example of a cross-sectioned cracked solder joint.
At Trelic, we do lots of different accelerated life tests, which means that we expose test structures, components, or whole devices to different environmental conditions. Quite often a test with fluctuating test conditions is involved. During testing we prefer to measure the functionality or the electrical signals of the tested components and devices in situ. This way we can see in real time how failures occur and whether they are permanent, i.e. whether we can see the failure both during and after the test.
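To make the difference between a permanent and an intermittent failure concrete, here is a minimal sketch of how a log of in-situ readings could be classified. Everything in it is an assumption made for illustration: the `Sample` record, the resistance threshold used as the failure criterion, and the rule that a failure counts as permanent only if the readings never recover before the end of the log. Real failure criteria depend on the product and the test.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One in-situ reading (hypothetical record layout)."""
    time_s: float
    resistance_ohm: float


def classify_failure(samples: List[Sample], threshold_ohm: float) -> str:
    """Classify a monitored interconnection from its resistance log.

    Assumed criterion: a reading above threshold_ohm counts as a failure.
    Returns "no failure", "intermittent", or "permanent".
    """
    failed = [s.resistance_ohm > threshold_ohm for s in samples]
    if not any(failed):
        return "no failure"
    first_failure = failed.index(True)
    # Permanent: every reading from the first failure to the end of the log
    # (including the readings taken after the test) is still a failure.
    if all(failed[first_failure:]):
        return "permanent"
    # Otherwise the connection recovered at least once.
    return "intermittent"


if __name__ == "__main__":
    log = [Sample(0, 0.1), Sample(60, 0.1), Sample(120, 250.0), Sample(180, 0.1)]
    print(classify_failure(log, threshold_ohm=10.0))
```

A sample that only fails while the conditions are changing would be labelled intermittent here, even though it looks perfectly fine when measured after the test.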
Quite often we see failures which occur only at low or high temperature or at high humidity. Even though the samples show a failure during the test, after testing they may function perfectly well, with no indication of any failure. In these situations it is not uncommon for our customer to mention that such failures have also been seen in real use conditions as intermittent failures, and that they have not been able to determine the reason for them. In this case it is of course great that we have been able to reproduce the failures, because we can then start to analyse what is causing them. One of the main problems with intermittent failures is that their location and cause are difficult to determine. Additionally, they may be very hard to replicate. For example, as shown in the picture below, electronics contain lots of components with different polymers and plastics, which makes them vulnerable to intermittent failures caused by humidity, and such failures are typically difficult to find.
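One simple way to see which conditions trigger a failure is to group the logged readings by the condition at which they were taken. The sketch below does this for temperature only. The input format (pairs of chamber temperature and a failed flag), the function name, and the 20 °C bin width are all assumptions for illustration, not a description of any particular test set-up.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def failure_rate_by_temperature(
    readings: Iterable[Tuple[float, bool]],
    bin_width_c: float = 20.0,
) -> Dict[float, float]:
    """Return the fraction of failed readings in each temperature bin.

    readings: (temperature in deg C, True if the reading was a failure).
    Each key of the result is the lower edge of a bin.
    """
    counts = defaultdict(lambda: [0, 0])  # bin start -> [failed, total]
    for temp_c, failed in readings:
        bin_start = math.floor(temp_c / bin_width_c) * bin_width_c
        counts[bin_start][1] += 1
        if failed:
            counts[bin_start][0] += 1
    return {start: f / total for start, (f, total) in sorted(counts.items())}


if __name__ == "__main__":
    log = [(-40, True), (-35, True), (25, False), (30, False), (85, False), (90, True)]
    for start, rate in failure_rate_by_temperature(log).items():
        print(f"{start:6.0f} C and up: {rate:.0%} of readings failed")
```

A failure rate that is high in one bin and zero at room temperature is exactly the pattern that makes a device look fine when it is checked on a service bench.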
An intermittent failure can also be a real challenge for in-depth failure analysis (read more about failure analysis here), sometimes even a nightmare, as the heading says. When a failure occurs, for example, only at very high humidity and temperature, it is typically not seen at room conditions. However, it is difficult to study a device under such high humidity and temperature conditions to determine the exact location of the failure. Furthermore, detailed failure analysis techniques can usually only be used at room conditions, so it may be impossible to confirm the reason for the failure. The picture below shows a corroded copper-plated via. Corrosion may be one of the causes of failures which are difficult to locate, and it may also make connections unstable.
Unfortunately, intermittent failures are common in electronics and cause problems in reliability analysis. Nowadays, there is also a risk that such a failure is caused by a combination of hardware and software problems, which makes it even harder to analyse and to locate the original cause. As I mentioned before, one way to find and analyse such failures is to measure the functionality of the devices while ageing them in various environmental conditions. If this is done already in product development, the risk of intermittent failures is significantly reduced. If failures are seen in the test conditions, they can be studied further, and the test conditions already give important clues about the problems and the reasons for them. Building set-ups for electrical measurements during environmental testing is sometimes time-consuming, but typically well worth the extra information gained.