Avoid your intuition, it’s probably wrong

Understanding your assumptions and thoroughly questioning them is crucial to building complex systems. I’ve found that the vast majority of mistakes I make in hardware are where I let my intuition develop an argument for why something is occurring. Intuition easily justifies the bad assumptions that thrive in our knowledge gaps.

A few months back a teammate and I ran into a bug where a sensor (IR photointerrupter) was counting way too many times when it was triggered. It was a particularly weird bug because we had another set of these already implemented and they didn’t have this problem. We started looking into the code to see if there was an issue with how we were counting the interrupts.

After some careful examination of the software, we started looking at the physical hardware. The physical hardware seemed fine. The voltages were exactly as expected when the sensor was blocked and unblocked. I hooked up the outputs to a digital logic analyzer and blocked and unblocked the sensor. Sure enough, leading up to the actual pulse were lots of tiny pulses increasing in duty cycle. There were also tiny pulses trailing the actual pulse, decreasing in duty cycle. We went and checked our correctly functioning implementation and surprisingly the small leading and trailing pulses were there too! But for some reason they were being counted properly by the MCU. We were confused. So we attached analog leads to the output of our sensor. We blocked and unblocked it, expecting to see the false pulse train leading up to the actual pulse. Nope. Perfectly clean rising and falling edges on each of the implementations, exactly the same way. We were very confused.

This is painful to recount because the solution now appears so obvious in hindsight. We had evidence suggesting that our MCU was performing some strange counting of the rising and falling edges of our actual pulse. It was right in front of us. We had data showing that the duty cycle of the false triggers was increasing as they came closer to the actual pulse. What were we missing?

We had made a major assumption that the analog-to-digital conversion on the logic analyzer would be able to get a clean reading of the sensor output being pulled high and low. We assumed that the measurement tool we were using wouldn’t also be having false triggers. We were really thrown off the trail because we had another implementation of the same sensors functioning properly. With all of this combined, we let our intuition grasp for answers rather than consider the facts.

—

After a bit of discussion, some forum reading, and more analysis of the signals, we figured out the problem. The Schmitt triggers on both our logic analyzer and our MCU were getting triggered repeatedly by the rising and falling edges of the sensor being blocked and unblocked. This is a pretty common occurrence, and it can usually be resolved by low-pass filtering in hardware right before the signal enters the MCU.

Schmitt Trigger Edge Detection
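To make the filtering idea concrete, here is a minimal software sketch of the same principle. It is not the hardware fix we used: the sample sequence is made up to resemble the pulses we saw (chatter with increasing duty cycle before the real pulse and decreasing duty cycle after it), and the function names, `alpha`, and the two thresholds are assumptions chosen for illustration.

```python
def count_rising_edges(samples):
    """Count 0 -> 1 transitions in a sequence of 0/1 samples."""
    return sum(1 for prev, cur in zip(samples, samples[1:])
               if prev == 0 and cur == 1)


def low_pass_then_hysteresis(samples, alpha=0.2, high=0.7, low=0.3):
    """Smooth 0/1 samples with a first-order low-pass filter (exponential
    moving average), then apply a two-threshold comparator.

    alpha, high and low are illustrative values, not tuned for real hardware.
    """
    level = 0.0   # filtered signal level, between 0.0 and 1.0
    state = 0     # cleaned-up digital output
    cleaned = []
    for sample in samples:
        level += alpha * (sample - level)
        if state == 0 and level >= high:
            state = 1
        elif state == 1 and level <= low:
            state = 0
        cleaned.append(state)
    return cleaned


# Hypothetical capture: short pulses growing in duty cycle, the real
# pulse, then short pulses shrinking in duty cycle.
leading = [1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0]
trailing = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1]
raw = [0] * 5 + leading + [1] * 20 + trailing + [0] * 20

filtered = low_pass_then_hysteresis(raw)
print(count_rising_edges(raw), count_rising_edges(filtered))
```

With this made-up capture, the raw samples contain several rising edges while the filtered samples contain a single one. The trade-off is latency: the filtered output changes a few samples after the real edge.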

The reason the count on the other implementation appeared correct was that we were performing quadrature encoding with the sensors: we would increment in one direction and decrement in the other. The false pulses on one sensor offset the false pulses on the other, since the two were being used in conjunction. This is a little technical and very specific to our application, but it basically masked the fact that the sensors had all the extra triggers, because the end result always appeared correct.

Quadrature Encoder Waveforms
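The masking effect can be sketched with a generic quadrature decoder. This is not our firmware, just a textbook-style decoder under assumed names, fed a made-up sample sequence in which channel A chatters on its edges while channel B is stable. Each false transition steps the position one way and the next steps it back, so the net count comes out right even though a plain edge counter on A overcounts.

```python
# Order of (A, B) states for one direction of travel (Gray code).
SEQUENCE = [(0, 0), (1, 0), (1, 1), (0, 1)]


def quadrature_position(samples):
    """Return the net step count for a sequence of (A, B) samples."""
    position = 0
    prev = samples[0]
    for cur in samples[1:]:
        if cur == prev:
            continue
        step = (SEQUENCE.index(cur) - SEQUENCE.index(prev)) % 4
        if step == 1:
            position += 1   # one step forward
        elif step == 3:
            position -= 1   # one step backward
        # step == 2 means both channels changed at once; it is ignored here
        prev = cur
    return position


def rising_edges_on_a(samples):
    """Naive counter: count 0 -> 1 transitions on channel A only."""
    return sum(1 for (a0, _), (a1, _) in zip(samples, samples[1:])
               if a0 == 0 and a1 == 1)


# One full cycle with clean edges.
clean = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]

# The same cycle, but A bounces on both its rising and falling edge.
noisy = [(0, 0), (1, 0), (0, 0), (1, 0), (0, 0), (1, 0),
         (1, 1), (0, 1), (1, 1), (0, 1), (0, 0)]

print(quadrature_position(clean), quadrature_position(noisy))
print(rising_edges_on_a(clean), rising_edges_on_a(noisy))
```

Both sequences decode to the same position, while the naive counter sees extra rising edges on the noisy one. That matches the symptom: the quadrature setup looked correct, and the sensors that were simply counted did not.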

We were really confused by all the evidence presented to us simply because of the assumptions we were making. Multiple logical explanations could be presented, but they would conflict with one another. Keep an eye out for these types of inconsistencies, as they’re a strong indicator of bad assumptions.

— 10.22.2017

