Bug Suspects
On a recent training flight, I was teaching the student about flight at a slow airspeed when our electrical systems began to malfunction. At first the radio's lights flickered, and I thought it might be a problem with the radio. The Cessna 152 we were flying has a radio master switch, so I turned that off to see if the issue went away. Soon the needles on the fuel gauges were swinging back and forth, indicating that the electrical issue was still present. If the radio master switch is off and there is no power to the radio, but the problem persists, it is clear that this is not a problem with the radio.
I was recently working on a bug that was reported by several customers in one afternoon. Another developer suggested that the bug was in a recent feature that seemed to be related. This feature included a change to some existing naming conventions, so it made sense. However, upon closer inspection it became obvious that the problem must be somewhere else. Why? Because the file was last modified 17 days ago. Remember that we received several customer reports about this error in one afternoon. The problem could not be 17 days old or we would have heard about it before. We later found that a bad merge had caused a line of code to disappear in a feature pushed that afternoon. This, of course, was much easier to find after we ruled out several possible “suspects”.
Detective Work
Debugging is like being a detective. Like a detective, we must gather evidence and find suspects. Once we have a bunch of bug suspects, it is time to determine which one is the guilty party. At this point there are many possibilities and paths to follow in searching for the answer. Enormous amounts of time can be saved if we can rule out some of these bug suspects. A detective trying to catch a murderer doesn’t want to waste time on innocent people, so he will rule suspects out wherever he can.
Let’s look at some things that can help to rule out suspected bad code:
- As mentioned above, the last modified date of the file.
- The last modified date of the suspected piece of code. If possible, drill down to the line of code that seems to be the problem and see when it was last changed. The blame feature in GitHub works great for this.
- The flow of the code. Does this line of code get called in the configuration used when the error occurs? Sometimes a bug suspected to be in commonly used code is caused by some custom code that only affects one particular configuration. In these cases you can rule out the common code if it gets passed over or replaced.
- Ask some witnesses. Print out a variable before and after the suspected piece of code and compare. If the value is still correct after the suspected code runs, then the bug must occur later in execution. If the value is already wrong before the suspected code runs, the bug must occur earlier.
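The witness check in the last item can be sketched in a few lines of Python. This is only an illustration under some assumptions: `locate_bug`, `has_content`, and the sample values are names invented for this example, and it assumes you can write a simple check that tells a good value from a bad one.

```python
def locate_bug(value, suspect, is_valid):
    """Question the "witness" (the value) before and after the suspect runs.

    Returns "earlier" if the value is already bad going in,
    "suspect" if it goes in good and comes out bad,
    and "later" if it is good both times (the suspect is cleared).
    """
    ok_before = is_valid(value)
    result = suspect(value)
    ok_after = is_valid(result)

    if not ok_before:
        return "earlier"
    if not ok_after:
        return "suspect"
    return "later"


def has_content(text):
    # Hypothetical validity check: a name must not be blank.
    return bool(text.strip())


# The suspected code here is str.strip, standing in for any function
# you think might be mangling the value.
verdict = locate_bug("  widget  ", str.strip, has_content)
print(verdict)
```

In a real codebase the same idea is often just two temporary print or log statements around the suspected call, removed once the suspect has been cleared or convicted.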
These are just a few examples; please comment with more.
Also remember that, just like in a murder case, you can never completely rule out a bug suspect. For example, the file that was modified 17 days ago could contain a bug that doesn’t cause any malfunctions until the newer feature is launched with good code. However, this is not as likely, so it is best to put these bug suspects aside until you are sure that your top suspects are not the problem.
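One way to picture "setting suspects aside" without dropping them is to sort them into two piles by when they last changed relative to the first report. The sketch below is a rough illustration, not the process used for the bug above: the `Suspect` class, the `rank_suspects` function, the file names, the dates, and the one-day window are all made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Suspect:
    path: str
    last_modified: datetime


def rank_suspects(suspects, first_report, window=timedelta(days=1)):
    """Split suspects into (top, set_aside) instead of discarding any.

    A suspect is "top" if it changed within `window` before the first
    report. Everything else is set aside, not cleared: it stays on the
    list in case the top suspects turn out to be innocent.
    """
    cutoff = first_report - window
    top = [s for s in suspects if cutoff <= s.last_modified <= first_report]
    set_aside = [s for s in suspects if s not in top]
    # Most recently changed first within each pile.
    top.sort(key=lambda s: s.last_modified, reverse=True)
    set_aside.sort(key=lambda s: s.last_modified, reverse=True)
    return top, set_aside


# Hypothetical data: reports start one afternoon, one file changed that
# morning, and another was last touched 17 days earlier.
first_report = datetime(2024, 5, 20, 14, 0)
suspects = [
    Suspect("naming_conventions.py", first_report - timedelta(days=17)),
    Suspect("checkout_feature.py", datetime(2024, 5, 20, 10, 0)),
]

top, set_aside = rank_suspects(suspects, first_report)
for s in top:
    print("top suspect:", s.path)
for s in set_aside:
    print("set aside:", s.path)
```

The window is a judgment call. A wider one makes sense if customers might take a few days to notice or report a problem.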
As for my flight, the situation deteriorated into total electrical failure caused by a bad battery. Fortunately, an airplane’s electrical system is independent of the engine and the battery is not needed to keep the propeller spinning. Instead, the spark plugs are powered by two magnetos (like your lawnmower). So the engine kept running, but the electrical power died while the flaps were fully extended, and since they require electricity I could not retract them. This meant that I had to drag the plane back to the airport using almost full power.