Debugging is the process of finding and fixing bugs. Bugs are behaviours in a software system which are undesirable and counter to the programmer’s intent. You’d think, given the obvious importance of debugging in providing and maintaining quality software, that it’d be easy to find hundreds of good-quality tutorials on the web to guide programmers through locating, and then fixing, bugs in the software we write.
But searching the Internet for “How to debug” gives a surprisingly small number of articles.
Many of these are based around debugging tools in specific languages or environments, and the rest are mostly concerned with increasing unit test coverage, or pre/post-condition asserts in live code, and so on. Those are all good ways to eliminate bugs, of course, but they’re very low-level, and many of them are really techniques to use during development, rather than when faced with an existing system exhibiting a bug. Too much inline assertion code will have an effect on performance, and sometimes pre- or post-conditions can’t be expressed in ways that are simple enough to be useful.
Some discuss finding a reproducible way to exhibit the bug, though only insofar as they say that you should. That’s certainly useful, but I worry about treating this as a prerequisite, because it means that bugs which are hard to reproduce are treated as either less important (“It works for me”) or else insoluble. Sometimes, for confidentiality reasons, we can’t see the customer data that causes the bug. Sometimes, bugs only show in production environments we can’t control. Just because a bug doesn’t show up when we try to reproduce it doesn’t make it any less frustrating to the customer.
It’s a well-known truism that everything is better with Bacon. This post attempts to describe a high-level method for locating a bug, fixing it, and verifying that the bug is fixed. It does so with the help of Bacon, obviously.
The eagle-eyed amongst you will have noticed I capitalized Bacon, there – and that’s not because I really, really like bacon, though of course, like anyone even remotely sensible, I do. Instead, it’s because I’m not talking about succulent, smoked back bacon or even that poor streaky stuff found in the United States, but Sir Francis Bacon.
Bacon is credited with the concept of case law and precedents, a slew of scientific discoveries, and the creation of the scientific method, as well as being a keen proponent of experimental science itself (experimentalism having been reignited some three centuries earlier, by one Roger Bacon). He reportedly died of pneumonia caught while studying the effects of freezing meat as a potential preservation technique, but I’m not advocating freezing yourself to death.
Instead, it’s the scientific method I’m interested in applying to debugging. The scientific method – sometimes known as the Baconian Method, which makes it sound very tasty – is a sequence of steps looped around until you finally figure things out:
Characterisation
This is the process of gathering as much knowledge as possible about the system’s behaviour surrounding the bug. The characterisation process often starts with a vague statement of behaviour, such as “Sometimes, the user isn’t notified about new posts”. What we need to do is narrow down the “sometimes” a bit. We might find out that this relates to a post-restart condition, or that it’s only certain posts, or certain users affected. During this process, it’s sometimes possible to end up with a concrete sequence of events for reproducing the bug; but it’s often not.
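As a rough sketch of what narrowing down the “sometimes” can look like, suppose each bug report has been jotted down as a small record. The reports below are invented purely for illustration, and the field names (`after_restart`, `missed`) and the helper `failure_rate_by` are my own assumptions rather than anything from a real system. The idea is simply to ask how often the bug appears for each value of a condition you suspect.

```python
from collections import Counter

# Invented reports, purely for illustration. Each one records who was
# affected, whether the server had just restarted, and whether the
# notification was missed.
reports = [
    {"user": "alice", "after_restart": True, "missed": True},
    {"user": "bob", "after_restart": False, "missed": False},
    {"user": "carol", "after_restart": True, "missed": True},
    {"user": "alice", "after_restart": False, "missed": False},
]


def failure_rate_by(reports, field):
    """Return the fraction of missed notifications for each value of `field`."""
    totals, failures = Counter(), Counter()
    for report in reports:
        totals[report[field]] += 1
        if report["missed"]:
            failures[report[field]] += 1
    return {value: failures[value] / totals[value] for value in totals}


by_restart = failure_rate_by(reports, "after_restart")
by_user = failure_rate_by(reports, "user")
```

With this made-up data, every miss happens after a restart, while grouping by user gives a muddier picture. That doesn’t prove anything yet, but it tells you where to aim your first hypothesis.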
Hypotheses
The next step is to start coming up with potential reasons for the bug. “The system is unaware of the user being online, and therefore does not know to send notifications” might be one. It doesn’t matter if these turn out to be wrong, by the way. The important aspect is that you can figure out that they are wrong, ideally quite easily. This is done by adding Predictions, and then Testing them.
Predictions
Given a hypothesis, make some measurable prediction of behaviour. “If this is the problem, the table of online users will be incorrect.”
Tests
Given the prediction, develop a test which will demonstrate its validity, and therefore that of the hypothesis. It’s really nice when this can be done as a simple unit test, but sometimes you’ll need to add logging data, have the customer examine database tables, or use a debugging tool to examine internal program state. Amazingly often, though, testing a prediction can be done simply by reading through the code: “Yeah, we missed off this return value”, or “No, we’re definitely storing the user’s status correctly in all cases”.
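When the prediction can be checked in code, the test might look something like the sketch below. Everything here is hypothetical: `OnlineUsers` is a stand-in I’ve made up for the system’s table of online users, and I’ve deliberately given it a bug (a restart forgets who is connected) so that there is something to find. The check returns `True` when the predicted misbehaviour is observed.

```python
class OnlineUsers:
    """Hypothetical stand-in for the system's table of online users."""

    def __init__(self):
        self._online = set()

    def login(self, user):
        self._online.add(user)

    def restart(self):
        # Assumed bug, for illustration only: a restart forgets
        # everyone who was connected.
        self._online.clear()

    def is_online(self, user):
        return user in self._online


def online_table_wrong_after_restart(table):
    """Prediction: a user who logged in before a restart is no longer
    listed as online afterwards. Returns True if that is observed."""
    table.login("alice")
    table.restart()
    return not table.is_online("alice")


prediction_observed = online_table_wrong_after_restart(OnlineUsers())
```

Once the bug is fixed, the same check should start returning `False`, so it doubles as the regression test for the fix.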
Result
If you guessed your hypothesis correctly, and it was narrow enough, you can now fix the bug. Your predicted behaviour then changes, and you have the test required to validate it. If you’ve managed this first time, congratulate yourself on your amazing luck. Otherwise, you have new data to add to your characterisation, and you can come up with a new hypothesis to work through – either narrower or a different idea altogether.
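For those who think better in code, the whole loop can be sketched in a few lines. This is only an illustration of the shape of the method, not a tool to run against a real system: the `Hypothesis` class and the `investigate` function are names I’ve invented, and the lambdas stand in for whatever real test you would carry out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Hypothesis:
    statement: str             # the suspected reason for the bug
    prediction: str            # what we expect to see if it is true
    test: Callable[[], bool]   # returns True when the prediction is observed


def investigate(
    hypotheses: List[Hypothesis],
) -> Tuple[Optional[Hypothesis], List[str]]:
    """Work through the hypotheses in order, keeping notes as we go.

    Returns the first confirmed hypothesis (or None) together with the
    notes, which feed back into the characterisation either way.
    """
    notes = []
    for hypothesis in hypotheses:
        if hypothesis.test():
            notes.append(f"confirmed: {hypothesis.statement}")
            return hypothesis, notes
        notes.append(f"ruled out: {hypothesis.statement}")
    return None, notes


hypotheses = [
    Hypothesis(
        statement="Notifications are stuck in a queue",
        prediction="The queue will contain undelivered notifications",
        test=lambda: False,  # pretend we looked and the queue was empty
    ),
    Hypothesis(
        statement="The system is unaware of the user being online",
        prediction="The table of online users will be incorrect",
        test=lambda: True,   # pretend the table really was wrong
    ),
]

confirmed, notes = investigate(hypotheses)
```

Note that the ruled-out hypothesis isn’t wasted effort: it ends up in the notes, and so narrows the search for the next round.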
So remember next time you’re faced with an angry customer who swears your software is misbehaving that there is hope, because everything – even debugging – is better with Bacon.