Heisenbugs; the Dark Side of Debugging

Some of the most difficult software bugs to track down are those that disappear when you try to study them… Heisenbugs! For example, when an issue is reproducible in a Release build, but disappears when you run the software in the Debugger.

There are many possible explanations for this which include differences in memory initialisation, timing, code optimisation, code and memory layout between Debug and Release builds. I won’t elaborate on the details here, as they are already very well covered on stack overflow.

The reason for this post is that we recently encountered an issue in some code which has the following behaviour:

  1. Has a reproducible bug in release mode (not an outright crash or exception, but fails to behave as expected).
  2. Runs perfectly well when started in the debugger, with no issues.
  3. When started without debugging, and the debugger is attached later, several first chance exceptions are seen.

My theory was that the exceptions weren’t being triggered when started from the Debugger because the memory initialisation (or other Debug/Release differences) were preventing the exceptions from occurring. However, there was some debate on this point, with someone else suggesting that attaching the Debugger later was causing spurious exceptions. Their suggestion was that if these exceptions were genuine, they would also occur when started in the Debugger.

I’ve seen this debate crop up frequently enough, that I thought it would be useful to create a very simple example to demonstrate an exception that reproduces in Release, but works fine in Debug. Also, to confirm whether the same exception would reproduce when the Debugger is attached to a running process.

Here is a first attempt at creating a very small Heisenbug demonstrator in C++, which has a deliberate access violation:

#include <iostream>

void main() {
    int *x = new int[3];
    if (x[1] != x[2]) delete [] x;
    std::cin.get();
    delete [] x;
}

When a Release build is run in Visual Studio 2013, it demonstrates the following behaviour:

  1. When started without the Debugger and a key is pressed, it causes an exception.
  2. When started under the Debugger and a key is pressed, it exits cleanly and works fine.
  3. When started without the Debugger, and the Debugger is later attached and a key is pressed, it causes an exception.

If you run this a few times, you might find that Windows enables the “Fault Tolerant Heap” for the executable, and then you won’t be able to reproduce the exceptions. The solution is to disable the Fault Tolerant Heap for your executable, by going to these keys:

HKLM\Software\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\Layers\heisenbug.exe

HKCU\Software\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\Layers\heisenbug.exe

Then delete the Fault­Tolerant­Heap entry on both of these keys. In my case, only the HKLM version was needed.

Returning to the code to explain how this works, here’s a commented version, explaining each line:

// create an array of 3 integers, without initialisation
int *x = new int[3];

// compare elements 1 and 2 (which are not initialised)
// if they are different, delete the array and leave
// the invalid pointer in x
if (x[1] != x[2]) delete [] x;

// wait for a key press
std::cin.get();

// delete the array (potential access violation)
delete [] x;

There are two intentional bugs in this particular line of code:

if (x[1] != x[2]) delete [] x;

The first bug is that we are accessing uninitialised memory, because the array x has not beeen initialised. When we compare x[1] and x[2] the result is unpredictable. In a Release build there’s a good chance (but not absolute certainty) that these values will be different, and it will delete the array. Second, after deleting the array, the variable x still contains the (now invalid) pointer address. Later on, when this final line of code executes, it will access an invalid pointer, usually causing an access violation exception:

delete [] x;

However, when running in the Debugger, it’s very likely that the memory will have been initialised by the Debugger, and x[1] and x[2] will usually contain the same value. This prevents the pointer from being deleted, and avoids the access violation. The code runs without issue when being debugged.

This serves as a very simple example of a bug which only appears in Release, and disappears in the Debugger, due to accessing uninitialised memory (as mentioned above, this is only one of many possible causes of such an issue).

One thought on “Heisenbugs; the Dark Side of Debugging”

Leave a Reply

Your email address will not be published. Required fields are marked *