How loop optimization in GCC uses undefined behaviour to make inferences

There are many ways to try and understand how an optimizing C compiler might transform your code.

Lets start by thinking about the goal of an optimizing compiler. Its goal is to manipulate the code that it is presented with, making modifications to cause it to execute more quickly and without altering the observable behaviour of a correctly formed program.

It is very important when describing the effects of an optimizer to remember that its only obligation it to preserve the behaviour of a correctly formed program. If your program is not correctly formed (normally expressed by compiler writers as relying upon undefined behaviour) then the optimizer is allowed to make modifications. That’s it! These modifications need not  preserve observable behaviour. They don’t have to execute more quickly. They don’t have to try to guess the programmer’s intent. The compiler is allowed to make modifications of more or less any form.

Over the years optimizing compilers have come to make inferences based on undefined behaviour. One of the simplest ones occurs when a pointer is dereferenced. In this case the compiler knows that the pointer cannot be NULL (because it has already been dereferenced and dereferencing a NULL pointer results in undefined behaviour). As a result it can the treat any later checks for pointer validity as unreachable code and remove it.

Recently GCC gained a much more powerful mechanism to track undefined behaviour that occurs due to out of range loop indicies. Having read about this and looked at a few spurious compiler bug reports I settled down and wrote the following plausible but buggy code.

Note: If you like puzzles then don’t scroll down past the return 0; statement and closing brace. That way you can try and spot the bug yourself. As far as I know there is only one in there!

int is_whitespace(char c)
{
    const char lookup_table[] = { ' ', '\t', '\n' };

    for (int i=0; i<=sizeof(lookup_table); i++)
        if (c == lookup_table[i])
            return 1;

    return 0;
}

Spotting the error is hopefully the easy part.

If you need a clue then be reassured that the lookup table contains three elements (it is initialized from character literals rather than a string literal so it does not get nil-terminated).

If you just want to know what the bug is so you can keep reading this article then have a look at the loop exit condition. It uses <= rather than < meaning the loop exit condition is reached after four cycles round the loop. This results in reading after the end of the array leading to undefined behaviour.

Now for the hard bit. What do you expect to happen when you run this code?

One answer, and one that I might have given had I been asked this question instead of writing it is: “strictly speaking this is undefined so it needs fixing, however the padding put in by the linker means that on most systems lookup_table[3] will evaluate to ‘\0’ and so the function will work more of less as intended (assuming the caller never cares whether the ‘\0’ character is whitespace or not)”.

That answer might even have been adequate two years ago. However with modern compilers it is better to rely on the simpler answer regarding what can happen: “anything”.

Nevertheless, even knowing all of the above the transformation the compiler actually makes may still yet surprise you. Unconditionally the code comes out as:

int is_whitespace(char c)
{
    return 1;
}

If you don’t believe me look at the assembler (generated using gcc-4.8.2 with -O2 -std=c99 -S on x86-64).

        .file   "g.c"
        .text
        .p2align 4,,15
        .globl  is_whitespace
        .type   is_whitespace, @function
is_whitespace:
.LFB0:
        .cfi_startproc
        movl    $1, %eax
        ret
        .cfi_endproc
.LFE0:
        .size   is_whitespace, .-is_whitespace
        .ident  "GCC: (GNU) 4.8.2 20131212 (Red Hat 4.8.2-7)"
        .section        .note.GNU-stack,"",@progbits

What has happens is the compiler has realized that when i == 3 the code makes an undefined memory read. From this the compiler infers that i must always remain strictly less than three during execution of the function and that, for this reason, the end condition is unreachable. It can be optimized away and, because this change means the loop can never exit then we also know that the function can only return 1 so we can get rid of the loop itself as well.

As a further exercise imagine what happens if the loop contains a function call that is opaque to the optimizer (meaning that the compiler doesn’t know what side effects the function has and therefore cannot remove it). Now we still know that the end condition is unreachable but we need to call the function a data dependant number of times. Of course the compiler still believes the end condition is unreachable and can optimize it away. This results an infinite loop!

If you’re currently busy moaning about arrogant out-of-touch compiler writers who don’t understand the real world then please stop. Compiler writers tend to dog food (by compiling their compilers using their own compilers) and are just as in touch with the real world as every other working programmer. Perhaps instead you could thank them for transforming a rare, hard to tickle bug that could easily slip into production, into a deterministic one that should be caught by even the most simple of tests. That’s much easier to debug.

Of course the other thing you get out of the dedication of your compiler writer is faster code. In a system where there is heavy inlining of defensively written code (for example the C++ standard library) then, in principle, throwing away unreachable loop end conditions could pay significant dividends.

Share