How loop optimization in GCC uses undefined behaviour to make inferences

There are many ways to try and understand how an optimizing C compiler might transform your code.

Lets start by thinking about the goal of an optimizing compiler. Its goal is to manipulate the code that it is presented with, making modifications to cause it to execute more quickly and without altering the observable behaviour of a correctly formed program.

It is very important when describing the effects of an optimizer to remember that its only obligation it to preserve the behaviour of a correctly formed program. If your program is not correctly formed (normally expressed by compiler writers as relying upon undefined behaviour) then the optimizer is allowed to make modifications. That’s it! These modifications need not  preserve observable behaviour. They don’t have to execute more quickly. They don’t have to try to guess the programmer’s intent. The compiler is allowed to make modifications of more or less any form.

Over the years optimizing compilers have come to make inferences based on undefined behaviour. One of the simplest ones occurs when a pointer is dereferenced. In this case the compiler knows that the pointer cannot be NULL (because it has already been dereferenced and dereferencing a NULL pointer results in undefined behaviour). As a result it can the treat any later checks for pointer validity as unreachable code and remove it.

Recently GCC gained a much more powerful mechanism to track undefined behaviour that occurs due to out of range loop indicies. Having read about this and looked at a few spurious compiler bug reports I settled down and wrote the following plausible but buggy code.

Note: If you like puzzles then don’t scroll down past the return 0; statement and closing brace. That way you can try and spot the bug yourself. As far as I know there is only one in there!

int is_whitespace(char c)
{
    const char lookup_table[] = { ' ', '\t', '\n' };

    for (int i=0; i<=sizeof(lookup_table); i++)
        if (c == lookup_table[i])
            return 1;

    return 0;
}

Spotting the error is hopefully the easy part.

If you need a clue then be reassured that the lookup table contains three elements (it is initialized from character literals rather than a string literal so it does not get nil-terminated).

If you just want to know what the bug is so you can keep reading this article then have a look at the loop exit condition. It uses <= rather than < meaning the loop exit condition is reached after four cycles round the loop. This results in reading after the end of the array leading to undefined behaviour.

Now for the hard bit. What do you expect to happen when you run this code?

One answer, and one that I might have given had I been asked this question instead of writing it is: “strictly speaking this is undefined so it needs fixing, however the padding put in by the linker means that on most systems lookup_table[3] will evaluate to ‘\0’ and so the function will work more of less as intended (assuming the caller never cares whether the ‘\0’ character is whitespace or not)”.

That answer might even have been adequate two years ago. However with modern compilers it is better to rely on the simpler answer regarding what can happen: “anything”.

Nevertheless, even knowing all of the above the transformation the compiler actually makes may still yet surprise you. Unconditionally the code comes out as:

int is_whitespace(char c)
{
    return 1;
}

If you don’t believe me look at the assembler (generated using gcc-4.8.2 with -O2 -std=c99 -S on x86-64).

        .file   "g.c"
        .text
        .p2align 4,,15
        .globl  is_whitespace
        .type   is_whitespace, @function
is_whitespace:
.LFB0:
        .cfi_startproc
        movl    $1, %eax
        ret
        .cfi_endproc
.LFE0:
        .size   is_whitespace, .-is_whitespace
        .ident  "GCC: (GNU) 4.8.2 20131212 (Red Hat 4.8.2-7)"
        .section        .note.GNU-stack,"",@progbits

What has happens is the compiler has realized that when i == 3 the code makes an undefined memory read. From this the compiler infers that i must always remain strictly less than three during execution of the function and that, for this reason, the end condition is unreachable. It can be optimized away and, because this change means the loop can never exit then we also know that the function can only return 1 so we can get rid of the loop itself as well.

As a further exercise imagine what happens if the loop contains a function call that is opaque to the optimizer (meaning that the compiler doesn’t know what side effects the function has and therefore cannot remove it). Now we still know that the end condition is unreachable but we need to call the function a data dependant number of times. Of course the compiler still believes the end condition is unreachable and can optimize it away. This results an infinite loop!

If you’re currently busy moaning about arrogant out-of-touch compiler writers who don’t understand the real world then please stop. Compiler writers tend to dog food (by compiling their compilers using their own compilers) and are just as in touch with the real world as every other working programmer. Perhaps instead you could thank them for transforming a rare, hard to tickle bug that could easily slip into production, into a deterministic one that should be caught by even the most simple of tests. That’s much easier to debug.

Of course the other thing you get out of the dedication of your compiler writer is faster code. In a system where there is heavy inlining of defensively written code (for example the C++ standard library) then, in principle, throwing away unreachable loop end conditions could pay significant dividends.

Share

Fifteen minutes using a Boss Micro BR

The fifteen minute figure is a slight poetic license. I did record this track during my first fifteen minutes playing about with the recorder. However after that it took me nearly twenty minutes to figure out how to master it into a format I can share with the world!

Recording is an Epiphone Dot plugged directly into a Boss Micro BR using the P01 “SuperCln” preset. The preset has been changed from the factory setting only by turning off the chorus.

To be honest the chorus effect sounds pretty terrific but, with such a naked guitar part it makes it sound like I have something to hide.

micro_br

Share

Use “#!/usr/bin/env hbcxx” to make C++ source code executable

#! C++I normally write some kind of personal toy during the holiday season. For example last year I wrote a toy fibre scheduler to go with a microcontroller project I was working on. This year however I’ve cooked up something and can’t quite decide if its a great idea, a pointless idea or a stupid idea. One thing is clear however, to find out which of the three possibilities it is, this bit of code needed packaging up properly as a product and shared with the wider world. Basically hbcxx uses the Unix #!/path/to/interpreter technique to make C++ source code directly executable.I’ve been taking a new look at C++. There is a palpable sense of “buzz” in the C++ community as they realize that, with C++11, they are sitting on something pretty special. The advocacy from the presenters at Going Native this year was remarkably effective (although if you take my advice you won’t watch Scott Meyer’s brilliant Effective C++14 Sampler until you know what std::move is for).
Quoting Bjarne Stroustrup: Surprisingly, C++11 feels like a new language. Considering its source it is not at all surprising that this quote is absolutely on the money: modern C++, meaning C++11 or later, does feel like another language. This is not because the language has been changed massively but because the new features encourage a different, and slightly higher level way to think about writing C++. It’s faster and more fun, supports lambdas, has tools to simplify memory management and includes regular expressions out-of-the-box.I was actually pretty amazed to see regular expressions in the standard C++ libraries, so that coupled with humane memory management (albeit humanity where you have to explicitly opt-in) and the auto keyword really got me thinking differently about writing C++. auto even encouraged me to write a template (generic programming is so much easier when you don’t have to explicitly declare the type of every expression). All this and without losing type safety…So my great/pointless/stupid idea (delete whichever is inappropriate) is a tool to keep things fast and fun by putting off the moment you have to write a build system and install script. For simple programs, especially for quick and dirty personal toys and scripts, the day you have to write a proper build system may never come. You no longer want the distraction of making a separate directory and a Makefile and you’ll find that pkg-config to just work.Instead I just copy your C++ source code into $HOME/bin. Try it. It works.Features include:

  • Automatically uses ccache to reduce program startup times (for build avoidance).
  • Enables -std=c++11 by defualt.
  • Parses #include directives to automatically discover and compile other source code files.
  • Recognises the inclusion of boost header files and, where needed automatically links the relevant boost library.
  • pkg-config integration.
  • Direct access to underlying compiler flags (-O3, -fsanitize=address, -g).
  • Honours the CXX environemnt variable to ensure clean integration with tools such as clang-analyzer’s scan-build.

To learn more about hbcxx take a look at:

Then have fun.

 

Share

How C++11, threads and lambda capture come together.

Somehow when I first read about C++ lambda functions I overlooked the way in which they capture variables. What can I say, I’m pretty familiar with python and just somehow expected variables to be automatically captured…

Be that as it may, I would like to present you with some broken code of the type I wrote when I first set out working on some of this stuff:

#include <atomic>
#include <iostream>
#include <thread>

using namespace std;

class threads_and_lambda_capture {
    thread worker;
    atomic<bool> running;

    void process()
    {
        while (running) {
            cout << "Process is running" << endl;
            this_thread::sleep_for(chrono::seconds{5});
        }
    }

public:
    threads_and_lambda_capture()
        : worker{}
        , running{true}
    {
    }

    virtual ~threads_and_lambda_capture()
    {
        // With thanks to Scott Meyers for making this topic the
        // subject of a presentation... if you forget to stop a
        // joinable thread then the threads destructor will
        // terminate the program.
        stop();
    }

    void start()
    {
        running = true;
        worker = thread{[]() { this->process(); }};  //  <== BUG!!!
    }

    void stop()
    {
        if (worker.joinable()) {
        running = false;
            worker.join();
        }
    }
};

Most of the class above is merely scaffolding to show you a little context. The focus of the rest of this post is exclusively the start() method:

void start()
{
    running = true;
    worker = thread{[]() { this->process(); }};  //  <== BUG!!!
}

The intent of the above code was to make a method belonging to the current class as the thread entry point. The bug in the above code is that the this pointer has not been captured by the lambda and is therefore undefined in the lambda’s body.

After a little time in the company of google I replaced my code with the following working code and moved on:

void start()
{
    running = true;
    worker = thread{&thread_and_lambda_capture::process, this};
}

Somehow this irked me slightly. For me, the real value of modern C++ (including C++11 and the soon to be released C++14) is that it takes down the level of expertise required to utilize its power.  For that reason the above just seemed too knowledge intensive to be modern C++.

Thankfully I was right! When I discovered, in a completely different context, more about lambda capture I revisited the code above (and decided to write a blog post about it).

void start()
{
    running = true;
    worker = std::thread{[this]() { this->process(); }};
    //                   ^^^^^^       ...and BUG is gone
}

To explain what’s happening here, lambdas in C++11 consist of the [] variable capture section, the optional () parameter list section and the {} body. With this added to the list of captured variables then the symbol in the lambda body resolves correctly. Note that I have captured this by value (making a copy of it) rather than by reference for the same reasons I would apply to method calls. As usual, objects that cannot (or should not) be copied are better captured by reference.

An short but information dense introduction to C++ lambdas can be found at: cppreference.com .

It is true that the above code is probably still not particularly pleasing to a novice (and I suspect it may also be slightly slower although I have not read the disassembly to check this). However the knowledge demonstrated is likely to be far more transferable to other problem domains than knowing the arcane behaviour resulting from taking the address of contains member functions. So much so that a novice tutored in modern way (meaning parts of STL are introduced on the first day) will probably already have come across the concept of variable capture long before multi-threading raises its head!

So… with my new found transferable knowledge I’m going to finish the lambda function I need to filter a list.

Share

Measuring a ’12 Nexus 7’s microphone to speaker latency

The recent release of patchfield for Android made me wonder whether Android’s audio system has developed sufficiently over the last three years to be suitable for real time signal processing. There have been a couple of developers at google working hard to reduce the output latency but their work does not yet encompass input latency.

However since patchfield allows the speaker to be digitally connected to the microphone it is easy to use patchfield itself to measure input to output latency of an Android system.

The following picture shows the test set up I used to measure it.

image

On the left is my 2012 Nexus 7. This is connected to the earphones sitting in top of the (red) sound card. Using earphones prevents feedback between the mic and speaker. Finally a microphone rests on top of the earphones. Critically the microphone picks up both the environment and the output of the earphones.

If you are particularly eagle eyed you might have notices a few missing cables in the above picture. Specifically the earphones are not actually plugged into the Nexus and the microphone is not plugged into the soundcard! I haven”t faked the photo however I only decided to blog about it after I had starting taking things apart again. In may haste to get a picture of it I forgot to plug it all back together.

Latency is measured simply by knocking in the table surface. This creates a short burst of noise that is captured both by the microphone in the picture and by the Nexus’ internal microphone. Patchfield captures the sound and replays it to the earphone which the microphone also picks up. The latency can then by calculated by looking at the waveform of the signals picked up by the microphone.

And the results are… 132ms from input to output. Just to put this number into perspective a piano, which is one of the highest latency musical instruments, takes about 25ms from the start of a keypress to the hammer striking a string. Another way to express it is that it is way too much for any musical application. The brain doesn’t just hear it as an echo… it hears it as a LOOONNNGG echo!

I repeated the same test on my Nexus 4 and that was better but not by enough to make much difference. That “only” took 124ms.

I guess there remains lots of very good reasons by iRig is not available for Android. There is one remaining ray of hope: the latest Samsung Mobile SDK includes a professional audio API based on jack and dedicated to handling low latency. It looks like Samsung really do want to take the fight to apple.

Share

Faking try/catch/finally in bourne shell (and jenkins)

When Bourne shell was first release in 1977 it turned out that, for several very good reasons, Steven Bourne had designed a nice simple language with no need for exception handling. That is, it did not need exception handling until Jenkins, also for very good reasons started using it with the -ex that causes the shell to bail out on the first error in encounters.

Normally Jenkins’ behaviour is is exactly what you need. Scripts stop as soon as something goes wrong. However a typical glue script to run a test suite overnight on a shared development board might look like the following pseudo-code:

lock $MY_DEVELOPMENT_BOARD
make test TARGET=$MY_DEVELOPMENT_BOARD
unlock $MY_DEVELOPMENT_BOARD

This problem with the above code is run using jenkins, or any other tool that runs the shell with the -ex argument, then the unlock command is not run if the test suite fails and make returns the error to us and the board is never unlocked. A simple fix might be:

lock $MY_DEVELOPMENT_BOARD
if make test TARGET=$MY_DEVELOPMENT_BOARD
then
    unlock $MY_DEVELOPMENT_BOARD
else
    unlock $MY_DEVELOPMENT_BOARD
    false
fi

If you favour compactness (and only having to type out the unlock command once) then perhaps:

lock $MY_DEVELOPMENT_BOARD
make test TARGET=$MY_DEVELOPMENT_BOARD && res=$? || res=$?
unlock $MY_DEVELOPMENT_BOARD
[ 0 -ne "$res" ] && false

However what about the following. Note that the “unlock” command is similar to a  a “finally” operation in some languages but will be executed before the catch statement:

lock $MY_DEVELOPMENT_BOARD
try make test TARGET=$MY_DEVELOPMENT_BOARD
unlock $MY_DEVELOPMENT_BOARD
catch echo "System tests failed! Please see logs"

The above can readily be implemented aided by a couple of simple shell functions:

try () {
        if [ -z $exception_has_been_thrown ]
        then
                "$@" || exception_has_been_thrown=1
        fi
}

catch () {
        if [ ! -z $exception_has_been_thrown ]
        then
                "$@"
                false   # If "sh -ex" then exit at this point
                unset exception_has_been_thrown
        fi
}

These scripts don’t make a big difference for the simple script above. However what if you are running multiple test suites sequentially under lock?

lock $MY_DEVELOPMENT_BOARD
try make smoke_test TARGET=$MY_DEVELOPMENT_BOARD
try make heavy_regression_test TARGET=$MY_DEVELOPMENT_BOARD
unlock $MY_DEVELOPMENT_BOARD
catch echo "System tests failed! Please see logs"

Now at last the benefits of these wrapper functions really make sense. Because all the test suites are run using the try function then they will be skipped if previous try blocks have reported an error. This gives us behaviour similar to -e but delays the reporting of the error until the unlock has been performed.

To close, and for the historians amoung you, despite my starting this post with a reference to 1977 the code presented wouldn’t have run in the original Bourne shell because it uses features that were added later. However I think that by 1986 (SVR3) then all the features used here would have been available. Certainly if you know different then please let me know… I’d be interested.

Share

The Encore project – Introduction

Recently I allowed myself to be given a very old beat up Encore strat copy. The idea is to have something that is provably worthless to mess about with as a confidence building re-finish project.

encore_guitar

Somebody has attempted to refinish this guitar before and failed in quite an extreme manner. The body, originally black, is still just as black but has a weird tar like coating on top of the slightly cracked original poly finish. It must have looked better before they started work, simple because I haven’t found much underneath that I wouldn’t have been happy to address with nail varnish and T-cut. In the picture below that big rectangular patch on the side is some of that charming tar. From the shape I can only assume the new paint reacted badly with the glue of some stickers.

encore_body

The scratch plate is almost worse. It looks like it is coated in tipex…

encore_pickups

This last pictures is a close up of the controls that shows the edge of the scratch plate nicely. Its actually a three ply scratch plate but you can hardly see the black middle ply in places.

encore_controls

Oddly the neck is actually in a pretty good state. Not only that but I was actually pretty stunned by its build quality (let’s just say I was not expected great things from this guitar). The wood of the fretboard is a bit cheap (and not stained enough) but the main wood of the neck is sufficiently pretty and the frets are really quite well finished. Its a fairly nice profile too so when I can get over my prejudices it doesn’t feel that bad to play.

The body? Not so much, they’ve put the money where is shows and put the MDF were it doesn’t! I had at least hoped for plywood (like my old Epiphone SG) so I could play about with transparent finishes. Not to be, so it goes…

Anyhow, ideas for how to refinish would be cool. I’ve more or less given up on the idea of a trans finish. I did briefly toy with doing an “industrial” finish of incompletely sanded back black poly next to very dark stained MDF under a tinted transparent finish. I might learn something doing that but I suspect what I would learn is not to put half sanded MDF under a transparent finish…

Share

The Egmond Project – Update #2

The next part of the project turned out to be getting everything lined up. After rebuilding the guitar I left it at pitch for a week or two to see if anything moved… it didn’t. Since the last post I have also moved the bridge over a bit to improved the line up of the strings. Unfortunately with the bridge in the right place the bass string started popping out if anyone breathed too loudly so I took the bridge out to cut a much deeper saddle slot for the bass strings.

Note: This post is part of a series: [Update #1] -> [Original post]

Even with these changes it still lacked quite a bit in the playability department. After several hours trying to shave the bridge I concluded that the neck joint really did need the neck to lay back a bit more. The Egmond’s neck joint it rather different from the Fender-style neck joint you might see on electric guitars. Instead of the four big screws it relies on a single bolt to counter the tension of the strings.

The Egmond cantilever neck joint
The Egmond cantilever neck joint

Basically the neck pivots on the plywood at the top of the neck joint, whilst the single bolt goes through the heel of the neck, through the spring and into the body. Adding some shims to the pivot point allows the neck angle to be increases slightly, dropping the strings down a bit and making the guitar more playable.

Neck joint with shims inserted
Neck joint with shims inserted

The shims were made by setting a plane to a fairly brutal setting and running it up and down a piece of beech. Sadly they were not quite thick enough, even when doubled up. The folks over at The Guitar Grounds were very helpful. After they described how the 70s era Fender’s were generally shimmed using whatever was at hand (often business cards). I figured if business cards were good enough to US made Fenders then I could get away with four layers of wooden shims. Actually I have three shims on one side and four on the other so correct the angle slightly.

It’s done the job. The neck is now laid back enough that the little metal wheels on the bridge can adjust the action properly. I’ve been forced to set the action a little higher than I would like because the frets could really do with the attention of a crowning file (which I don’t have). Nevertheless it makes the world of different to its playability.

Electrification will come soon!

Share

Crafting indentation

The code in this blog post wasn’t written by me. The code actually comes from an automatic code generator used by the wayland project (and is credited to Kristian Høgsberg). To me, the code displays simple craftsmanship and should be appreciated for something beyond it’s mere utility.

The code itself deserved to be read.

So here it is:

static const char *indent(int n)
{
	const char *whitespace[] = {
		"\t\t\t\t\t\t\t\t\t\t\t\t",
		"\t\t\t\t\t\t\t\t\t\t\t\t ",
		"\t\t\t\t\t\t\t\t\t\t\t\t  ",
		"\t\t\t\t\t\t\t\t\t\t\t\t   ",
		"\t\t\t\t\t\t\t\t\t\t\t\t    ",
		"\t\t\t\t\t\t\t\t\t\t\t\t     ",
		"\t\t\t\t\t\t\t\t\t\t\t\t      ",
		"\t\t\t\t\t\t\t\t\t\t\t\t       "
	};

	return whitespace[n % 8] + 12 - n / 8;
}

[From http://cgit.freedesktop.org/wayland/wayland/tree/src/scanner.c ]

I can even forgive if for neglecting to assert() its preconditions!

Share