one-second sum.c is erroneous in the presence of compiler optimization

trafficstars

There are two problems with sum.c as written: since the sum, s, is not used, any optimizing compiler will remove the loop entirely; and even if it is used, at least GCC and Clang are able to determine that the loop will result in s == NUMBER and remove it anyway.

It's hard to come up with a simple alternate solution because compilers are pretty tricky with these kinds of simple loops. One possibility is to make s volatile, but this will somewhat artificially constrain performance by making the CPU store each iteration to memory rather than keeping it in a register. Another is to add

asm("" :: "r"(s));

in the loop, to force the compiler to put the sum in some (single) register before executing an empty block of assembly.

Oct 25 '15 05:10 comex

I see two possible clean solutions:

Move the summation into a separate function. This may or may not work, depending on whether gcc and clang are intelligent enough to determine whether a function is deterministic. I haven't investigated this.
Print out the end value of s. This is done in fill_array.c and fill_array_out_of_order.c as well, so the same should probably be done here.

Oct 25 '15 12:10 auscompgeek

Thanks for reporting this! This was originally tested with gcc -O2 on OS X and it didn't optimize out the loop, but it certainly doesn't work on my Linux machine.

Oct 25 '15 13:10 jvns

On my machine (OSX 10.11) -O2 actually optimized it away:

$ gcc -O2 sum.c
$ time ./a.out 500000000

real    0m0.004s
user    0m0.001s
sys 0m0.002s

-O1 gives the same result. Only -O0 version isn't optimized in this case:

$ gcc -O0 sum.c
$ time ./a.out 500000000

real    0m1.079s
user    0m1.054s
sys 0m0.008s

$ gcc -v
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.0 (clang-700.1.76)
Target: x86_64-apple-darwin15.0.0
Thread model: posix

I would suggest to use -O0 optimization level. We measure how fast is real loop, not optimized out one (which means no code at all).

Nov 02 '15 13:11 radarek

radarek wrote:

I would suggest to use -O0 optimization level.

I disagree. At -O0, gcc tends to generate hugely inefficient code. Benchmarking that tells you nothing about the speed you can expect from a C program.

I suggest the following:

#include <stdlib.h>
#include <stdio.h>

// Number to guess: How many iterations of
// this loop can we go through in a second?

int main(int argc, char **argv) {
    unsigned int NUMBER, i, s;
    NUMBER = atoi(argv[1]);

    for (s = i = 0; i < NUMBER; ++i) {
        s += i;
    }
    printf("%u\n", s);

    return 0;
}

Notice three changes:

The value of s is printed in order to ensure it will be computed
s += 1 is replaced by s += i because with the former gcc is able to figure out the result at compile time, which it doesn't with the later
the numbers are unsigned in order to avoid undefined behavior when s overflows.

Feb 10 '19 11:02 edgar-bonet

one-second one-second copied to clipboard

sum.c is erroneous in the presence of compiler optimization

one-second
one-second copied to clipboard