one-second icon indicating copy to clipboard operation
one-second copied to clipboard

sum.c is erroneous in the presence of compiler optimization

Open comex opened this issue 10 years ago • 4 comments
trafficstars

There are two problems with sum.c as written: since the sum, s, is not used, any optimizing compiler will remove the loop entirely; and even if it is used, at least GCC and Clang are able to determine that the loop will result in s == NUMBER and remove it anyway.

It's hard to come up with a simple alternate solution because compilers are pretty tricky with these kinds of simple loops. One possibility is to make s volatile, but this will somewhat artificially constrain performance by making the CPU store each iteration to memory rather than keeping it in a register. Another is to add

asm("" :: "r"(s));

in the loop, to force the compiler to put the sum in some (single) register before executing an empty block of assembly.

comex avatar Oct 25 '15 05:10 comex

I see two possible clean solutions:

  • Move the summation into a separate function. This may or may not work, depending on whether gcc and clang are intelligent enough to determine whether a function is deterministic. I haven't investigated this.
  • Print out the end value of s. This is done in fill_array.c and fill_array_out_of_order.c as well, so the same should probably be done here.

auscompgeek avatar Oct 25 '15 12:10 auscompgeek

Thanks for reporting this! This was originally tested with gcc -O2 on OS X and it didn't optimize out the loop, but it certainly doesn't work on my Linux machine.

jvns avatar Oct 25 '15 13:10 jvns

On my machine (OSX 10.11) -O2 actually optimized it away:

$ gcc -O2 sum.c
$ time ./a.out 500000000

real    0m0.004s
user    0m0.001s
sys 0m0.002s

-O1 gives the same result. Only -O0 version isn't optimized in this case:

$ gcc -O0 sum.c
$ time ./a.out 500000000

real    0m1.079s
user    0m1.054s
sys 0m0.008s
$ gcc -v
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.0 (clang-700.1.76)
Target: x86_64-apple-darwin15.0.0
Thread model: posix

I would suggest to use -O0 optimization level. We measure how fast is real loop, not optimized out one (which means no code at all).

radarek avatar Nov 02 '15 13:11 radarek

radarek wrote:

I would suggest to use -O0 optimization level.

I disagree. At -O0, gcc tends to generate hugely inefficient code. Benchmarking that tells you nothing about the speed you can expect from a C program.

I suggest the following:

#include <stdlib.h>
#include <stdio.h>

// Number to guess: How many iterations of
// this loop can we go through in a second?

int main(int argc, char **argv) {
    unsigned int NUMBER, i, s;
    NUMBER = atoi(argv[1]);

    for (s = i = 0; i < NUMBER; ++i) {
        s += i;
    }
    printf("%u\n", s);

    return 0;
}

Notice three changes:

  • The value of s is printed in order to ensure it will be computed
  • s += 1 is replaced by s += i because with the former gcc is able to figure out the result at compile time, which it doesn't with the later
  • the numbers are unsigned in order to avoid undefined behavior when s overflows.

edgar-bonet avatar Feb 10 '19 11:02 edgar-bonet