one-second
one-second copied to clipboard
sum.c is erroneous in the presence of compiler optimization
There are two problems with sum.c as written: since the sum, s, is not used, any optimizing compiler will remove the loop entirely; and even if it is used, at least GCC and Clang are able to determine that the loop will result in s == NUMBER and remove it anyway.
It's hard to come up with a simple alternate solution because compilers are pretty tricky with these kinds of simple loops. One possibility is to make s volatile, but this will somewhat artificially constrain performance by making the CPU store each iteration to memory rather than keeping it in a register. Another is to add
asm("" :: "r"(s));
in the loop, to force the compiler to put the sum in some (single) register before executing an empty block of assembly.
I see two possible clean solutions:
- Move the summation into a separate function. This may or may not work, depending on whether gcc and clang are intelligent enough to determine whether a function is deterministic. I haven't investigated this.
- Print out the end value of
s. This is done infill_array.candfill_array_out_of_order.cas well, so the same should probably be done here.
Thanks for reporting this! This was originally tested with gcc -O2 on OS X and it didn't optimize out the loop, but it certainly doesn't work on my Linux machine.
On my machine (OSX 10.11) -O2 actually optimized it away:
$ gcc -O2 sum.c
$ time ./a.out 500000000
real 0m0.004s
user 0m0.001s
sys 0m0.002s
-O1 gives the same result. Only -O0 version isn't optimized in this case:
$ gcc -O0 sum.c
$ time ./a.out 500000000
real 0m1.079s
user 0m1.054s
sys 0m0.008s
$ gcc -v
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.0 (clang-700.1.76)
Target: x86_64-apple-darwin15.0.0
Thread model: posix
I would suggest to use -O0 optimization level. We measure how fast is real loop, not optimized out one (which means no code at all).
radarek wrote:
I would suggest to use
-O0optimization level.
I disagree. At -O0, gcc tends to generate hugely inefficient code. Benchmarking that tells you nothing about the speed you can expect from a C program.
I suggest the following:
#include <stdlib.h>
#include <stdio.h>
// Number to guess: How many iterations of
// this loop can we go through in a second?
int main(int argc, char **argv) {
unsigned int NUMBER, i, s;
NUMBER = atoi(argv[1]);
for (s = i = 0; i < NUMBER; ++i) {
s += i;
}
printf("%u\n", s);
return 0;
}
Notice three changes:
- The value of
sis printed in order to ensure it will be computed s += 1is replaced bys += ibecause with the former gcc is able to figure out the result at compile time, which it doesn't with the later- the numbers are unsigned in order to avoid undefined behavior when
soverflows.