cpluspluscourse
Atomic exercise DOES NOT race on my machine
@hageboeck @sponce @bernhardmgruber
I suspect the -O2 compile option (with g++ 11.3) makes for (int i = 0; i < 100; ++i) { a++; } somehow get replaced with a += 100;. That is quite simple to change in the Makefile. How can it be done in CMakeLists.txt?
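For reference, one way the optimization level could be lowered for just this exercise in CMake, assuming the exercise defines an executable target named racing (the target name is an assumption, not taken from the actual CMakeLists.txt):

```cmake
# Lower the optimization level for this one target only,
# so the increment loop is not folded into a += 100.
target_compile_options(racing PRIVATE -O0)
```

Per-target options like this are usually preferable to editing CMAKE_CXX_FLAGS globally, since the rest of the course exercises keep their -O2 behaviour.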
I also needed to raise the outer loop to 10000 tries (instead of 1000) and the main function to 1000 increments (instead of 100) before I started to see race conditions.
I also barely see any races on WSL. Maybe we can increase some of the loop counts to make this happen more often.
Also: there is barely any time difference between time ./racing and time ./racing.sol. Can we make the difference stronger?
time ./racing: real 0m0.139s
time ./racing.sol: real 0m0.128s
Concerning the race condition, I've added a godbolt link in the slide where we present the code, and there, we get plenty of weird numbers :-)
With the numbers I suggest above (10000x1000 instead of 1000x100), I see a huge difference in execution time (x20). Basically, the a++ time becomes negligible compared to the mutex lock/unlock. Happily, the atomic version is a lot better, as one would expect.
Shouldn't incorrectly written multithreaded code almost always outperform the correct one? ;) I guess the reason we see almost identical numbers for the atomic and non-atomic code here is that most of the time is spent on thread spawn/destroy rather than on the increments. If I increase the inner increment loop by a factor of 100, I already see a 50x difference even with just std::atomic, though it is still hard to trigger the race condition reliably.
If I increase the thread count to 4 and go for 1 million increments, I get the violation almost every other time. Time-wise, however, the atomic program then runs for more than a minute (while the unsafe program still completes in 0.1 s). ;)
We could make the loop sizes runtime arguments of the executable, and the first step of the exercise could be for students to increase the numbers on their machine until they clearly see the race condition.
In my opinion, it's very insightful to not see the race condition on the first invocation. You might have seen in the lecture that I also only get 100 and 200 as answers, probably because the loop gets optimised into a += 100;, but that's precisely what happens in real life. After invoking it a few times, it does race, just like in real life ...
Maybe a better alternative is to use thread sanitizer (I did that in my exercise class on Thursday) or helgrind, or to start it many, many times, as I did in the lecture?
If all of those don't work for one reason or another, we can of course increase the numbers.
After invoking it a few times, it does race, just like in real life ...
Yeah, that is the problem. If I call the example in a loop, it only races 2 or 3 times per minute, and you have to stare very intently at your screen :)
I would appreciate it if the race either happened more often, or if we added a step to the exercise that confirms the race (like thread sanitizer).
+1 for tsan. The only issue is that it isn't available everywhere. In addition, we could do what I did in the lecture: remove all output if it didn't race and only print if it races, or use exit codes or similar. In that case:
- People with tsan would just compile and run
- People without have to call it in a loop until they see that race. We could give instructions on how to run it in a loop and how to react to non-zero exit codes.
Yeah, that sounds good to me!
OK, I will assign it to myself for now.