cpluspluscourse icon indicating copy to clipboard operation
cpluspluscourse copied to clipboard

Atomic exercise DOES NOT race on my machine

Open chavid opened this issue 2 years ago • 6 comments

@hageboeck @sponce @bernhardmgruber

I suspect the -O2 compiling option (with g++ 11.3. ) make that for (int i = 0; i < 100; ++i) { a++; } is some howreplaced with a+=100 ... Quite simple to change in Makefile. How to do it in CMakeLists.txt ?

I also needed to move up the code to 10000 tries (instead of 1000), and the main function to 1000 increments (instead of 100) before I start to see racing conditions.

chavid avatar Oct 12 '22 09:10 chavid

I also barely see any races on WSL. Maybe we can increase some of the loop counts to make this happen more often.

Also: there is barely any time difference between time ./racing and time ./racing.sol. can we make the difference stronger?

time ./racing real 0m0.139s time ./racing.sol real 0m0.128s

bernhardmgruber avatar Oct 13 '22 14:10 bernhardmgruber

Concerning the race condition, I've added a godbolt link in the slide where we present the code, and there, we get plenty of weird numbers :-)

sponce avatar Oct 13 '22 14:10 sponce

I also barely see any races on WSL. Maybe we can increase some of the loop counts to make this happen more often.

Also: there is barely any time difference between time ./racing and time ./racing.sol. can we make the difference stronger?

time ./racing real 0m0.139s time ./racing.sol real 0m0.128s

With the numbers I suggest above (10000x1000 instead of 1000x100), I see a huge difference in execution time(x20). Basically, the a++ time becomes negligible as compared the mutex lock/unlock. Happily, the atomic version version is a lot better, as one can expect.

chavid avatar Oct 13 '22 16:10 chavid

I also barely see any races on WSL. Maybe we can increase some of the loop counts to make this happen more often. Also: there is barely any time difference between time ./racing and time ./racing.sol. can we make the difference stronger? time ./racing real 0m0.139s time ./racing.sol real 0m0.128s

With the numbers I suggest above (10000x1000 instead of 1000x100), I see a huge difference in execution time(x20). Basically, the a++ time becomes negligible as compared the mutex lock/unlock. Happily, the atomic version version is a lot better, as one can expect.

Shouldn't incorrectly written multithreaded code almost always outperform the the correct one ;) I guess the reason we see almost similar numbers for both atomic and non atomic code here is probably because most of the time is spent on the thread spawn/destroy rather than the increment. If I increase the inner loop of increment by a factor of 100, I can already see the 50x difference even with just std::atomic, though still hard to get the race condition reliably.

If I increase the threads to 4 and go for 1 million integers, I can almost get the violation every other time, however time wise for the atomic program this runs into more than a minute though (while the unsafe program still completes at .1s) ;)

theanalyst avatar Oct 13 '22 19:10 theanalyst

We could make the loops size som runtime arguments to the exetucable, and the first of the exercise may be for the student to move up the numbers on their machine, untile they clearly see the race conditions.

chavid avatar Oct 14 '22 06:10 chavid

In my opinion, it's very insightful to not see the race condition on first invocation. You might have seen in the lecture that I also only get 100 and 200 as answers, probably because it gets optimised into a += 100;, but that's precisely what happens in real life. After invoking it a few times, it does race, just like in real life ... Maybe a better alternative is to use thread sanitizer (I did that in my exercise class on Thursday) or helgrind, or start it many many times as I did in the lecture?

If all of those don't work for one reason or another, we can of course increase the numbers.

hageboeck avatar Oct 14 '22 12:10 hageboeck

After invoking it a few times, it does race, just like in real life ...

Yeah, that is the problem. If I call the example in a loop it only races 2 or 3 times per minute and you have to stare very focused on your screen :)

I would appreciate the race to either happen more often or add a step to the exercise that confirms the race (like thread sanitizer).

bernhardmgruber avatar Nov 02 '22 16:11 bernhardmgruber

+1 for tsan. The only issue is that it isn't available everywhere. In addition, we could do what I did in the lecture: We remove all output if it didn't race, and only print if it races, or use exit codes or similar. In that case:

  • People with tsan would just compile and run
  • People without have to call it in a loop until they see that race. We could give instructions on how to run it in a loop and how to react to non-zero exit codes.

hageboeck avatar Nov 02 '22 16:11 hageboeck

Yeah, that sounds good to me!

bernhardmgruber avatar Nov 02 '22 16:11 bernhardmgruber

OK, I will assign it to myself for now.

hageboeck avatar Nov 03 '22 09:11 hageboeck