thread-pool Investigate use of `std::atomic_flag` instead of `std::binary_semaphore`

`std::atomic_flag` is the only atomic type the standard guarantees to be lock-free. It would be interesting to see whether using it has any positive impact on performance over `std::binary_semaphore`.

Jul 03 '23 16:07 DeveloperPaul123

I ran a quick benchmark on quick-bench, and `std::binary_semaphore` seems to have the best performance for ping/pong, which I think matches how we're using it in the thread pool (as a signaling mechanism).

https://quick-bench.com/q/JkZjpTgsjQkSiyI20IRcZtEFNso

Jul 03 '23 17:07 DeveloperPaul123

I wrote a quick benchmark and ran it locally on Windows, and I'm getting inconsistent results. I think I'll need to run a more rigorous test with pyperf on a Linux machine to get better numbers.

relative	ms/op	op/s	err%	total	Thread signaling
100.0%	451.75	2.21	1.8%	53.95	`std::atomic_flag`
99.2%	455.40	2.20	2.0%	54.42	`std::binary_semaphore`

relative	ms/op	op/s	err%	total	Thread signaling
100.0%	444.76	2.25	2.1%	53.20	`std::atomic_flag`
98.4%	452.16	2.21	3.0%	53.78	`std::binary_semaphore`

relative	ms/op	op/s	err%	total	Thread signaling
100.0%	485.05	2.06	0.3%	57.51	`std::atomic_flag`
103.0%	470.99	2.12	0.8%	57.44	`std::binary_semaphore`

relative	ms/op	op/s	err%	total	Thread signaling
100.0%	457.13	2.19	1.9%	55.43	`std::atomic_flag`
96.9%	471.90	2.12	3.8%	56.81	`std::binary_semaphore`

relative	ms/op	op/s	err%	total	Thread signaling
100.0%	481.77	2.08	2.8%	56.86	`std::atomic_flag`
105.2%	457.91	2.18	4.1%	54.98	`std::binary_semaphore`

relative	ms/op	op/s	err%	total	Thread signaling
100.0%	465.74	2.15	0.4%	55.85	`std::atomic_flag`
101.3%	459.84	2.17	0.6%	55.00	`std::binary_semaphore`

relative	ms/op	op/s	err%	total	Thread signaling
100.0%	453.31	2.21	0.9%	54.72	`std::atomic_flag`
95.0%	477.07	2.10	2.4%	56.92	`std::binary_semaphore`

relative	ms/op	op/s	err%	total	Thread signaling
100.0%	477.71	2.09	0.8%	57.59	`std::atomic_flag`
101.6%	470.13	2.13	0.6%	56.57	`std::binary_semaphore`

relative	ms/op	op/s	err%	total	Thread signaling
100.0%	477.80	2.09	1.1%	57.12	`std::atomic_flag`
100.4%	475.79	2.10	1.8%	56.92	`std::binary_semaphore`

relative	ms/op	op/s	err%	total	Thread signaling
100.0%	479.51	2.09	1.2%	58.10	`std::atomic_flag`
102.9%	465.78	2.15	4.3%	55.52	`std::binary_semaphore`

Jul 06 '23 13:07 DeveloperPaul123

Here are 10 runs on a Linux system tuned with `pyperf system tune`:

relative	ms/op	op/s	err%	total	Thread signaling
100.0%	406.13	2.46	1.1%	48.78	`std::atomic_flag`
93.8%	432.78	2.31	1.9%	52.17	`std::binary_semaphore`

relative	ms/op	op/s	err%	total	Thread signaling
100.0%	409.78	2.44	0.7%	49.23	`std::atomic_flag`
106.2%	385.80	2.59	0.7%	46.45	`std::binary_semaphore`

relative	ms/op	op/s	err%	total	Thread signaling
100.0%	402.90	2.48	0.7%	48.30	`std::atomic_flag`
101.1%	398.52	2.51	0.8%	47.77	`std::binary_semaphore`

relative	ms/op	op/s	err%	total	Thread signaling
100.0%	397.33	2.52	0.4%	47.64	`std::atomic_flag`
104.8%	379.24	2.64	0.7%	45.69	`std::binary_semaphore`

relative	ms/op	op/s	err%	total	Thread signaling
100.0%	372.06	2.69	0.9%	44.75	`std::atomic_flag`
88.7%	419.39	2.38	2.2%	50.05	`std::binary_semaphore`

relative	ms/op	op/s	err%	total	Thread signaling
100.0%	420.61	2.38	0.9%	50.24	`std::atomic_flag`
112.4%	374.31	2.67	0.8%	45.01	`std::binary_semaphore`

relative	ms/op	op/s	err%	total	Thread signaling
100.0%	394.11	2.54	0.9%	47.54	`std::atomic_flag`
97.8%	403.07	2.48	0.6%	48.64	`std::binary_semaphore`

relative	ms/op	op/s	err%	total	Thread signaling
100.0%	406.72	2.46	0.7%	48.58	`std::atomic_flag`
105.2%	386.67	2.59	1.1%	46.27	`std::binary_semaphore`

relative	ms/op	op/s	err%	total	Thread signaling
100.0%	409.09	2.44	0.3%	49.27	`std::atomic_flag`
107.2%	381.71	2.62	1.0%	45.74	`std::binary_semaphore`

relative	ms/op	op/s	err%	total	Thread signaling
100.0%	394.07	2.54	0.7%	47.26	`std::atomic_flag`
90.8%	434.10	2.30	0.6%	52.22	`std::binary_semaphore`

Jul 05 '24 17:07 jtd-formlabs

@jtd-formlabs Thanks for doing that. It seems like they're essentially the same. At most we'd be saving tens of milliseconds, so it doesn't seem worth it to me.

Jul 05 '24 17:07 DeveloperPaul123
