Consider disabling LightweightSemaphore initial spinning in wait_dequeue_timed by default, or making it configurable
LightweightSemaphore's initial spinning adds significant CPU overhead on slow devices such as the Raspberry Pi in a scenario like this:
Consumer thread pseudocode:

```cpp
#include <chrono>
#include "readerwriterqueue.h"

moodycamel::BlockingReaderWriterQueue<QueueItem> queue;

while (true) {
    int next_wait_time = do_some_io_stuff();  // returns ~1 ms
    QueueItem item;
    // Returns false in most cases: the queue is usually empty.
    if (queue.wait_dequeue_timed(item, std::chrono::milliseconds(next_wait_time))) {
        do_some_additional_work_or_break_loop();
    }
}
```
If next_wait_time is short, this code drives up CPU load because every call pays for the initial spin phase. On a standard x86 platform this is not noticeable, but on very slow ARM devices it is better to wait on the kernel semaphore than to waste CPU cycles.
I was using an older version where the number of spin iterations was 10000 instead of 1024. This caused an additional ~20% CPU load on an Orange Pi Zero when wait_dequeue_timed was called every 10 ms.
Another reason for the slowdown is that this build uses compiler_fence in the spin loop, which seems to be slow on this ARM device.
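For context, this is roughly the shape of the spin-then-block pattern involved (a simplified sketch, not the library's exact code; `wait_on_kernel_semaphore` is a hypothetical stand-in for the OS-level wait):

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the OS-level blocking wait (e.g. sem_timedwait);
// stubbed out here so the sketch is self-contained.
static bool wait_on_kernel_semaphore(std::atomic<int>&, std::int64_t) { return false; }

// Simplified sketch of a spin-then-block timed wait. Every call pays the
// full spin cost up front whenever the queue is empty, which is the common
// case in the scenario above.
bool wait_with_partial_spinning(std::atomic<int>& count, std::int64_t timeout_usecs)
{
    int spin = 1024;  // reportedly 10000 in older releases
    while (--spin >= 0) {
        int old_count = count.load(std::memory_order_relaxed);
        if (old_count > 0 &&
            count.compare_exchange_strong(old_count, old_count - 1,
                                          std::memory_order_acquire,
                                          std::memory_order_relaxed))
            return true;
        // Compiler fence to keep the loop from being optimized away;
        // this is the fence that appears costly on the ARM target.
        std::atomic_signal_fence(std::memory_order_acquire);
    }
    // Only after every spin fails does the wait reach the kernel semaphore.
    return wait_on_kernel_semaphore(count, timeout_usecs);
}
```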
In my particular scenario, where most of the time is spent in do_some_io_stuff() and the queue is only occasionally used, I want to disable spinning entirely, as there is no additional benefit from it.
I think the decision to sacrifice additional CPU cycles in order to avoid a system call should be configurable, either per queue or per wait_dequeue_timed call. Ideally I would like to set the number of initial spin iterations myself, as the right value depends heavily on the workload and the hardware platform.
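Until then, one possible workaround (a sketch assuming a POSIX target; `NoSpinQueue` and `QueueItem` are illustrative names, not library types) is to pair the non-blocking ReaderWriterQueue with a raw kernel semaphore, so a timed wait blocks immediately with no spin phase:

```cpp
#include <cerrno>
#include <ctime>
#include <semaphore.h>
#include "readerwriterqueue.h"  // moodycamel::ReaderWriterQueue

struct QueueItem { int payload; };

// Wraps the non-blocking SPSC queue with a POSIX semaphore so a timed
// wait goes straight to the kernel. Same single-producer/single-consumer
// constraint as the underlying queue.
struct NoSpinQueue {
    moodycamel::ReaderWriterQueue<QueueItem> queue;
    sem_t items;

    NoSpinQueue()  { sem_init(&items, 0, 0); }
    ~NoSpinQueue() { sem_destroy(&items); }

    void enqueue(QueueItem const& item) {
        queue.enqueue(item);  // publish the item before signaling
        sem_post(&items);
    }

    bool wait_dequeue_timed(QueueItem& item, long timeout_ms) {
        timespec ts;
        clock_gettime(CLOCK_REALTIME, &ts);
        ts.tv_sec  += timeout_ms / 1000;
        ts.tv_nsec += (timeout_ms % 1000) * 1000000L;
        if (ts.tv_nsec >= 1000000000L) { ts.tv_sec++; ts.tv_nsec -= 1000000000L; }
        while (sem_timedwait(&items, &ts) != 0) {
            if (errno == ETIMEDOUT) return false;  // no item within the timeout
            // EINTR: interrupted by a signal, retry the wait
        }
        // The semaphore was posted after the enqueue, so an item is available.
        return queue.try_dequeue(item);
    }
};
```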
It is also not obvious to a library user that wait_dequeue_timed performs a thousand extra spin iterations. I would expect it to block on a system call by default.