Performance fix: replace WorkerPool sleeping with condition variable
I couldn't stare at concurrency control logic anymore this afternoon, so I threw together a quick experiment replacing our WorkerPool's sleep-with-exponential-backoff behavior with C++11's std::condition_variable. This change applies to any WorkerPool created by a MonoQueuePool, which currently covers the query worker threads, the parallel worker threads for codegen, and the Brain worker threads. A rough sketch of the before/after shape of the worker loop is below.
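For anyone skimming, this is a hedged sketch of the pattern, not the actual WorkerPool/MonoQueuePool code; the queue type, loop structure, and names are illustrative. The old loop polls the queue and sleeps with a growing backoff when idle; the new loop blocks on a condition variable until a producer enqueues work and calls notify_one().

```cpp
#include <algorithm>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Before: poll the queue, and when it is empty sleep with exponential backoff.
void PollingWorkerLoop(std::queue<std::function<void()>> &tasks,
                       std::mutex &mutex, const bool &shutting_down) {
  auto backoff = std::chrono::microseconds(10);
  for (;;) {
    std::function<void()> task;
    {
      std::lock_guard<std::mutex> lock(mutex);
      if (shutting_down) return;
      if (!tasks.empty()) {
        task = std::move(tasks.front());
        tasks.pop();
      }
    }
    if (task) {
      task();
      backoff = std::chrono::microseconds(10);  // reset after useful work
    } else {
      std::this_thread::sleep_for(backoff);     // idle: back off and re-poll
      backoff = std::min(backoff * 2, std::chrono::microseconds(1000));
    }
  }
}

// After: block on a condition variable; the producer enqueues under the same
// mutex and calls cv.notify_one(), so the worker wakes as soon as work
// arrives, with no polling interval to tune.
void BlockingWorkerLoop(std::queue<std::function<void()>> &tasks,
                        std::mutex &mutex, std::condition_variable &cv,
                        const bool &shutting_down) {
  for (;;) {
    std::function<void()> task;
    {
      std::unique_lock<std::mutex> lock(mutex);
      cv.wait(lock, [&] { return shutting_down || !tasks.empty(); });
      if (tasks.empty()) return;  // woken for shutdown
      task = std::move(tasks.front());
      tasks.pop();
    }
    task();  // run the task outside the critical section
  }
}
```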
I benchmarked using the same configs from #1401:
- TPC-C: 15 runs, 60 seconds each, 4 terminals, scale factor 4
- YCSB (read-only): 15 runs, 60 seconds each, 4 terminals, scale factor 1000
| Benchmark | master μ | master σ | branch μ | branch σ |
|---|---|---|---|---|
| TPC-C | 329 | 22 | 344 | 11 |
| YCSB | 16580 | 167 | 18186 | 56 |
It seems 5-10% faster in limited testing.
Right now I'm mainly looking for feedback on whether this has already been explored, and on any concerns others might have with this approach. My main question is scalability: I'd like to see how this fares on a machine with multiple sockets and a lot of cores, and whether it falls apart once we get our TPS numbers up to where they should be.
Coverage increased (+0.005%) to 76.371% when pulling 64567a5fcf9afe679c29927f05091e42b9ad429c on mbutrovich:worker_queue into 2406b763b91d9cee2a5d9f4dee01c761b476cef6 on cmu-db:master.
Interesting. The reason we did not use a condition variable is that it requires a mutex, which may result in poor performance at high throughput. I would argue that we hold this change for a while and see whether it still yields a throughput improvement once we reach a higher TPS.
We can hold this until we perform additional measurements. There are few instances where sleep is the right solution; it is usually chosen for its simplicity. If mutex overhead is a concern, or is measured to be a concern, then we should look at minimizing use of the mutex, for instance. Event-driven is the right choice, with suitable optimization as needed.
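One common way to keep the mutex off the producer's hot path, sketched below purely as an illustration (the class and member names are hypothetical, not Peloton's API): track how many workers are parked and skip the lock-free fast path's notify entirely when nobody is waiting, so producers only touch the condition variable when a wakeup is actually needed.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>

class NotifyWhenNeededQueue {
 public:
  void Submit(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push(std::move(task));
    }
    // Fast path: if no worker is parked, nobody needs a wakeup, so the
    // producer never calls into the condition variable at all.
    if (sleepers_.load() > 0) {
      cv_.notify_one();
    }
  }

  void Shutdown() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      shutting_down_ = true;
    }
    cv_.notify_all();
  }

  void WorkerLoop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        if (tasks_.empty() && !shutting_down_) {
          // Register as a sleeper before waiting; because both the push and
          // this empty-check happen under the mutex, a producer either sees
          // the sleeper count or the worker sees the new task.
          sleepers_.fetch_add(1);
          cv_.wait(lock, [this] { return shutting_down_ || !tasks_.empty(); });
          sleepers_.fetch_sub(1);
        }
        if (tasks_.empty()) return;  // shutdown with nothing left to run
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // run outside the critical section
    }
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  std::atomic<int> sleepers_{0};
  bool shutting_down_ = false;
};
```

Under load the queue is rarely empty, so producers mostly take the short push critical section and never notify; the condition variable only comes into play when workers are genuinely idle, which is exactly when its cost does not matter.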