perf: Re-use hot worker threads
This PR is a first small step towards auto-scaling worker threads. The main idea is to re-use 'hot threads'.
Workers are long-running processes; they often keep objects or external connections alive across multiple requests. Dispatching requests round-robin spreads them over every thread, which can cause connections to be opened and closed unnecessarily.
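To illustrate the churn, here is a minimal Go simulation under a hypothetical model: one request arrives per tick, each thread holds its own connection, and that connection is dropped once it has been idle for more than `idleTimeout` ticks. The function `connOpens` and the timeout model are assumptions for illustration only, not part of this PR.

```go
package main

import "fmt"

// connOpens simulates round-robin dispatch of one request per tick over n
// worker threads. Each thread keeps its own (hypothetical) connection, which
// is dropped when it has been idle for more than idleTimeout ticks.
// It returns how many times a connection had to be (re)opened.
func connOpens(n, requests, idleTimeout int) int {
	lastUsed := make([]int, n)
	for i := range lastUsed {
		lastUsed[i] = -1 // -1 means the thread has no open connection yet
	}
	opens := 0
	for tick := 0; tick < requests; tick++ {
		t := tick % n // round-robin: request k goes to thread k mod n
		if lastUsed[t] < 0 || tick-lastUsed[t] > idleTimeout {
			opens++ // no connection yet, or it timed out: open a new one
		}
		lastUsed[t] = tick
	}
	return opens
}

func main() {
	// Same traffic (100 requests) and idle timeout (10 ticks), more threads.
	for _, n := range []int{1, 4, 20} {
		fmt.Printf("%2d threads: %3d connection opens\n", n, connOpens(n, 100, 10))
	}
}
```

Under this model, 20 threads means each thread only sees every 20th request, so every single request has to reopen a connection, while 1 or 4 threads keep their connections warm.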
This is easily optimized by always dispatching to the threads in the same fixed order, so that a request goes to the first idle thread in that order. This way most requests are handled by a small number of 'hot' worker threads, while 'cold' threads stay idle as headroom for latency spikes.
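A minimal Go sketch of the two selection policies, modelling only which thread gets picked (no real threads or channels). `Pool`, `AcquireOrdered`, `AcquireRoundRobin` and `Release` are hypothetical names chosen for this illustration, not the project's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// Pool tracks which (hypothetical) worker threads are busy.
// It models the selection policy only, not real threads.
type Pool struct {
	mu   sync.Mutex
	busy []bool
	next int // cursor used only by AcquireRoundRobin
}

func NewPool(n int) *Pool {
	return &Pool{busy: make([]bool, n)}
}

// AcquireOrdered returns the first idle thread in fixed index order, so
// low-index ("hot") threads are always preferred. Returns -1 if all are busy.
func (p *Pool) AcquireOrdered() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	for i, b := range p.busy {
		if !b {
			p.busy[i] = true
			return i
		}
	}
	return -1
}

// AcquireRoundRobin starts searching after the previously chosen thread,
// spreading requests over all threads. Returns -1 if all are busy.
func (p *Pool) AcquireRoundRobin() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	n := len(p.busy)
	for k := 0; k < n; k++ {
		i := (p.next + k) % n
		if !p.busy[i] {
			p.busy[i] = true
			p.next = (i + 1) % n
			return i
		}
	}
	return -1
}

// Release marks thread i as idle again.
func (p *Pool) Release(i int) {
	p.mu.Lock()
	p.busy[i] = false
	p.mu.Unlock()
}

func main() {
	ordered, rr := NewPool(4), NewPool(4)
	var o, r []int
	// Light load: each request finishes before the next one arrives.
	for req := 0; req < 6; req++ {
		i := ordered.AcquireOrdered()
		o = append(o, i)
		ordered.Release(i)

		j := rr.AcquireRoundRobin()
		r = append(r, j)
		rr.Release(j)
	}
	fmt.Println("ordered:", o)     // [0 0 0 0 0 0]
	fmt.Println("round-robin:", r) // [0 1 2 3 0 1]
}
```

Under light load the ordered policy keeps reusing thread 0, and higher-index threads only get work once the lower ones are busy. A real dispatcher would queue or block instead of returning -1 when every thread is busy.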
Although this approach might seem to add overhead, it actually reduces overhead slightly, especially with a high number of threads.
Benchmarks
wrk -t4 -c150 -d60 http://localhost (done locally)
20 CPU cores, 40 threads, 'Hello world':
main branch: ~72000 RPS
this branch: ~75000 RPS
20 CPU cores, 100 threads, 'IO simulation':
main branch: ~28000 RPS
this branch: ~33000 RPS
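For reference, the approximate figures above correspond to roughly a 4% gain for 'Hello world' and an 18% gain for 'IO simulation'. A tiny Go helper shows the arithmetic; `relGain` is a hypothetical name, and since the inputs are approximate, so are the results.

```go
package main

import "fmt"

// relGain returns the relative throughput change from before to after, in percent.
func relGain(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Approximate RPS figures from the wrk runs.
	fmt.Printf("Hello world:   %+.1f%%\n", relGain(72000, 75000)) // +4.2%
	fmt.Printf("IO simulation: %+.1f%%\n", relGain(28000, 33000)) // +17.9%
}
```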
The 'IO simulation' does this in addition to the 'Hello world':
```php
// do 10_000 assignments
for ($i = 0; $i < 10_000; $i++) {
    $a = $i;
}
// sleep 1 millisecond
usleep(1000);
```
I'm not sure this would still be called 'Round Robin'. We could also expose it as a 'dispatch_mode' configuration option, like Swoole does, though I'm not sure plain Round Robin would ever be preferred.
Let's keep only the best mode, unless there is a use case for others.
Was hoping my Actions would be back this month 😞, guess it's a bug then.
Thank you!!