userver
`engine::Yield` doesn't actually yield
Due to moodycamel::ConcurrentQueue implementation details, `engine::Yield` doesn't actually yield the running thread back to the thread pool: moodycamel consists of a set of thread-local FIFO queues, and every consumer checks its paired producer queue first, so when a coroutine thread calls `engine::Yield` the task continuation is scheduled onto that same thread's queue. When the thread then tries to dequeue a task, it gets back the exact same continuation it just yielded (this is not always the case, since moodycamel does rotate the consumer<->producer pairing from time to time).
Here is an example of what I'm talking about, and sadly this situation has some real-life analogues:
Consider PostgreCache: it uses `engine::Yield` when copying a container or iterating over fetched values, and that can take plenty of iterations, basically "blocking" the current thread even though it yields.
Imagine you have 2 cores, 2 worker threads and 150 RPS, where each request takes 10 ms of CPU time: 2 threads should handle that with no problem, leaving 250 ms of spare CPU time per second on each thread. However, as soon as one thread gets into the "yield loop", the second thread gets overwhelmed instantly.
What's even worse in the aforementioned example: PostgreCache itself blocks a worker thread when it deallocates cached data, but that's another problem, I guess.
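For illustration, the yield-while-copying pattern described above looks roughly like this (a sketch of the idiom, not the actual PostgreCache code; the container type, the helper name and the yield interval are placeholders):

```cpp
#include <cstddef>

#include <userver/engine/sleep.hpp>  // engine::Yield; header name may differ between versions

// Sketch of the "copy a big container, Yield() every N elements" idiom.
// Each Yield() is meant to let other tasks run, but with the behaviour
// described above the same continuation is usually dequeued right back,
// so the worker thread stays occupied for the whole copy.
template <typename Map>
Map CopyWithYields(const Map& source) {
  constexpr std::size_t kYieldInterval = 1000;  // placeholder value
  Map result;
  std::size_t processed = 0;
  for (const auto& item : source) {
    result.insert(item);
    if (++processed % kYieldInterval == 0) {
      engine::Yield();  // "cooperative" pause that doesn't actually free the thread
    }
  }
  return result;
}
```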
So I took some time to prove the point:
I applied this patch to samples/hello_service and ran some HTTP requests.
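The patch itself is not reproduced here; a hypothetical equivalent of the yielding background task it adds would be something like the following (the function name, the task name and the iteration count are illustrative; `utils::Async` and `engine::Yield` are the real userver APIs):

```cpp
#include <cstddef>

#include <userver/engine/sleep.hpp>  // engine::Yield
#include <userver/utils/async.hpp>   // utils::Async

// Hypothetical background task for samples/hello_service: it does nothing but
// call engine::Yield() in a loop. With the current scheduler the continuation
// keeps landing back on the same worker thread, effectively pinning it.
void StartYieldLoop() {
  auto task = utils::Async("yield-loop", [] {
    for (std::size_t i = 0; i < 1'000'000'000; ++i) {
      engine::Yield();  // intends to give the thread away, usually gets it right back
    }
  });
  task.Detach();  // keep it running in the background for the duration of the test
}
```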
- This is without the background task, as expected:
itrofimow@itrofimow-RedmiBook-16:~$ ~/Downloads/bombardier-linux-amd64 -c 10 -r 150 -d 10s -l http://localhost:8080/hello
Bombarding http://localhost:8080/hello for 10s using 10 connection(s)
[======================================================================================================================================================================================================] 10s
Done!
Statistics Avg Stdev Max
Reqs/sec 150.06 36.49 234.15
Latency 14.61ms 4.16ms 31.59ms
Latency Distribution
50% 11.37ms
75% 20.69ms
90% 20.82ms
95% 20.92ms
99% 31.21ms
HTTP codes:
1xx - 0, 2xx - 1503, 3xx - 0, 4xx - 0, 5xx - 0
others - 0
Throughput: 56.25KB/s
- This is with the yielding background task (see the patch) and `engine::Yield` changed to `void Yield() { SleepUntil(Deadline::FromDuration(std::chrono::milliseconds{1})); }`, as expected:
itrofimow@itrofimow-RedmiBook-16:~$ ~/Downloads/bombardier-linux-amd64 -c 10 -r 150 -d 10s -l http://localhost:8080/hello
Bombarding http://localhost:8080/hello for 10s using 10 connection(s)
[======================================================================================================================================================================================================] 10s
Done!
Statistics Avg Stdev Max
Reqs/sec 150.28 22.93 265.63
Latency 16.18ms 4.45ms 35.97ms
Latency Distribution
50% 14.22ms
75% 21.56ms
90% 22.12ms
95% 22.51ms
99% 33.22ms
HTTP codes:
1xx - 0, 2xx - 1500, 3xx - 0, 4xx - 0, 5xx - 0
others - 0
Throughput: 56.24KB/s
itrofimow@itrofimow-RedmiBook-16:~$
- And this is without changes to `engine::Yield` (just the patch applied to develop). Not expected:
itrofimow@itrofimow-RedmiBook-16:~$ ~/Downloads/bombardier-linux-amd64 -c 10 -r 150 -d 10s -l http://localhost:8080/hello
Bombarding http://localhost:8080/hello for 10s using 10 connection(s)
[======================================================================================================================================================================================================] 10s
Done!
Statistics Avg Stdev Max
Reqs/sec 102.44 37.49 370.26
Latency 94.24ms 20.81ms 145.48ms
Latency Distribution
50% 103.22ms
75% 103.67ms
90% 104.07ms
95% 104.68ms
99% 134.61ms
HTTP codes:
1xx - 0, 2xx - 1034, 3xx - 0, 4xx - 0, 5xx - 0
others - 0
Throughput: 38.40KB/s
We were hit by this problem not so long ago. The solution would be to provide a way to customize stealing/non-stealing scheduling policies for task processors. Right now we're somewhere in between those two options.
Thanks for the very good test! It'll help us a lot in a few months, when we start the scheduling customization feature
> It'll help us a lot in a few months, when we start the scheduling customization feature
That's nice to hear, glad to help)
However, I think the problem with `Yield` could be real as of now, and since it's really hard to debug, and with userver's popularity rising, some people might get bitten by it. Would you consider adding some ad-hoc fixes until the improved scheduler lands?
Good afternoon. What are the plans to correct the behavior and is there any workaround at the moment?
> Good afternoon. What are the plans to correct the behavior and is there any workaround at the moment?
Out of curiosity - does this affect you in a measurable way?
One way to mitigate this that I'm aware of would be to decrease `EXPLICIT_CONSUMER_CONSUMPTION_QUOTA_BEFORE_ROTATE` in the moodycamel traits; however, this might affect overall performance.
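For reference, that would mean instantiating the queue with custom traits along these lines (this uses the standard moodycamel way of overriding a default trait; the concrete value 16 and the `Task*` element type are only examples and would need benchmarking):

```cpp
#include <cstddef>

#include <moodycamel/concurrentqueue.h>  // include path may differ depending on packaging

// Custom traits: make explicit consumers rotate to other producers' queues
// more often, so a worker is less likely to keep consuming its own
// just-yielded continuation. Rotating too often can hurt throughput, so the
// value below would have to be tuned.
struct FrequentRotationTraits : moodycamel::ConcurrentQueueDefaultTraits {
  static const std::size_t EXPLICIT_CONSUMER_CONSUMPTION_QUOTA_BEFORE_ROTATE = 16;
};

struct Task;  // placeholder for whatever the scheduler actually enqueues
using TaskQueue = moodycamel::ConcurrentQueue<Task*, FrequentRotationTraits>;
```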
I've heard there were talks about a new scheduler or something. @apolukhin, any updates on that?
We could have a separate second task queue for `Yield`-ed tasks. Low-priority work like background cache updates could also go there.
This work can proceed independently from investigating a non-thread-migrating scheduler.
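A very rough sketch of what the dequeue side of such a two-queue scheme could look like (the FIFO interface and all names are made up for illustration; this is not the userver task processor code):

```cpp
#include <optional>

// Hypothetical two-queue consumer step: the "main" queue holds freshly woken
// tasks, the "yielded" queue holds continuations re-queued by Yield() plus
// other low-priority work (e.g. background cache updates). Draining the main
// queue first keeps a Yield()-looping task from monopolising its worker,
// while still letting it run when there is nothing better to do.
template <typename Task, typename Fifo>
std::optional<Task> DequeueNext(Fifo& main_queue, Fifo& yielded_queue) {
  if (auto task = main_queue.TryPop()) return task;     // fresh work first
  if (auto task = yielded_queue.TryPop()) return task;  // then yielded / low-priority work
  return std::nullopt;  // nothing to do: the caller parks until new work arrives
}
```

Strict priority like this could starve yielded tasks under sustained load, so a real implementation would probably also need some fairness, e.g. occasionally draining the yielded queue first.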