userver
`engine::Yield` doesn't actually yield
Due to moodycamel::ConcurrentQueue implementation details, `engine::Yield` doesn't actually yield the running thread back to the thread pool: moodycamel consists of a set of thread-local FIFO queues, and every consumer checks its paired producer queue first, so when a coroutine thread calls `engine::Yield` the task continuation is scheduled onto that same thread's queue. When the thread then tries to dequeue a task, it gets back the exact same continuation it just yielded (this is not always the case, since moodycamel does rotate the consumer<->producer pairing from time to time).
Here is an example of what I'm talking about, and sadly this situation has some real-life analogues:
Consider PostgreCache: it uses `engine::Yield` when copying a container or iterating over fetched values, and that can take plenty of iterations, basically "blocking" the current thread even though it yields.
Imagine you have 2 cores, 2 worker threads and 150 RPS, where each request takes 10 ms of CPU time: 2 threads should handle that with no problem, leaving 250 ms of spare CPU time per second on each thread. However, as soon as one thread gets into the "yield loop", the second thread gets overwhelmed instantly.
What's even worse in the aforementioned example: PostgreCache itself blocks a worker thread when it deallocates cached data, but that's another problem, I guess.
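For illustration, the yield-while-copying pattern described above looks roughly like this (a sketch of the idiom, not the actual PostgreCache code; the container type, the helper name and the yield interval are placeholders):

```cpp
#include <cstddef>

#include <userver/engine/sleep.hpp>  // engine::Yield; header name may differ between versions

// Sketch of the "copy a big container, Yield() every N elements" idiom.
// Each Yield() is meant to let other tasks run, but with the behaviour
// described above the same continuation is usually dequeued right back,
// so the worker thread stays occupied for the whole copy.
template <typename Map>
Map CopyWithYields(const Map& source) {
  constexpr std::size_t kYieldInterval = 1000;  // placeholder value
  Map result;
  std::size_t processed = 0;
  for (const auto& item : source) {
    result.insert(item);
    if (++processed % kYieldInterval == 0) {
      engine::Yield();  // "cooperative" pause that doesn't actually free the thread
    }
  }
  return result;
}
```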
So I took some time to prove the point:
I applied this patch to samples/hello_service and ran some HTTP requests.
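The patch itself is not reproduced here; a hypothetical equivalent of the yielding background task it adds would be something like the following (the function name, the task name and the iteration count are illustrative; `utils::Async` and `engine::Yield` are the real userver APIs):

```cpp
#include <cstddef>

#include <userver/engine/sleep.hpp>  // engine::Yield
#include <userver/utils/async.hpp>   // utils::Async

// Hypothetical background task for samples/hello_service: it does nothing but
// call engine::Yield() in a loop. With the current scheduler the continuation
// keeps landing back on the same worker thread, effectively pinning it.
void StartYieldLoop() {
  auto task = utils::Async("yield-loop", [] {
    for (std::size_t i = 0; i < 1'000'000'000; ++i) {
      engine::Yield();  // intends to give the thread away, usually gets it right back
    }
  });
  task.Detach();  // keep it running in the background for the duration of the test
}
```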
- This is without the background task, as expected:
itrofimow@itrofimow-RedmiBook-16:~$ ~/Downloads/bombardier-linux-amd64 -c 10 -r 150 -d 10s -l http://localhost:8080/hello
Bombarding http://localhost:8080/hello for 10s using 10 connection(s)
[======================================================================================================================================================================================================] 10s
Done!
Statistics Avg Stdev Max
Reqs/sec 150.06 36.49 234.15
Latency 14.61ms 4.16ms 31.59ms
Latency Distribution
50% 11.37ms
75% 20.69ms
90% 20.82ms
95% 20.92ms
99% 31.21ms
HTTP codes:
1xx - 0, 2xx - 1503, 3xx - 0, 4xx - 0, 5xx - 0
others - 0
Throughput: 56.25KB/s
- This is with the yielding background task (see the patch) and `engine::Yield` changed to `void Yield() { SleepUntil(Deadline::FromDuration(std::chrono::milliseconds{1})); }`, as expected:
itrofimow@itrofimow-RedmiBook-16:~$ ~/Downloads/bombardier-linux-amd64 -c 10 -r 150 -d 10s -l http://localhost:8080/hello
Bombarding http://localhost:8080/hello for 10s using 10 connection(s)
[======================================================================================================================================================================================================] 10s
Done!
Statistics Avg Stdev Max
Reqs/sec 150.28 22.93 265.63
Latency 16.18ms 4.45ms 35.97ms
Latency Distribution
50% 14.22ms
75% 21.56ms
90% 22.12ms
95% 22.51ms
99% 33.22ms
HTTP codes:
1xx - 0, 2xx - 1500, 3xx - 0, 4xx - 0, 5xx - 0
others - 0
Throughput: 56.24KB/s
itrofimow@itrofimow-RedmiBook-16:~$
- And this is without changes to `engine::Yield` (just the patch applied to develop). Not expected:
itrofimow@itrofimow-RedmiBook-16:~$ ~/Downloads/bombardier-linux-amd64 -c 10 -r 150 -d 10s -l http://localhost:8080/hello
Bombarding http://localhost:8080/hello for 10s using 10 connection(s)
[======================================================================================================================================================================================================] 10s
Done!
Statistics Avg Stdev Max
Reqs/sec 102.44 37.49 370.26
Latency 94.24ms 20.81ms 145.48ms
Latency Distribution
50% 103.22ms
75% 103.67ms
90% 104.07ms
95% 104.68ms
99% 134.61ms
HTTP codes:
1xx - 0, 2xx - 1034, 3xx - 0, 4xx - 0, 5xx - 0
others - 0
Throughput: 38.40KB/s
We were hit by this problem not so long ago. The solution would be to provide a way to customize stealing/non-stealing scheduling policies for task processors. Right now we're somewhere in between those two options.
Thanks for the very good test! It'll help us a lot in a few months, when we start the scheduling customization feature
> It'll help us a lot in a few months, when we start the scheduling customization feature
That's nice to hear, glad to help)
However, I think the problem with `Yield` could be real as of now, and since it's really hard to debug, and with userver's popularity rising, some people might get bitten by it. Would you consider adding some ad-hoc fixes until the improved scheduler lands?
Good afternoon. What are the plans to correct the behavior and is there any workaround at the moment?
> Good afternoon. What are the plans to correct the behavior and is there any workaround at the moment?
Out of curiosity - does this affect you in a measurable way?
One way to mitigate this that I'm aware of would be to decrease `EXPLICIT_CONSUMER_CONSUMPTION_QUOTA_BEFORE_ROTATE` in the moodycamel traits; however, this might affect overall performance.
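For reference, that would mean instantiating the queue with custom traits along these lines (this uses the standard moodycamel way of overriding a default trait; the concrete value 16 and the `Task*` element type are only examples and would need benchmarking):

```cpp
#include <cstddef>

#include <moodycamel/concurrentqueue.h>  // include path may differ depending on packaging

// Custom traits: make explicit consumers rotate to other producers' queues
// more often, so a worker is less likely to keep consuming its own
// just-yielded continuation. Rotating too often can hurt throughput, so the
// value below would have to be tuned.
struct FrequentRotationTraits : moodycamel::ConcurrentQueueDefaultTraits {
  static const std::size_t EXPLICIT_CONSUMER_CONSUMPTION_QUOTA_BEFORE_ROTATE = 16;
};

struct Task;  // placeholder for whatever the scheduler actually enqueues
using TaskQueue = moodycamel::ConcurrentQueue<Task*, FrequentRotationTraits>;
```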
I've heard there were talks about a new scheduler or something. @apolukhin, any updates on that?
We could have a separate second task queue for `Yield`-ed tasks. Low-priority work like background cache updates could also go there.
This work can proceed independently from investigating a non-thread-migrating scheduler.
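A very rough sketch of what the dequeue side of such a two-queue scheme could look like (the FIFO interface and all names are made up for illustration; this is not the userver task processor code):

```cpp
#include <optional>

// Hypothetical two-queue consumer step: the "main" queue holds freshly woken
// tasks, the "yielded" queue holds continuations re-queued by Yield() plus
// other low-priority work (e.g. background cache updates). Draining the main
// queue first keeps a Yield()-looping task from monopolising its worker,
// while still letting it run when there is nothing better to do.
template <typename Task, typename Fifo>
std::optional<Task> DequeueNext(Fifo& main_queue, Fifo& yielded_queue) {
  if (auto task = main_queue.TryPop()) return task;     // fresh work first
  if (auto task = yielded_queue.TryPop()) return task;  // then yielded / low-priority work
  return std::nullopt;  // nothing to do: the caller parks until new work arrives
}
```

Strict priority like this could starve yielded tasks under sustained load, so a real implementation would probably also need some fairness, e.g. occasionally draining the yielded queue first.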