Tmds.LinuxAsync

ASP.NET Schedulers

Open tmds opened this issue 5 years ago • 16 comments

@davidfowl @halter73 we've added control over the schedulers that get used.

We can control these:

  • c: where do async socket continuations run: inline/threadpool
  • i: input pipe reader scheduler: inline/threadpool
  • o: output pipe reader scheduler: inline/threadpool/ioqueue/iothread

The bold values are the default.
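
For orientation, here is a minimal sketch of what the three knobs correspond to in System.IO.Pipelines terms. The `SchedulerKnobs` helper below is hypothetical, not Tmds.LinuxAsync's actual configuration surface; only `PipeScheduler`/`PipeOptions` are real APIs.

```csharp
using System.IO.Pipelines;

// Hypothetical helper, for illustration only.
static class SchedulerKnobs
{
    // c: where async socket continuations run. "inline" keeps them on the
    // epoll thread that observed the completion; "threadpool" queues them.
    public static bool InlineSocketContinuations { get; set; }

    // i: the transport-to-application pipe. Its reader scheduler is where
    // code resumes after `await PipeReader.ReadAsync()`, i.e. where the
    // application starts processing a request.
    public static PipeOptions InputPipeOptions(PipeScheduler readerScheduler) =>
        new PipeOptions(readerScheduler: readerScheduler,
                        writerScheduler: PipeScheduler.Inline,
                        useSynchronizationContext: false);

    // o: the application-to-transport pipe. Its reader scheduler is where
    // the transport resumes to perform the actual socket send
    // (inline / threadpool / ioqueue / iothread).
    public static PipeOptions OutputPipeOptions(PipeScheduler readerScheduler) =>
        new PipeOptions(readerScheduler: readerScheduler,
                        writerScheduler: PipeScheduler.Inline,
                        useSynchronizationContext: false);
}
```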

When we inline all of these schedulers and use ProcessorCount threads, we measured 1M rps for middleware JSON on Citrine.

I wonder which of these would be safe to inline in a production app, where there may be potentially blocking code from the user?

I guess this would be c (because the application code gets dispatched to the ThreadPool later) and also o (because the application code runs on the output pipe writer scheduler). Is that correct? What about i?

cc @antonfirsov @adamsitnik

tmds avatar Apr 02 '20 21:04 tmds

For c, we've found inline scheduling is best on linux with Kestrel's Socket transport, so that's already the default there. See https://github.com/aspnet/KestrelHttpServer/issues/2573.

For i (and the output pipe write scheduler) we've always used the threadpool as a safe default in case the app developer blocks. This used to be configurable via KestrelServerOptions.ApplicationSchedulingMode, but that option got removed when pipe creation was made a transport responsibility. Inline scheduling generally did result in faster benchmark runs, but we have not added similar transport-level options yet.
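
To make the "safe default" concrete, here is a small standalone System.IO.Pipelines example (not Kestrel code) showing that the pipe's reader scheduler decides where the reader's continuation, i.e. potentially blocking application code, resumes when the writer flushes.

```csharp
using System;
using System.IO.Pipelines;
using System.Threading.Tasks;

class ReaderSchedulerDemo
{
    static async Task Main()
    {
        // With PipeScheduler.Inline instead of ThreadPool, the reader's
        // continuation would run on the thread that calls FlushAsync below,
        // i.e. the transport (epoll) thread in Kestrel's case.
        var pipe = new Pipe(new PipeOptions(
            readerScheduler: PipeScheduler.ThreadPool,
            writerScheduler: PipeScheduler.Inline,
            useSynchronizationContext: false));

        Task reader = Task.Run(async () =>
        {
            ReadResult result = await pipe.Reader.ReadAsync();
            // A blocking call here would block a thread pool thread,
            // not the thread that flushed the data.
            Console.WriteLine($"reader resumed on thread {Environment.CurrentManagedThreadId}");
            pipe.Reader.AdvanceTo(result.Buffer.End);
        });

        await Task.Delay(100); // let the reader start awaiting

        pipe.Writer.GetSpan(1)[0] = 42;
        pipe.Writer.Advance(1);
        Console.WriteLine($"writer flushing on thread {Environment.CurrentManagedThreadId}");
        await pipe.Writer.FlushAsync();
        await reader;
    }
}
```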

For o, it would certainly be safe to be inline (we basically do this on a per-HTTP/2-stream basis in Http2OutputProducer in Kestrel), but the benchmarking I've done before with Kestrel's Socket transport has shown the ioqueue to be slightly faster.

halter73 avatar Apr 03 '20 02:04 halter73

For c, we've found inline scheduling is best on linux with Kestrel's Socket transport, so that's already the default there

This c is a different one. It controls whether the Socket continuations run on the threadpool or stay on the epoll thread. I assume it is safe to inline, because i defers to the ThreadPool later.
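
As an illustration, the choice roughly looks like this inside an epoll loop; the `OnCompletion` method below is hypothetical, not the Tmds.LinuxAsync implementation.

```csharp
using System;
using System.Threading;

static class EpollDispatchSketch
{
    // Hypothetical: called on the epoll thread when a socket operation completes.
    public static void OnCompletion(Action continuation, bool inline)
    {
        if (inline)
        {
            // c=inline: stay on the epoll thread. Lowest overhead, but whatever
            // the continuation does (e.g. writing to the input pipe) runs here.
            continuation();
        }
        else
        {
            // c=threadpool: hand off, so the epoll thread can return to
            // epoll_wait immediately.
            ThreadPool.UnsafeQueueUserWorkItem(state => state(), continuation, preferLocal: false);
        }
    }
}
```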

For i (and the output pipe write scheduler) we've always used the threadpool as a safe default in case the app developer blocks.

ok, I will consider it unsafe to use inline here.

There is another use of ThreadPool not mentioned yet. The handler gets put on the ThreadPool by the KestrelConnection class (see the stack trace of the JSON middleware below). Though I assume that doesn't make it safe to set i=inline? Do you recall the rationale for running the handler on ThreadPool? Is this also happening for JSON platform benchmark?

For o, it would certainly be safe to be inline (we basically do this on a per-HTTP/2-stream in Http2OutputProducer in Kestrel), but the benchmarking I've done before with Kestrel's Socket transport has shown the ioqueue to be slightly faster.

Yes, benchmarks we ran showed the same thing, though it's hard to understand why.

   at web.JsonMiddleware.Invoke(HttpContext httpContext) in /home/tmds/repos/Tmds.LinuxAsync/test/web/JsonMiddleware.cs:line 20
   at web.PlaintextMiddleware.Invoke(HttpContext httpContext) in /home/tmds/repos/Tmds.LinuxAsync/test/web/PlaintextMiddleware.cs:line 25
   at Microsoft.AspNetCore.HostFiltering.HostFilteringMiddleware.Invoke(HttpContext context)
   at Microsoft.AspNetCore.Hosting.HostingApplication.ProcessRequestAsync(Context context)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests[TContext](IHttpApplication`1 application)
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests[TContext](IHttpApplication`1 application)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequestsAsync[TContext](IHttpApplication`1 application)
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequestsAsync[TContext](IHttpApplication`1 application)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync[TContext](IHttpApplication`1 httpApplication)
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync[TContext](IHttpApplication`1 httpApplication)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnectionMiddleware`1.OnConnectionAsync(ConnectionContext connectionContext)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.ConnectionDispatcher.<>c__DisplayClass8_0.<StartAcceptingConnectionsCore>b__1(ConnectionContext c)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.KestrelConnection`1.ExecuteAsync()

tmds avatar Apr 03 '20 10:04 tmds

Though I assume that doesn't make it safe to set i=inline?

I figured out this isn't safe. The stacktrace changes for successive requests on the same connection. I still have these questions:

Do you recall the rationale for running the handler on ThreadPool? Is this also happening for JSON platform benchmark?

tmds avatar Apr 03 '20 11:04 tmds

Do you recall the rationale for running the handler on ThreadPool?

I think I got this too. If there is already a request in the input pipe, the request handling would start inline and stop Kestrel from accepting more connections until the handling goes async. That reduces the rate at which Kestrel can accept connections.
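
A simplified sketch of that rationale (not Kestrel's actual ConnectionDispatcher/KestrelConnection code): queuing the handler lets the accept loop issue the next AcceptAsync immediately.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

static class AcceptLoopSketch
{
    public static async Task AcceptLoopAsync(Socket listener, Func<Socket, Task> handler)
    {
        while (true)
        {
            Socket connection = await listener.AcceptAsync();

            // If the first request is already buffered, awaiting `handler(connection)`
            // here would run request processing synchronously up to its first yielding
            // await before the next AcceptAsync can be issued.
            ThreadPool.UnsafeQueueUserWorkItem(
                state => { _ = handler(state); },
                connection,
                preferLocal: false);
        }
    }
}
```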

Is this also happening for JSON platform benchmark?

I guess so. It doesn't matter much for the TE benchmarks, since it only affects the first request from a connection.

tmds avatar Apr 03 '20 12:04 tmds

Adding some graphs from benchmarks.

Using threadpool

i=threadpool
c=threadpool

[benchmark graph: tp2]

(x-axis is the number of epoll threads, from 0 to ProcessorCount; I cut off the values because the Excel web view renders them wrong in Chrome)

Since epoll threads don't do much work, we don't need many of them.

Here IoQueue has better performance than Inline. This raises the question: could the ThreadPool improve so that Inline performs as well as IoQueue?

Using epoll threads

i=inline
c=inline

[benchmark graph: in2]

Since the epoll threads do most of the work here, we need a lot of them.

Here IoQueue performs worse than Inline.

Other

@antonfirsov and I discussed running benchmarks also for this combination:

c=inline
i=threadpool

It would also be meaningful in light of the previous discussion, because this combination could be used safely by Kestrel even when users may have blocking handlers.

tmds avatar Apr 03 '20 12:04 tmds

That would be an ideal option for Kestrel IMO, and it should be opt-in at the socket transport level.

davidfowl avatar Apr 03 '20 13:04 davidfowl

It looks like you answered all of your own questions @tmds 😄.

I agree an option to enable the inline scheduling of app code (i.e. ApplicationSchedulingMode) is something we should add back as a SocketTransportOption. I'll look into this.
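
For readers coming to this later: a sketch of what opting in could look like with the Sockets transport. The UnsafePreferInlineScheduling flag shown here is the option that eventually shipped (it did not exist at the time of this thread); treat its exact name and semantics, and the IOQueueCount remark, as assumptions of this sketch rather than something stated in the discussion.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Hosting;

public class Program
{
    public static void Main(string[] args) =>
        Host.CreateDefaultBuilder(args)
            .ConfigureWebHostDefaults(webBuilder =>
            {
                webBuilder.UseSockets(options =>
                {
                    // Opt-in: run application code inline on the transport threads.
                    // Only safe when request handlers never block.
                    options.UnsafePreferInlineScheduling = true;

                    // Assumption: setting IOQueueCount to 0 schedules transport I/O
                    // directly on the thread pool instead of the IOQueues.
                    options.IOQueueCount = 0;
                });
                webBuilder.Configure(app =>
                    app.Run(context => context.Response.WriteAsync("hello")));
            })
            .Build()
            .Run();
}
```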

halter73 avatar Apr 03 '20 18:04 halter73

This c is a different one. It controls whether the Socket continuations are run on threadpool or stay on epoll thread.

That makes sense. IIRC, the reason we run Socket.ReceiveAsync continuations inline on Linux but not on Windows is that on Linux the continuation is already dispatched to a threadpool thread, whereas on Windows we're still on an I/O thread.

I assume it is safe to inline, because i defers to ThreadPool later.

Agreed. As long as we're not calling into non-framework code it should be safe to inline.

halter73 avatar Apr 03 '20 18:04 halter73

As long as we're not calling into non-framework code it should be safe to inline.

@halter73 Is the combination of c=inline, i=threadpool safe to use with potentially blocking application code? I think so, since i=threadpool will put that code on the threadpool. Is that correct?

tmds avatar Apr 03 '20 21:04 tmds

Yes. Running continuations inline is safe if all they do is write to a PipeWriter and the reader scheduler is set to the threadpool.

halter73 avatar Apr 03 '20 21:04 halter73

@tmds @halter73: I did a comparison between the configurations discussed here.

[benchmark graph: comparison of the configurations]

Explanation of labels:

  • Optimal: c=inline i=inline a=true o=iothread
  • c=inline: c=inline i=threadpool a=true o=iothread
  • c=threadpool: c=threadpool i=threadpool a=true o=iothread

Note: I ran this on asp-citrine-lin, while last weekend's benchmarks were run on asp-citrine-db, which has a 5.5 kernel.

antonfirsov avatar Apr 06 '20 16:04 antonfirsov

Combining with results from https://github.com/tmds/Tmds.LinuxAsync/issues/78#issuecomment-610658950, and taking best results for (c,i) combinations:

|             | Config 1   | Config 2        | Config 3         | Config 4        |
| ----------- | ---------- | --------------- | ---------------- | --------------- |
| c           | threadpool | inline          | inline           | threadpool      |
| i           | threadpool | inline          | threadpool       | inline          |
| t           | 1          | ProcessorCount  | ProcessorCount/2 | 1               |
| o           | ioqueue    | iothread/inline | ioqueue/inline   | ioqueue/inline  |
| rps, AIO    | 900k       | 1026k/1000k     | 900k/860k        | 870k/840k       |
| rps, no AIO | 887k       | 978k/861k       |                  |                 |
| app safe    | yes        | no              | yes              | yes             |

Config 1 is the most similar to the current behavior. To do better (900k++), we need to move to config 2, which is not suitable for running user code that may block the epoll threads.

Some thoughts about o:

  • iothread requires the pipe to be able to use the epoll thread as a Scheduler. Personally, I find the API addition to Socket to get a PipeScheduler not worth the gain we've measured here (+2.6%).
  • ioqueue. I wish we could get rid of ioqueue by making ThreadPool be smarter about this workload, but I don't know if this is feasible.
  • inline. This is my preferred mode because it means doing the send as soon as possible. Any deferring (threadpool, ioqueue, iothread) is going to add some delay. There may be issues with Pipelines due to https://github.com/dotnet/runtime/issues/25059. (And it may not be the best for TE benchmarks.)

edit: added rps, no AIO row

tmds avatar Apr 08 '20 04:04 tmds

To do better (900k++), we need to move to config 2

The gain we get in config 2 is thanks to using AIO. With o=inline and AIO disabled, we get 860k.

tmds avatar Apr 08 '20 04:04 tmds

I wish we could get rid of ioqueue by making ThreadPool be smarter about this workload, but I don't know if this is feasible.

@kouvel might be able to answer this question

adamsitnik avatar Apr 08 '20 07:04 adamsitnik

I wish we could get rid of ioqueue by making ThreadPool be smarter about this workload, but I don't know if this is feasible.

Yes, I think that would be feasible and would be a better option than having another pool of threads, but it may have some tradeoffs; I would have to try it out.

kouvel avatar Apr 09 '20 17:04 kouvel

I misunderstood and thought there was another pool of threads. I did some experiments relevant to using the thread pool directly instead of the IOQueue; so far I have not seen the thread pool performing better. It seems there is a benefit from batching and maybe other things. I have posted some more info in our Teams channel. Based on the info I have so far, it does not appear to be feasible to remove the IOQueue by using the thread pool instead.
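
To make the batching point concrete, here is a condensed sketch of an IOQueue-style PipeScheduler (not Kestrel's actual IOQueue): items are enqueued and drained in batches by a single thread pool work item, which is where the amortization comes from.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO.Pipelines;
using System.Threading;

public sealed class BatchingScheduler : PipeScheduler, IThreadPoolWorkItem
{
    private readonly ConcurrentQueue<(Action<object> Action, object State)> _work =
        new ConcurrentQueue<(Action<object>, object)>();
    private int _doingWork; // 0 = idle, 1 = a drain pass is queued or running

    public override void Schedule(Action<object> action, object state)
    {
        _work.Enqueue((action, state));

        // Queue at most one drain pass; everything enqueued in the meantime is
        // handled by that pass (this is the batching).
        if (Interlocked.CompareExchange(ref _doingWork, 1, 0) == 0)
        {
            ThreadPool.UnsafeQueueUserWorkItem(this, preferLocal: false);
        }
    }

    void IThreadPoolWorkItem.Execute()
    {
        while (_work.TryDequeue(out var item))
        {
            item.Action(item.State);
        }

        Volatile.Write(ref _doingWork, 0);

        // An item may have been enqueued after the last TryDequeue but before we
        // published "idle"; if so, queue another drain pass.
        if (!_work.IsEmpty && Interlocked.CompareExchange(ref _doingWork, 1, 0) == 0)
        {
            ThreadPool.UnsafeQueueUserWorkItem(this, preferLocal: false);
        }
    }
}
```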

kouvel avatar Apr 14 '20 23:04 kouvel