
Higher Latency compared to Tokio

Open rohitjoshi opened this issue 8 years ago • 11 comments

Thank you for creating this high-performance library and making async easier. Reading your blog, it seems the average and peak latencies are higher compared to the tokio-based implementation.

Is this due to higher throughput or to coroutine scheduling?

e.g.

Tokio threaded:

Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    52.54us   56.39us  12.14ms   99.36%

vs

May coroutine:

Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.26ms    6.48ms 105.13ms   98.35%

rohitjoshi avatar Jan 17 '18 21:01 rohitjoshi

Good question! It needs deeper investigation. I have a blog post that analyzes why the hyper-based port is slow. Those numbers come from an unfair setup: the tokio version uses the latest master branch of hyper, while the may version uses the old 0.10.x branch, which is not very optimized, so the comparison is not really fair.

I have a minihttp that should be a fair comparison.

tokio_minihttp

  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.98ms  284.06us  12.53ms   98.92%

vs may_minihttp

  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.70ms  812.42us  20.17ms   97.94%

may is faster! But with bigger jitter.

Xudong-Huang avatar Jan 18 '18 01:01 Xudong-Huang

👍 looks good. I am planning to convert my existing threaded application to be may coroutine-based. This is a near-real-time application with 99.9% of latencies below 1ms and a max latency of ~3ms. The application runs on a 16-core machine but actively uses only 9 threads (1 thread per TCP connection + 1 cache-updater thread). The number of TCP connections is fixed at 8. Let me know if you have any suggestions.

rohitjoshi avatar Jan 18 '18 02:01 rohitjoshi

Your application is very special: there are enough resources for every thread, with only 8 connections running on 16 cores! I think you can try cloning each stream into two parts, one for reading and one for writing, so that reads and writes don't block each other, and spawn coroutines to handle each request. Don't forget to configure io_workers to a reasonable number, so that the CPU is fully utilized.
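The read/write split described above can be sketched with `try_clone` from `std::net` (may's `may::net::TcpStream` exposes the same `try_clone`; with may you would use `go!`/`may::coroutine::spawn` instead of `thread::spawn`, and set io_workers via the config object, e.g. something like `may::config().set_io_workers(n)` in the 2018-era API). A minimal std-only loopback demo, not the actual application code:

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

// Echo one message through a loopback connection, using a cloned
// stream handle so the reading and writing sides run independently
// and never block each other. With may, the same pattern would use
// may::net::TcpStream and spawned coroutines instead of threads.
fn split_echo(msg: &[u8]) -> Vec<u8> {
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap();

    // Server side: echo back whatever arrives (single small read).
    let server = thread::spawn(move || {
        let (mut s, _) = listener.accept().unwrap();
        let mut buf = [0u8; 64];
        let n = s.read(&mut buf).unwrap();
        s.write_all(&buf[..n]).unwrap();
    });

    let stream = TcpStream::connect(addr).unwrap();
    // Clone the stream: one handle writes, the other reads.
    let mut writer = stream.try_clone().unwrap();
    let mut reader = stream;

    let data = msg.to_vec();
    let w = thread::spawn(move || writer.write_all(&data).unwrap());

    let mut buf = vec![0u8; msg.len()];
    reader.read_exact(&mut buf).unwrap();

    w.join().unwrap();
    server.join().unwrap();
    buf
}

fn main() {
    assert_eq!(split_echo(b"ping"), b"ping");
    println!("echo ok");
}
```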

Xudong-Huang avatar Jan 18 '18 02:01 Xudong-Huang

Is it possible to reduce jitter if we use a work stealing queue per worker thread instead of a global mpmc queue for coroutines?

alkis avatar Jan 18 '18 19:01 alkis

@alkis I don't think so. IO-related tasks are not scheduled through the mpmc channel (which is used by the normal worker threads); they run directly on the io_workers threads. I think the jitter comes from the fact that a coroutine will not suspend while there is still data available on the io. For reading, the coroutine only suspends when the non-blocking read returns no data, so a coroutine can spin on the CPU until it drains all of the available data, while other coroutines have to wait for the CPU to be released.
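The drain-until-empty behavior described above can be illustrated with a mock non-blocking source (this is an illustrative sketch of the scheduling policy, not may's actual scheduler code): the read loop keeps the CPU as long as reads succeed and only "suspends" when the source returns `WouldBlock`, so a busy connection occupies its worker for as many iterations as it has data queued.

```rust
use std::io::{self, Read};

// Mock non-blocking source: yields `chunks` successful reads,
// then reports WouldBlock (no more data available right now).
struct Mock {
    chunks: usize,
}

impl Read for Mock {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.chunks == 0 {
            Err(io::Error::new(io::ErrorKind::WouldBlock, "drained"))
        } else {
            self.chunks -= 1;
            buf[..4].copy_from_slice(b"data");
            Ok(4)
        }
    }
}

// The policy described in the comment: keep reading while data is
// available, and only "suspend" (here: return) on WouldBlock.
// Returns how many reads ran back to back before suspending.
fn drain(src: &mut impl Read) -> usize {
    let mut buf = [0u8; 16];
    let mut reads = 0;
    loop {
        match src.read(&mut buf) {
            Ok(_) => reads += 1, // data ready: the coroutine keeps the CPU
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => return reads,
            Err(e) => panic!("{e}"),
        }
    }
}

fn main() {
    // Five chunks queued: the "coroutine" runs five reads in a row
    // before any other coroutine could be scheduled on this worker.
    assert_eq!(drain(&mut Mock { chunks: 5 }), 5);
    println!("suspended after draining");
}
```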

Xudong-Huang avatar Jan 19 '18 02:01 Xudong-Huang

Is there some documentation on the tradeoffs and/or different designs for the scheduler? More particularly why do we need separate io_workers? How does it compare to N generic workers plus a single epoll_wait() worker whose only job is to handle the slow path of park/unpark coros that wait for I/O?

alkis avatar Jan 19 '18 09:01 alkis

Well, in the beginning I did use a unified thread to handle all of the events, but it was not as efficient as I expected. The sync-primitive events and the timeout events do not go through the io thread; they are scheduled on the normal worker threads. Generating an event for the epoll system would require an extra, unnecessary system call. And running a coroutine on a normal worker thread in response to an io event incurs a noticeable delay after the io event is received on the io worker thread.

Xudong-Huang avatar Jan 19 '18 14:01 Xudong-Huang

For the record, I wrote (with GuillaumeGomez) an FTP server with tokio and one with may, and the version with may seems a little bit faster when benchmarking with this tool, so I think may is fast enough.

antoyo avatar Jan 20 '18 19:01 antoyo

Can you please share your results? Did you measure latency differences?

rohitjoshi avatar Jan 20 '18 22:01 rohitjoshi

I'll write a blog post with Guillaume (hopefully soon) about our experience and these results. This benchmark only shows the number of connections that remained at the desired speed, so I guess the latency was measured somehow, but I only have the number of users. If you know a better tool to benchmark an FTP server, I'll be happy to try it. Thanks.

antoyo avatar Jan 20 '18 23:01 antoyo

Hello @Xudong-Huang, thanks for your great work!

I'm very interested in may and would like to know if you have more recent data on tokio vs may?

pedrozaalex avatar Mar 11 '23 12:03 pedrozaalex