liburing

Investigate WSL performance?

Open LifeIsStrange opened this issue 2 years ago • 7 comments

I saw this comment showing worse performance than epoll: https://github.com/chriskohlhoff/asio/issues/401#issuecomment-1126778607

LifeIsStrange avatar Oct 14 '22 17:10 LifeIsStrange

I saw this comment showing worse performance than epoll: chriskohlhoff/asio#401 (comment)

Well, 5.10 is not interesting at all; a lot has happened since then on the optimisation side. And performance will depend on how io_uring is used (operation types, file types, which features are utilised, what the userspace architecture is, etc.). Does WSL have newer kernels?

isilence avatar Oct 14 '22 17:10 isilence

I don't think this is just a case of older kernels; my guess would be suboptimal code that results in io-wq usage.

axboe avatar Oct 14 '22 18:10 axboe

Right, e.g. in at least one of the tests they were using pipes (which lack good io_uring support for various reasons), but the point is, it's not productive to even start discussing io_uring performance on a 2-year-old kernel.

isilence avatar Oct 14 '22 18:10 isilence

I don't know about the Windows status, but the git master branch shows Linux 5.15 for WSL: https://github.com/microsoft/WSL2-Linux-Kernel/commits/linux-msft-wsl-5.15.y

LifeIsStrange avatar Oct 14 '22 18:10 LifeIsStrange

Here are my results:

OS: Windows 10 21H2
CPU: AMD Ryzen 7 5800X

WSL Linux 5.10
epoll 140,000 /sec
uring 133,000 /sec

WSL Linux 6.0.0
epoll 149,000 /sec
uring 170,000 /sec

Testing:
Each thread is pinned (1 server thread, 3 client threads)
Each client thread is epoll backed with 2048 connections each (6144 total)
No fixed files or buffers used for io_uring

Notes:
It's worth mentioning that the 6.0.0 uring results are odd: epoll maintains a consistent 149-151k /sec, while uring averages around 167-173k /sec and sometimes peaks at 184k.

On native Linux 5.19, performance overall is a lot better, at a consistent 230-234k /sec for epoll and a consistent 236-242k /sec for io_uring. However, the performance difference between the two is much larger on Windows.
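To put the figures quoted above side by side, here is a small worked calculation of the relative io_uring gain over epoll in each environment. The WSL values are the headline numbers from the results. The native 5.19 values are the midpoints of the quoted ranges, which is an assumption made here only to get a single number per backend.

```python
# Throughput figures (requests/sec) quoted in the results above.
# Assumption: the native 5.19 ranges are reduced to their midpoints.
results = {
    "WSL 5.10": {"epoll": 140_000, "uring": 133_000},
    "WSL 6.0.0": {"epoll": 149_000, "uring": 170_000},
    "native 5.19": {
        "epoll": (230_000 + 234_000) / 2,
        "uring": (236_000 + 242_000) / 2,
    },
}


def uring_gain_percent(epoll: float, uring: float) -> float:
    """Relative throughput change of io_uring versus epoll, in percent."""
    return (uring - epoll) / epoll * 100.0


gains = {
    name: uring_gain_percent(r["epoll"], r["uring"])
    for name, r in results.items()
}

for name, gain in gains.items():
    print(f"{name}: {gain:+.1f}%")
# WSL 5.10: -5.0%
# WSL 6.0.0: +14.1%
# native 5.19: +3.0%
```

On these numbers the gap under WSL 6.0.0 is roughly four to five times the gap on native 5.19, which matches the observation that the difference is much larger on Windows.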

davidzeng0 avatar Oct 16 '22 04:10 davidzeng0

Another thing: performance is not much worse when using IORING_OP_POLL_ADD and doing recv/send on the user side. If I were to guess, I would assume most of the gains come from the reduced number of system calls. When I took a quick look through the kernel code, io_uring poll events have to be queued as task work (and placed in a linked list) before they are processed, while epoll seems to handle the event more directly (I believe; I may be wrong). Despite this, uring seems to consistently perform equal to epoll or outperform it by 1-5% on native Linux 5.19 with: 1 client thread (iirc), 3 client threads, 1024 connections per thread, 2048 connections per thread, and using rust_echo_bench, which is 1 thread per socket.
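For readers unfamiliar with the readiness-based pattern described here (wait for a poll event, then call recv/send from user code), the following is a minimal sketch of one echo round trip. It uses Python's `select.epoll` rather than IORING_OP_POLL_ADD, since the Python standard library does not ship an io_uring interface, so it only illustrates the shape of the loop, not io_uring itself. The function name `echo_once` and the use of a `socketpair` in place of a TCP connection are assumptions for illustration, and the code is Linux-only.

```python
import select
import socket


def echo_once(payload: bytes) -> bytes:
    """Echo one message using readiness notification plus user-side recv/send."""
    # A connected AF_UNIX stream pair stands in for a client/server TCP connection.
    client, server = socket.socketpair()
    client.settimeout(2.0)      # avoid hanging forever if something goes wrong
    server.setblocking(False)   # readiness-based servers use non-blocking sockets
    ep = select.epoll()
    try:
        ep.register(server.fileno(), select.EPOLLIN)
        client.sendall(payload)

        # Step 1: one call to learn which sockets are readable.
        for fd, mask in ep.poll(2.0):
            if fd == server.fileno() and mask & select.EPOLLIN:
                # Step 2 and 3: separate recv and send calls issued by user code.
                data = server.recv(4096)
                server.sendall(data)

        return client.recv(4096)
    finally:
        ep.close()
        client.close()
        server.close()


print(echo_once(b"ping"))  # b'ping'
```

Each echo here costs a readiness call plus a recv and a send. A completion-based io_uring design can fold some of those into fewer submissions, which is where the reduced system call count mentioned above would come from.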

davidzeng0 avatar Oct 16 '22 04:10 davidzeng0

In perf top analysis, most (80% or higher) of the CPU time is spent in the send call on a socket, which explains why the reduced number of system calls results in only tiny gains.
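The reasoning above can be made concrete with an Amdahl-style bound. This is a rough sketch, assuming the time spent in send itself is the same for epoll and io_uring, so only the remaining share of CPU time can shrink. The 80% figure comes from the perf top observation. `HYPOTHETICAL_SAVED_SHARE` is an invented value, used only to show the order of magnitude.

```python
def max_speedup(removed_fraction: float) -> float:
    """Amdahl-style speedup when `removed_fraction` of total CPU time is eliminated."""
    if not 0.0 <= removed_fraction < 1.0:
        raise ValueError("removed_fraction must be in [0, 1)")
    return 1.0 / (1.0 - removed_fraction)


SEND_FRACTION = 0.80                      # from the perf top figure ("80% or higher")
NON_SEND_FRACTION = 1.0 - SEND_FRACTION   # everything else, including syscall overhead

# Upper bound: all non-send work disappears entirely (not achievable in practice).
ceiling = max_speedup(NON_SEND_FRACTION)

# Hypothetical: fewer syscalls save a quarter of the non-send time.
HYPOTHETICAL_SAVED_SHARE = 0.25
plausible = max_speedup(NON_SEND_FRACTION * HYPOTHETICAL_SAVED_SHARE)

print(f"ceiling: {ceiling:.2f}x, plausible: {plausible:.3f}x")
# ceiling: 1.25x, plausible: 1.053x
```

Under these assumptions even the unreachable ceiling is 1.25x, and a saving of a quarter of the non-send time gives about 5%, the same order as the 1-5% gains reported on native Linux 5.19.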

davidzeng0 avatar Oct 16 '22 04:10 davidzeng0