tokio-uring

multi_thread support

Open Noah-Kennedy opened this issue 1 year ago • 5 comments

Tracking issue for support for multi_thread runtime.

Noah-Kennedy avatar Mar 06 '23 21:03 Noah-Kennedy

Is there any limitation in implementing this that doesn't let us just use the multithreaded equivalents of the currently used types in the library? (e.g., Rc => Arc, RefCell => Mutex, etc.)

Alonely0 avatar Mar 31 '23 19:03 Alonely0

@Alonely0 It's possible to configure io_uring such that it is invalid for submissions to come from more than one thread. I suspect there are other things to consider as well.

// Sets IORING_SETUP_SINGLE_ISSUER: only one thread may submit to this ring.
tokio_uring::builder().uring_builder(
    tokio_uring::uring_builder().setup_single_issuer()
)

ollie-etl avatar Apr 02 '23 09:04 ollie-etl

I think multi-thread support is essential.

I am using tokio to write a VPN program. With the rt-multi-thread feature and tokio::spawn, iperf3 TCP reaches 600 Mbps in my test environment. When I use tokio_uring to handle IO instead, the single-threaded tokio_uring runtime only reaches 120 Mbps, and one CPU core hits 100% usage on the iperf3 server.

tokio image

tokio_uring image

I have enabled SQPOLL to try to reduce the number of submit_and_wait syscalls.

Sherlock-Holo avatar May 25 '23 08:05 Sherlock-Holo

I noticed that Op holds a weak reference to the thread-local (TLS) RuntimeContext. When the future is polled, it checks the op's lifecycle in Ops, and Ops lives inside the RuntimeContext.

If Op held an Arc-based weak reference to the RuntimeContext instead, it could reach the RuntimeContext no matter which thread that context is on, which might make multi-thread support easier:

  • When an Op is created, it uses the current thread's RuntimeContext to submit the SQE.
  • When the future is polled, the Op uses the Arc weak reference to find its RuntimeContext and check whether the work is done, regardless of whether the Op is still on the thread that submitted the SQE.
  • We can create multiple threads to run the io_uring instances, so we can use more CPU cores.

Sherlock-Holo avatar May 25 '23 10:05 Sherlock-Holo
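The idea in the comment above can be illustrated with a small standard-library sketch. It is not tokio-uring's actual code: RuntimeContext, Op, and Lifecycle here are simplified, hypothetical stand-ins, and no real io_uring ring is involved. The sketch only shows that an Op holding a std::sync::Weak can be moved to another thread and still look up its completion state.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, Weak};
use std::thread;

/// Hypothetical stand-in for the per-op state tracked by the runtime.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Lifecycle {
    Submitted,
    Completed(i32), // result code, as a completion entry would carry
}

/// Hypothetical stand-in for RuntimeContext: owns the table of in-flight ops.
#[derive(Default)]
struct RuntimeContext {
    ops: Mutex<HashMap<usize, Lifecycle>>,
}

impl RuntimeContext {
    /// Mark an op as finished, as the thread driving the ring would.
    fn complete(&self, index: usize, result: i32) {
        self.ops
            .lock()
            .unwrap()
            .insert(index, Lifecycle::Completed(result));
    }
}

/// An in-flight operation holding a thread-safe weak handle to its runtime.
struct Op {
    index: usize,
    ctx: Weak<RuntimeContext>,
}

impl Op {
    /// Record a new op on `ctx`, as if its SQE had just been submitted there.
    fn submit(ctx: &Arc<RuntimeContext>, index: usize) -> Op {
        ctx.ops.lock().unwrap().insert(index, Lifecycle::Submitted);
        Op { index, ctx: Arc::downgrade(ctx) }
    }

    /// Check the op's state from any thread. `None` means the runtime
    /// has been dropped or the op is unknown.
    fn check(&self) -> Option<Lifecycle> {
        let ctx = self.ctx.upgrade()?;
        let ops = ctx.ops.lock().unwrap();
        ops.get(&self.index).copied()
    }
}

fn main() {
    let ctx = Arc::new(RuntimeContext::default());
    let op = Op::submit(&ctx, 0);
    ctx.complete(0, 42);

    // The Op can cross threads because it holds sync::Weak, not rc::Weak.
    let seen = thread::spawn(move || op.check()).join().unwrap();
    assert_eq!(seen, Some(Lifecycle::Completed(42)));
}
```

The Mutex in this sketch serializes every submit, completion, and check, so it says nothing about how such a design would perform under load.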

So, in general, the easiest path to doing this is going to be something similar to what @Sherlock-Holo described. However, it isn't clear to me that this will perform terribly well. Contention on the squeue may be an issue, so there would be a bit of "wait and see" with respect to which approach ultimately sticks.

For now I'd recommend a runtime-per-core model of some sort. Depending on what you are doing, that is probably going to work quite well.

Noah-Kennedy avatar May 25 '23 20:05 Noah-Kennedy
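One possible shape for the runtime-per-core model is sketched below using only the standard library. The helper name run_per_core and the thread names are hypothetical. The closure passed to it is a placeholder: in a real program it would build one single-threaded runtime per thread (for example with tokio_uring::start) and serve that worker's share of the load. How work is divided between the workers is left open here.

```rust
use std::thread;

/// Spawn `workers` OS threads, run `per_thread` on each with its worker
/// index, and collect the results in worker order.
fn run_per_core<T, F>(workers: usize, per_thread: F) -> Vec<T>
where
    T: Send + 'static,
    F: Fn(usize) -> T + Send + Clone + 'static,
{
    // Spawn every worker first so they all run concurrently.
    let handles: Vec<_> = (0..workers)
        .map(|id| {
            let f = per_thread.clone();
            thread::Builder::new()
                .name(format!("uring-worker-{id}"))
                .spawn(move || f(id))
                .expect("failed to spawn worker thread")
        })
        .collect();

    // Then wait for each worker and gather its result.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}

fn main() {
    // One worker per available core, falling back to a single worker.
    let workers = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);

    let ids = run_per_core(workers, |id| {
        // Placeholder: a real worker would start its own single-threaded
        // runtime here and drive its own io_uring instance.
        id
    });

    println!("started {} workers", ids.len());
}
```

Because each worker owns its ring and its tasks, nothing in this layout needs to be shared between threads, which sidesteps the contention question at the cost of not balancing work across cores automatically.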