tokio-uring icon indicating copy to clipboard operation
tokio-uring copied to clipboard

Add support for IORING_OP_URING_CMD

Open thomasbarrett opened this issue 2 years ago • 23 comments

This PR adds optional support for the following IO_URING features:

  1. SQE128 128-byte submission queue entries were introduced in Linux 5.19. This PR enables 128-byte submission queue entries when the squeue-entry128 feature is enabled.

  2. CQE32 32-byte completion queue entries were introduced in Linux 5.19. This PR enables 32-byte completion queue entries when the cqueue-entry32 feature is enabled.

  3. IORING_OP_URING_CMD The uring-cmd command introduced in Linux 5.19 allows device specific commands to be passed through io_uring. There are a variety of devices that support and / or requireuring-cmd including, but not limited to, /dev/null, nvme devices, ublk devices. This PR enables the uring-cmd operation to be used when the uring-cmd feature is enabled.

Testing

The Ubuntu Github Action runner is not running a new enough version of Linux to test the squeue-entry128, cqueue-entry32, or uring-cmd features. These can be tested on a sufficiently recent version of Linux (I used 6.1). Each of the 3 features can be enabled independently.

cargo test --features sqe128
cargo test --features cqe32
cargo test --features uring-cmd
cargo test --features sqe128,uring-cmd

thomasbarrett avatar Dec 22 '22 04:12 thomasbarrett

I have been testing this on a Linux 6.1 QEMU VM on the “/dev/ublk-control” device created by the ublk module.

in general, IORING_OP_URING_CMD is used for device specific operations, very similar to ioctl. In addition to ublk, I believe that NVMe devices and /dev/null also support IORING_OP_URING_CMD. I will add some tests using /dev/null since it doesn’t require any specific devices.

thomasbarrett avatar Dec 23 '22 03:12 thomasbarrett

I have a VM with an NVMe root partition but I don't know if that can be used for these URING_CMD operations. I'll have to read some more soon.

FrankReh avatar Jan 13 '23 03:01 FrankReh

Do the feature flags mean one can't have both forms of sqe and cqe in the same app? I always thought we would end up with the main driver being the smaller version and when a user wanted the larger version, there would be a secondary ring created but I believe Noah was leaning towards supporting multiple threads, each with their own uring potentially. The uring interface doesn't require all urings from one app be the same size so we probably shouldn't be enforcing something like that either. But @Noah-Kennedy needs to get involved and say what he wants in this feature.

FrankReh avatar Jan 13 '23 04:01 FrankReh

I have a VM with an NVMe root partition but I don't know if that can be used for these URING_CMD operations. I'll have to read some more soon.

Hmm, I have not personally played around with uring-cmd + NVMe drives, but this seems to be a good example for anyone interested. You could probably get away with some read-only operations to your root partition... but you'd be playing a dangerous game.

thomasbarrett avatar Jan 13 '23 05:01 thomasbarrett

Thanks for the link. I'll figure out why it didn't work in the morning. Also, my VM that is supposed to have an NVMe root doesn't show up as NVMe so I admit I may have spoken too soon. Maybe through virtio it doesn't give the kernel an NVMe interface anyway.

FrankReh avatar Jan 13 '23 05:01 FrankReh

Do the feature flags mean one can't have both forms of sqe and cqe in the same app? I always thought we would end up with the main driver being the smaller version and when a user wanted the larger version, there would be a secondary ring created but I believe Noah was leaning towards supporting multiple threads, each with their own uring potentially. The uring interface doesn't require all urings from one app be the same size so we probably shouldn't be enforcing something like that either. But @Noah-Kennedy needs to get involved and say what he wants in this feature.

You are correct @FrankReh. My intuition is that most application developers would probably be happy with a single SQE + CQE size. The only real downside to using an unnecessarily large SQE or CQE size is a bit of extra memory usage. It looks like supporting multiple size SQE and CQE in the same app would probably require complicating the codebase with a lot of additional generics. That being said, I would be happy to take a look at doing so if that is more aligned with the vision of the project.

thomasbarrett avatar Jan 13 '23 05:01 thomasbarrett

Thanks for the link. I'll figure out why it didn't work in the morning. Also, my VM that is supposed to have an NVMe root doesn't show up as NVMe so I admit I may have spoken too soon. Maybe through virtio it doesn't give the kernel an NVMe interface anyway.

If you are using plain QEMU, I have personally found this feature good for emulating NVMe devices.

thomasbarrett avatar Jan 13 '23 05:01 thomasbarrett

Why is there a binary file?

Noah-Kennedy avatar Jan 16 '23 22:01 Noah-Kennedy

Why is there a binary file?

Good catch. I totally missed that when looking through the diffs.

FrankReh avatar Jan 16 '23 23:01 FrankReh

Good catch @Noah-Kennedy! Removed. Perhaps a cargo cross artifact? Struggling to run CI pipeline on macOS 😄

thomasbarrett avatar Jan 17 '23 07:01 thomasbarrett

Why is the command behind a feature flag? We have other commands that don't work in older kernels.

FrankReh avatar Jan 28 '23 19:01 FrankReh

Good point, the uring_cmd16 operation doesn't need a feature flag... and for uring_cmd80, the sqe128 flag should suffice. As for the sqe128 and cqe32 flags... I will take another look at adding them as generic arguments.

thomasbarrett avatar Jan 29 '23 17:01 thomasbarrett

Hmm @FrankReh ... so I took a look at making the runtime generic w.r.t SQE and CQE size. The current holdup is that the thread_local CONTEXT variable cannot be made generic. I don't believe that there is any zero-cost way to make the runtime generic. Rather than add a bunch of boxed traits, the best move IMO is probably just to keep SQE and CQE size as a feature flag.

thomasbarrett avatar Feb 05 '23 20:02 thomasbarrett

Also, the uring_cmd tests will fail unless run on a sufficiently recent version of Linux. What is the established standard way to include version specific tests if not feature flags?

thomasbarrett avatar Feb 05 '23 20:02 thomasbarrett

Also, the uring_cmd tests will fail unless run on a sufficiently recent version of Linux. What is the established standard way to include version specific tests if not feature flags?

I guess the test needs to start by checking the feature or command is supported at runtime and quit the test gracefully if the most rudimentary aspect fails or can't be supported. Maybe leave a comment in the test. I had to put safeguards into the io-uring crate for this but don't recall having to do that for tokio-uring yet. Whatever you come up with is fine.

FrankReh avatar Feb 05 '23 22:02 FrankReh

@Noah-Kennedy Once this passes all CI tests, even if it means the tests were commented out, I'm inclined to approve something like this as it doesn't appear to change what any of us are currently doing and for the those that want the new feature, they can have it immediately - and maybe with a little experience, someone will figure out a good way of having both rings - I know other rust project(s) have already done that.

Also this is a good time to mention that something going in now might be changed for a more general solution down the road. And it's too early to be holding anyone to benchmark results from one release to the next.

FrankReh avatar Feb 05 '23 22:02 FrankReh

Ugh sorry... should have checked all of the CI after making my changes. Should be good now.

thomasbarrett avatar Feb 06 '23 03:02 thomasbarrett

Thanks for your great work:) We are planning to implement a ulbk server in rust, and this definitely help us much!

jiangliu avatar Feb 16 '23 10:02 jiangliu

I'm going to leave this for @Noah-Kennedy to comment on. It's not the direction I was hoping for but also it's not next in line for me to figure out. The project is at an interesting point. The APIs are still in flux and there's no corporate sponsorship of course. Do we put something in that we most likely will change later or do we hold off putting anything about it in until we think we know what we want in the end.

FrankReh avatar Feb 16 '23 13:02 FrankReh

Thanks for your great work:) We are planning to implement a ulbk server in rust, and this definitely help us much!

Very cool! I actually have an early stage (but working) ublk server implementation that I will probably make public in the next day or so. If this is for an open source project, I would be happy to contribute.

thomasbarrett avatar Feb 16 '23 16:02 thomasbarrett

Feel free to close out this MR. It turns out that I need a bit more manual control then tokio-uring provides at the moment. I am going this project moving forward. It essentially takes the guts of tokio-uring and creates a IoUringAsync device that can run directly on top of normal single-threaded Tokio runtime. It could be interesting to integrate something like this into tokio-uring in the future. Let me know what you think.

thomasbarrett avatar Mar 05 '23 20:03 thomasbarrett

@thomasbarrett what additional manual control do you need? We're working on providing more manual control of a number of things to users.

Noah-Kennedy avatar Mar 06 '23 17:03 Noah-Kennedy

Hey @Noah-Kennedy. It would be great if 3rd party libraries had the ability to send manual io-uring commands to the Runtime. This would allow libraries to be built w/ support for io-uring features that are not yet supported by this runtime. I have started using this project for my own work (it essentially just wraps the runtime portion of tokio-uring). I have started using this repository for my own work, but I would love to contribute some of these ideas back into tokio-uring.

https://github.com/thomasbarrett/io-uring-async

thomasbarrett avatar Apr 22 '23 20:04 thomasbarrett