tokio-uring
                                
                                 tokio-uring copied to clipboard
                                
                                    tokio-uring copied to clipboard
                            
                            
                            
                        Add support for IORING_OP_URING_CMD
This PR adds optional support for the following IO_URING features:
- 
SQE128 128-byte submission queue entries were introduced in Linux 5.19. This PR enables 128-byte submission queue entries when the squeue-entry128feature is enabled.
- 
CQE32 32-byte completion queue entries were introduced in Linux 5.19. This PR enables 32-byte completion queue entries when the cqueue-entry32feature is enabled.
- 
IORING_OP_URING_CMD The uring-cmdcommand introduced in Linux 5.19 allows device specific commands to be passed through io_uring. There are a variety of devices that support and / or requireuring-cmdincluding, but not limited to,/dev/null, nvme devices, ublk devices. This PR enables the uring-cmd operation to be used when theuring-cmdfeature is enabled.
Testing
The Ubuntu Github Action runner is not running a new enough version of Linux to test the squeue-entry128, cqueue-entry32, or uring-cmd features. These can be tested on a sufficiently recent version of Linux (I used 6.1). Each of the 3 features can be enabled independently.
cargo test --features sqe128
cargo test --features cqe32
cargo test --features uring-cmd
cargo test --features sqe128,uring-cmd
I have been testing this on a Linux 6.1 QEMU VM on the “/dev/ublk-control” device created by the ublk module.
in general, IORING_OP_URING_CMD is used for device specific operations, very similar to ioctl. In addition to ublk, I believe that NVMe devices and /dev/null also support IORING_OP_URING_CMD. I will add some tests using /dev/null since it doesn’t require any specific devices.
I have a VM with an NVMe root partition but I don't know if that can be used for these URING_CMD operations. I'll have to read some more soon.
Do the feature flags mean one can't have both forms of sqe and cqe in the same app? I always thought we would end up with the main driver being the smaller version and when a user wanted the larger version, there would be a secondary ring created but I believe Noah was leaning towards supporting multiple threads, each with their own uring potentially. The uring interface doesn't require all urings from one app be the same size so we probably shouldn't be enforcing something like that either. But @Noah-Kennedy needs to get involved and say what he wants in this feature.
I have a VM with an NVMe root partition but I don't know if that can be used for these URING_CMD operations. I'll have to read some more soon.
Hmm, I have not personally played around with uring-cmd + NVMe drives, but this seems to be a good example for anyone interested. You could probably get away with some read-only operations to your root partition... but you'd be playing a dangerous game.
Thanks for the link. I'll figure out why it didn't work in the morning. Also, my VM that is supposed to have an NVMe root doesn't show up as NVMe so I admit I may have spoken too soon. Maybe through virtio it doesn't give the kernel an NVMe interface anyway.
Do the feature flags mean one can't have both forms of sqe and cqe in the same app? I always thought we would end up with the main driver being the smaller version and when a user wanted the larger version, there would be a secondary ring created but I believe Noah was leaning towards supporting multiple threads, each with their own uring potentially. The uring interface doesn't require all urings from one app be the same size so we probably shouldn't be enforcing something like that either. But @Noah-Kennedy needs to get involved and say what he wants in this feature.
You are correct @FrankReh. My intuition is that most application developers would probably be happy with a single SQE + CQE size. The only real downside to using an unnecessarily large SQE or CQE size is a bit of extra memory usage. It looks like supporting multiple size SQE and CQE in the same app would probably require complicating the codebase with a lot of additional generics. That being said, I would be happy to take a look at doing so if that is more aligned with the vision of the project.
Thanks for the link. I'll figure out why it didn't work in the morning. Also, my VM that is supposed to have an NVMe root doesn't show up as NVMe so I admit I may have spoken too soon. Maybe through virtio it doesn't give the kernel an NVMe interface anyway.
If you are using plain QEMU, I have personally found this feature good for emulating NVMe devices.
Why is there a binary file?
Why is there a binary file?
Good catch. I totally missed that when looking through the diffs.
Good catch @Noah-Kennedy! Removed. Perhaps a cargo cross artifact? Struggling to run CI pipeline on macOS 😄
Why is the command behind a feature flag? We have other commands that don't work in older kernels.
Good point, the uring_cmd16 operation doesn't need a feature flag... and for uring_cmd80, the sqe128 flag should suffice. As for the sqe128 and cqe32 flags... I will take another look at adding them as generic arguments.
Hmm @FrankReh ... so I took a look at making the runtime generic w.r.t SQE and CQE size. The current holdup is that the thread_local CONTEXT variable cannot be made generic. I don't believe that there is any zero-cost way to make the runtime generic. Rather than add a bunch of boxed traits, the best move IMO is probably just to keep SQE and CQE size as a feature flag.
Also, the uring_cmd tests will fail unless run on a sufficiently recent version of Linux. What is the established standard way to include version specific tests if not feature flags?
Also, the uring_cmd tests will fail unless run on a sufficiently recent version of Linux. What is the established standard way to include version specific tests if not feature flags?
I guess the test needs to start by checking the feature or command is supported at runtime and quit the test gracefully if the most rudimentary aspect fails or can't be supported. Maybe leave a comment in the test. I had to put safeguards into the io-uring crate for this but don't recall having to do that for tokio-uring yet. Whatever you come up with is fine.
@Noah-Kennedy Once this passes all CI tests, even if it means the tests were commented out, I'm inclined to approve something like this as it doesn't appear to change what any of us are currently doing and for the those that want the new feature, they can have it immediately - and maybe with a little experience, someone will figure out a good way of having both rings - I know other rust project(s) have already done that.
Also this is a good time to mention that something going in now might be changed for a more general solution down the road. And it's too early to be holding anyone to benchmark results from one release to the next.
Ugh sorry... should have checked all of the CI after making my changes. Should be good now.
Thanks for your great work:) We are planning to implement a ulbk server in rust, and this definitely help us much!
I'm going to leave this for @Noah-Kennedy to comment on. It's not the direction I was hoping for but also it's not next in line for me to figure out. The project is at an interesting point. The APIs are still in flux and there's no corporate sponsorship of course. Do we put something in that we most likely will change later or do we hold off putting anything about it in until we think we know what we want in the end.
Thanks for your great work:) We are planning to implement a ulbk server in rust, and this definitely help us much!
Very cool! I actually have an early stage (but working) ublk server implementation that I will probably make public in the next day or so. If this is for an open source project, I would be happy to contribute.
Feel free to close out this MR. It turns out that I need a bit more manual control then tokio-uring provides at the moment. I am going this project moving forward. It essentially takes the guts of tokio-uring and creates a IoUringAsync device that can run directly on top of normal single-threaded Tokio runtime. It could be interesting to integrate something like this into tokio-uring in the future. Let me know what you think.
@thomasbarrett what additional manual control do you need? We're working on providing more manual control of a number of things to users.
Hey @Noah-Kennedy. It would be great if 3rd party libraries had the ability to send manual io-uring commands to the Runtime. This would allow libraries to be built w/ support for io-uring features that are not yet supported by this runtime. I have started using this project for my own work (it essentially just wraps the runtime portion of tokio-uring). I have started using this repository for my own work, but I would love to contribute some of these ideas back into tokio-uring.
https://github.com/thomasbarrett/io-uring-async