io_uring nvme examples?

Open espoal opened this issue 3 years ago • 2 comments
I wanted to test NVMe passthrough with liburing, but I cannot find any examples. In particular, I'm interested in how to:

  • Correctly format NVMe commands for io_uring
  • Correctly set up the big queues and any other bureaucracy needed

Amazing project here, your work is inspiring me and many of my coworkers to become better engineers.

espoal avatar Jun 25 '22 09:06 espoal

So here are some updates on what I found:

  • xnvme provides a good example of setting up and interacting with the ring
  • Here I'm trying an implementation in Go

If someone just wants to use basic NVMe commands with io_uring, then xnvme.io is a good candidate. But if someone wants to use both NVMe and the network (e.g. RDMA over NVMe devices), then xnvme is not enough. The approach I'm trying is to use libxnvme to create the commands, and then pass them to io_uring.

What is still not clear to me:

  • How to send complex commands (e.g. copy, fused commands, stream write specification commands, ...)
  • What is the best configuration for io_uring on the latest kernel given this use case

And probably more stuff I can't see at the moment, but that's why I'm trying to build an implementation in Go.

I'm hopeful someone will find this useful in the future, but if you're struggling with this now feel free to ping me.

espoal avatar Jul 11 '22 10:07 espoal

Hi @espoal, we recently added a simple test for NVMe passthrough commands to liburing; you can check this link.

With xnvme you can also send NVMe passthrough commands; you can check this xnvme

This requires kernel 5.19 and liburing 2.2.

ankit-sam avatar Aug 09 '22 07:08 ankit-sam

After many months, I finally managed to write an example, but it's still not working :(

I get error 95 (operation not supported). If I try with nvme-cli it works ( sudo nvme read /dev/nvme0n1 -s 1000000 -c 1 -z 4096 )

Can you @ankit-sam or anyone else spot the error in my code?

espoal avatar Nov 21 '22 11:11 espoal

Hi @espoal, I am not very familiar with Rust, but looking at the code I see you have

let path = "/dev/nvme0n1";

This should be the character device, /dev/ng0n1.

I hope let builder = IoUring::<squeue::Entry128, cqueue::Entry32>::generic_builder(); is passing the IORING_SETUP_SQE128 and IORING_SETUP_CQE32 flags for ring creation.
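To illustrate why both flags matter for passthrough, here is a minimal sketch of the size arithmetic. It assumes the io_uring layout as I understand it (a 64-byte default SQE whose inline command area starts at byte 48) and a 72-byte struct nvme_uring_cmd. The constant and function names are hypothetical and are not part of liburing.

```c
#include <stddef.h>

/* Sizes assumed from the io_uring and NVMe uapi layouts; names are illustrative. */
enum {
    SQE_SIZE            = 64,  /* default submission queue entry */
    SQE128_SIZE         = 128, /* entry size with IORING_SETUP_SQE128 */
    SQE_CMD_OFFSET      = 48,  /* byte offset where the inline command area begins */
    NVME_URING_CMD_SIZE = 72   /* assumed sizeof(struct nvme_uring_cmd) */
};

/* Bytes available for an inline passthrough command in one SQE. */
static size_t inline_cmd_capacity(int sqe128)
{
    return (size_t)(sqe128 ? SQE128_SIZE : SQE_SIZE) - SQE_CMD_OFFSET;
}

/* Returns 1 if an NVMe passthrough command fits inline, 0 otherwise. */
static int nvme_cmd_fits(int sqe128)
{
    return (size_t)NVME_URING_CMD_SIZE <= inline_cmd_capacity(sqe128);
}
```

Under these assumptions the command only fits when the ring uses 128-byte SQEs. With liburing, the flags would typically be passed as the last argument of io_uring_queue_init(), e.g. IORING_SETUP_SQE128 | IORING_SETUP_CQE32.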

Other than that it looks OK. Maybe compare your specific entries with the ones here: https://github.com/axboe/liburing/blob/master/test/io_uring_passthrough.c#L170
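As a rough guide to what such a comparison covers, the fields of a read command can be sketched as below. This is not the code from the linked test: the struct is a local mirror of the kernel's struct nvme_uring_cmd (from linux/nvme_ioctl.h), redeclared so the sketch compiles without kernel headers, and prep_nvme_read is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct nvme_uring_cmd; real code should use the kernel header. */
struct nvme_uring_cmd_sketch {
    uint8_t  opcode;
    uint8_t  flags;
    uint16_t rsvd1;
    uint32_t nsid;
    uint32_t cdw2;
    uint32_t cdw3;
    uint64_t metadata;
    uint64_t addr;
    uint32_t metadata_len;
    uint32_t data_len;
    uint32_t cdw10;
    uint32_t cdw11;
    uint32_t cdw12;
    uint32_t cdw13;
    uint32_t cdw14;
    uint32_t cdw15;
    uint32_t timeout_ms;
    uint32_t rsvd2;
};

#define NVME_CMD_READ_OPCODE 0x02 /* NVM command set: Read */

/* Fill a read of `len` bytes starting at byte `offset`.
 * lba_shift is log2 of the logical block size (12 for 4096-byte blocks). */
static void prep_nvme_read(struct nvme_uring_cmd_sketch *cmd, uint32_t nsid,
                           uint64_t offset, void *buf, uint32_t len,
                           unsigned lba_shift)
{
    uint64_t slba = offset >> lba_shift;  /* starting logical block */
    uint32_t nblocks = len >> lba_shift;  /* whole blocks to transfer */

    assert(nblocks >= 1 && nblocks <= 65536);         /* NLB is a 16-bit field */
    assert(((uint64_t)nblocks << lba_shift) == len);  /* len must be block-aligned */

    memset(cmd, 0, sizeof(*cmd));
    cmd->opcode = NVME_CMD_READ_OPCODE;
    cmd->nsid = nsid;
    cmd->cdw10 = (uint32_t)(slba & 0xffffffffu); /* SLBA, low 32 bits */
    cmd->cdw11 = (uint32_t)(slba >> 32);         /* SLBA, high 32 bits */
    cmd->cdw12 = nblocks - 1;                    /* block count is zero-based */
    cmd->addr = (uint64_t)(uintptr_t)buf;
    cmd->data_len = len;
}
```

In a real submission the filled struct would live in the SQE's inline command area, with sqe->opcode set to IORING_OP_URING_CMD and sqe->cmd_op set to NVME_URING_CMD_IO.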

ankit-sam avatar Nov 21 '22 11:11 ankit-sam

Thanks @ankit-sam, I really appreciate your help.

If I try /dev/ng0n1 I get error 25 (not a typewriter). Maybe I should pass some special flags when opening the device?

The flags should be passed correctly, and I can see in the debug view that the SQE and CQE sizes are set correctly.

Damn, so close yet so far :D

espoal avatar Nov 21 '22 12:11 espoal

We were just passing O_RDONLY or O_WRONLY for read or write when opening the device: https://github.com/axboe/liburing/blob/master/test/io_uring_passthrough.c#L79

Also, please check that sqe->opcode = IORING_OP_URING_CMD; is set correctly.

ankit-sam avatar Nov 21 '22 12:11 ankit-sam

@ankit-sam I got it working!!!!! The problem was sqe->opcode = IORING_OP_URING_CMD; as you mentioned.

I'm so happy, it was so hard but I feel I learnt a lot. Thank you SO MUCH. I'm sure I will have more questions though :D

espoal avatar Nov 21 '22 13:11 espoal

I also want to use io_uring passthrough to send a trim command. Could you help prepare an example for the NVMe trim command? Thanks. @ankit-sam

suxinggm avatar Mar 05 '24 15:03 suxinggm

Hi @suxinggm, you can check the fio code to see how to send a trim command with io_uring passthrough. It works as per the NVMe specification. https://github.com/axboe/fio/blob/master/engines/io_uring.c has the io_uring_cmd ioengine; look for calls into the https://github.com/axboe/fio/blob/master/engines/nvme.c file. fio_nvme_uring_cmd_trim_prep is the function that handles trim command creation. You will have to manage the data buffers for the range entries. For that, you can also check the recent commits to the two files mentioned above.
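For orientation, a trim maps to the NVMe Dataset Management command with the deallocate attribute set, and the data buffer holds one 16-byte entry per range. The sketch below is not the fio code: both structs and the prep_nvme_trim helper are hypothetical, only the fields a trim needs are shown, and a little-endian host is assumed (the specification defines the range entries as little-endian).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One Dataset Management range entry (16 bytes), little-endian on the wire. */
struct dsm_range_sketch {
    uint32_t cattr; /* context attributes, left at 0 here */
    uint32_t nlb;   /* length of the range in logical blocks */
    uint64_t slba;  /* starting LBA of the range */
};

/* Only the command fields a trim needs; in real code these are members of
 * struct nvme_uring_cmd from linux/nvme_ioctl.h. */
struct trim_cmd_sketch {
    uint8_t  opcode;
    uint32_t nsid;
    uint64_t addr;
    uint32_t data_len;
    uint32_t cdw10;
    uint32_t cdw11;
};

#define NVME_CMD_DSM_OPCODE      0x09      /* Dataset Management */
#define NVME_DSM_ATTR_DEALLOCATE (1u << 2) /* AD bit in CDW11 */
#define NVME_DSM_MAX_RANGES      256u      /* one command carries at most 256 ranges */

/* Describe `nr` ranges stored in `ranges` as a single deallocate command. */
static void prep_nvme_trim(struct trim_cmd_sketch *cmd, uint32_t nsid,
                           struct dsm_range_sketch *ranges, uint32_t nr)
{
    assert(nr >= 1 && nr <= NVME_DSM_MAX_RANGES);

    memset(cmd, 0, sizeof(*cmd));
    cmd->opcode = NVME_CMD_DSM_OPCODE;
    cmd->nsid = nsid;
    cmd->cdw10 = nr - 1;                    /* number of ranges, zero-based */
    cmd->cdw11 = NVME_DSM_ATTR_DEALLOCATE;  /* request deallocation (trim) */
    cmd->addr = (uint64_t)(uintptr_t)ranges;
    cmd->data_len = nr * (uint32_t)sizeof(*ranges);
}
```

The range buffer has to stay valid until the completion arrives, which is the buffer management the comment above refers to.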

ankit-sam avatar Mar 06 '24 03:03 ankit-sam

I appreciate your quick response. I will check it and test. Thanks a lot~~ @ankit-sam

suxinggm avatar Mar 07 '24 03:03 suxinggm

Hi @ankit-sam, may I ask for your advice again? Can I get the NVMe CQE data? I only get all zeros in big_cqe in struct io_uring_cqe.

suxinggm avatar Mar 08 '24 14:03 suxinggm