liburing
kernel NULL PTR - IORING_OP_URING_CMD incompatible with IORING_SETUP_SQPOLL?
I started to play a bit with ring setup flags for my fuse uring implementation, and with IORING_SETUP_SQPOLL I get a kernel NULL pointer dereference that doesn't seem to be related to my code.
With sqe->opcode = IORING_OP_URING_CMD;
[ 34.089396] fuse: loading out-of-tree module taints kernel.
[ 34.110398] fuse: init (API version 7.38)
[ 45.734693] fuse: fc=ffff888121f3e800 nr-queues=32 depth=16 ioctl ready
[ 45.775779] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 45.777471] #PF: supervisor instruction fetch in kernel mode
[ 45.778787] #PF: error_code(0x0010) - not-present page
[ 45.779953] PGD 0 P4D 0
[ 45.780612] Oops: 0010 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 45.781844] CPU: 30 PID: 867 Comm: fuse-ring-1 Tainted: G O 6.2.0-rc8 #6
[ 45.783646] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[ 45.785685] RIP: 0010:0x0
[ 45.786369] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[ 45.787818] RSP: 0018:ffffc900057ffc58 EFLAGS: 00010246
[ 45.789027] RAX: 0000000000000000 RBX: 0000000000000003 RCX: ffffffff81598194
[ 45.790618] RDX: 0000000000000003 RSI: ffffc900057ffd00 RDI: ffff88817a29cdc0
[ 45.793370] RBP: ffffc900057ffd70 R08: ffffffff8157f2d6 R09: ffff8881261c243b
[ 45.794996] R10: ffffc900057ffd88 R11: 0000000000000001 R12: ffff88817a29ce01
[ 45.796658] R13: ffff88817a29cdc0 R14: ffffffffa08f43e0 R15: ffff88817a29ce38
[ 45.798294] FS: 00007fa443dfd700(0000) GS:ffff88881b000000(0000) knlGS:0000000000000000
[ 45.800243] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 45.801597] CR2: ffffffffffffffd6 CR3: 0000000130ace005 CR4: 00000000001706e0
[ 45.803285] Call Trace:
[ 45.803969] <TASK>
[ 45.804600] io_do_iopoll+0x1fe/0x940
[ 45.805562] ? blk_finish_plug+0x44/0x60
[ 45.806580] ? io_submit_sqes+0x536/0xc30
[ 45.807598] ? io_rw_fail+0x70/0x70
[ 45.808511] ? __x64_sys_io_uring_enter+0x966/0x12e0
[ 45.809705] __x64_sys_io_uring_enter+0x966/0x12e0
[ 45.810918] ? kernel_write+0x3d0/0x3d0
[ 45.811911] ? io_run_task_work_sig+0xf0/0xf0
[ 45.812999] ? do_futex+0xf7/0x190
[ 45.813902] ? __x64_sys_get_robust_list+0x260/0x260
[ 45.815135] ? rseq_syscall+0x69/0xe0
[ 45.816088] ? __rseq_handle_notify_resume+0x4e0/0x4e0
[ 45.817312] ? mark_held_locks+0x23/0x90
[ 45.818321] ? lockdep_hardirqs_on_prepare+0x13d/0x200
[ 45.819576] ? syscall_enter_from_user_mode+0x1d/0x50
[ 45.820826] ? trace_hardirqs_on+0x2d/0x110
[ 45.823121] do_syscall_64+0x3d/0x90
[ 45.824071] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Can you include some details on how you are reproducing this? Would save us a lot of time when looking into this.
Actually I think I see what it is...
This should fix it:
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 446a189b78b0..e3413f131887 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -101,6 +101,18 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return 0;
 }
 
+static bool io_uring_cmd_supported(struct io_ring_ctx *ctx, struct file *file)
+{
+	/* no issue method, fail */
+	if (!file->f_op->uring_cmd)
+		return false;
+	/* IOPOLL enabled and no poll method, fail */
+	if (ctx->flags & IORING_SETUP_IOPOLL && !file->f_op->uring_cmd_iopoll)
+		return false;
+
+	return true;
+}
+
 int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
@@ -108,7 +120,7 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
 	struct file *file = req->file;
 	int ret;
 
-	if (!req->file->f_op->uring_cmd)
+	if (!io_uring_cmd_supported(ctx, file))
 		return -EOPNOTSUPP;
 
 	ret = security_uring_cmd(ioucmd);
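For readers following along: the oops is an instruction fetch at address 0 (RIP: 0010:0x0) reached from io_do_iopoll, which is consistent with a call through a NULL handler pointer. The check in the patch rejects the request before it can get that far. Below is a minimal user-space sketch of the same decision logic. The mock_* types, MOCK_SETUP_IOPOLL and the sample ops tables are hypothetical stand-ins for illustration, not the kernel's structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for IORING_SETUP_IOPOLL; the value is arbitrary. */
#define MOCK_SETUP_IOPOLL (1U << 0)

/* Hypothetical stand-in for the two f_op handlers the check looks at. */
struct mock_file_ops {
	int (*uring_cmd)(void);        /* issue handler, may be NULL */
	int (*uring_cmd_iopoll)(void); /* poll handler, may be NULL */
};

/* Hypothetical stand-in for the ring context; only the setup flags matter. */
struct mock_ring {
	unsigned int flags;
};

static int mock_handler(void) { return 0; }

/* A file with no uring_cmd support at all. */
static const struct mock_file_ops ops_none = { NULL, NULL };
/* A file that can issue commands but has no poll handler (the crashing case). */
static const struct mock_file_ops ops_cmd_only = { mock_handler, NULL };
/* A file that implements both handlers. */
static const struct mock_file_ops ops_full = { mock_handler, mock_handler };

/* Mirrors the two conditions of io_uring_cmd_supported() from the patch. */
static bool mock_cmd_supported(const struct mock_ring *ring,
			       const struct mock_file_ops *ops)
{
	/* no issue method, fail */
	if (!ops->uring_cmd)
		return false;
	/* IOPOLL enabled and no poll method, fail */
	if ((ring->flags & MOCK_SETUP_IOPOLL) && !ops->uring_cmd_iopoll)
		return false;
	return true;
}
```

In this sketch, ops_cmd_only is accepted on a ring without the IOPOLL flag and rejected on one with it, which corresponds to the request now failing with -EOPNOTSUPP instead of crashing.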
Just recompiling liburing; I guess it should be easy to reproduce with test/io_uring/passthrough.c. My dev VM has a debug kernel, so it is totally sluggish. I have also crashed my main system a couple of times before by recompiling liburing and running its tests, so I'm not recompiling it on non-VMs anymore...
Thanks, going to test your patch then.
The liburing tests are very aggressive, and since we always add a test when an issue is found, they may very well crash or otherwise behave weirdly on kernels that aren't up to date. Any current or stable kernel should be fine though, unless you found a new issue! I sent out a patch for it on the list too, it'll go into -stable as well.
Yeah, I noticed that.
Ok, the test doesn't work, I don't have any nvme devices in the VM. Maybe we should try to add a test for that command based on ublk (or fuse in the future).
Going to recompile my kernel and try out your patch directly.
Yes, the test requires an nvme device. I think Ming was pondering some basic ublk tests. Agree that it would be nice to expand the coverage on that front.
Thanks, this works!
From a usability point of view it is a bit confusing, as it is the cqe that returns EOPNOTSUPP. For a user, ideally io_uring_queue_init_params() would fail, but I see, it does not get the fd. The next best function for me would be io_uring_register_files(), maybe we could add sanity checks in io_sqe_files_register()? And maybe with that we could avoid the io_uring_cmd_supported() call for each and every sqe/command submission?
It really has to be part of the issue path, unfortunately. You cannot do this at ring creation time, as you noted. And since this has to work with both regular and fixed files, there's really no way around it. But not a huge deal, as we need to check file->f_op->uring_cmd anyway.
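Since the check happens at issue time, the application only learns about it from the completion. A minimal sketch of how the application side might tell this case apart, assuming the usual convention that a negative cqe->res is a negated errno; the enum and the classify_cmd_res() helper are hypothetical names, not part of liburing.

```c
#include <errno.h>

/* Hypothetical result categories for a completed IORING_OP_URING_CMD. */
enum cmd_result {
	CMD_OK,          /* non-negative res, meaning is defined by the driver */
	CMD_UNSUPPORTED, /* file cannot handle uring_cmd with this ring setup */
	CMD_FAILED,      /* any other negative errno */
};

/* Classify the res value taken from a completion queue entry. */
static enum cmd_result classify_cmd_res(int res)
{
	if (res == -EOPNOTSUPP)
		return CMD_UNSUPPORTED;
	if (res < 0)
		return CMD_FAILED;
	return CMD_OK;
}
```

With liburing this would typically be called as classify_cmd_res(cqe->res) after io_uring_wait_cqe() returns a completion and before io_uring_cqe_seen() releases it. A CMD_UNSUPPORTED result on the first command is a hint to recreate the ring without IORING_SETUP_IOPOLL or to fall back to another I/O path.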
Fixed by kernel commit 03b3d6be73e81ddb7c2930d942cdd17f4cfd5ba5