ocaml-uring
Hangs writing to ZFS with fixed buffers
When run on a ZFS partition (Ubuntu 22.04.4 LTS (GNU/Linux 5.15.0-102-generic x86_64)), this program spins forever and cannot be killed:
let rec wait_with_retry uring =
  match Uring.wait uring with
  | None -> wait_with_retry uring (* Interrupted *)
  | Some { result; data } -> result, data

let () =
  let uring = Uring.create ~queue_depth:2 () in
  let buf = Cstruct.of_string "ab" in
  Uring.set_fixed_buffer uring buf.buffer |> Result.get_ok;
  let fd = Unix.openfile "test.data" [O_CREAT; O_TRUNC; O_RDWR] 0o600 in
  for i = 0 to 1 do
    let job = Uring.write_fixed uring fd ~file_offset:(Optint.Int63.of_int i) ~off:i ~len:1 () in
    assert (Option.is_some job);
    let x = Uring.submit uring in
    assert (x = 1);
    let result, () = wait_with_retry uring in
    assert (result = 1);
  done
Based on original report by @patricoferris at https://github.com/ocaml-multicore/eio/pull/715#issuecomment-2043925492.
pidstat -t 1 shows:
10:40:58      UID      TGID       TID    %usr %system  %guest   %wait    %CPU   CPU  Command
10:40:59     1000      1027         -    0.00   99.02    0.00    0.00   99.02     0  main.exe
10:40:59     1000         -      1048    0.00   98.04    0.00    0.98   98.04     0  |__iou-wrk-1027
perf record -g shows:
- zpl_iter_write
   - 98.63% zfs_write
      + 27.21% dmu_tx_assign
      + 26.45% dmu_tx_commit
      + 19.34% dmu_write_uio_dbuf
      + 11.55% dmu_tx_hold_write_by_dnode
      + 5.20% dmu_tx_create
      + 3.83% dmu_tx_hold_sa
        0.82% zfs_clear_setid_bits_if_necessary
The process cannot be killed, even with kill -9.
https://github.com/openzfs/zfs/issues/16133#issuecomment-3074568964 says this is fixed in ZFS 2.3.3 (and 2.2.8).
Confirming this is fixed for me in those versions. But the older versions are still really prevalent. Not sure if we should do something smart and detect the ZFS kernel module version in Eio_main...
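For what it's worth, here is a rough sketch of what such a check could look like. Nothing here exists in Eio or ocaml-uring: `parse_zfs_version`, `zfs_version_has_fix` and `loaded_zfs_version` are hypothetical names. It assumes the loaded module reports its version in `/sys/module/zfs/version` as a string starting with `MAJOR.MINOR.PATCH` (for example `2.1.5-1ubuntu6~22.04.1`), and it assumes every release without the fix is affected, which this thread does not establish.

```ocaml
(* Hypothetical helpers; not part of Eio or ocaml-uring. *)

(* Parse the leading "MAJOR.MINOR.PATCH" of a version string such as
   "2.1.5-1ubuntu6~22.04.1". Returns None if the string does not start
   with three dot-separated integers. *)
let parse_zfs_version s =
  try Scanf.sscanf s "%d.%d.%d" (fun a b c -> Some (a, b, c))
  with Scanf.Scan_failure _ | Failure _ | End_of_file -> None

(* The linked OpenZFS comment reports the fix in 2.3.3 and 2.2.8.
   Assumption: anything older on either branch lacks the fix. *)
let zfs_version_has_fix (major, minor, patch) =
  if (major, minor) = (2, 2) then patch >= 8
  else (major, minor, patch) >= (2, 3, 3)

(* Read the version of the loaded ZFS kernel module, if any.
   Assumption: the module exposes it at /sys/module/zfs/version.
   Returns None when the file is missing, empty or unparseable. *)
let loaded_zfs_version () =
  match open_in "/sys/module/zfs/version" with
  | exception Sys_error _ -> None
  | ic ->
    Fun.protect ~finally:(fun () -> close_in_noerr ic) (fun () ->
      match input_line ic with
      | line -> parse_zfs_version line
      | exception End_of_file -> None)

let () =
  match loaded_zfs_version () with
  | None -> print_endline "ZFS module not loaded (or version unreadable)"
  | Some v when zfs_version_has_fix v -> print_endline "ZFS version has the fix"
  | Some _ -> print_endline "ZFS version predates the fix"
```

This only tells you whether an unfixed module is loaded, not whether a given file actually lives on ZFS, and what to do with the answer (warn, or pick another code path) is still open.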
A basic start might be SEO: note it in the source code and docs, and link back here, so there's a trail that funnels people who hit this into the issue.