ocaml-uring

Hangs writing to ZFS with fixed buffers

Open talex5 opened this issue 2 months ago • 0 comments

When run on a ZFS partition (Ubuntu 22.04.4 LTS, GNU/Linux 5.15.0-102-generic x86_64), this program spins forever and cannot be killed:

let rec wait_with_retry uring =
  match Uring.wait uring with
  | None -> wait_with_retry uring        (* Interrupted *)
  | Some { result; data } -> result, data

let () =
  let uring = Uring.create ~queue_depth:2 () in
  let buf = Cstruct.of_string "ab" in
  Uring.set_fixed_buffer uring buf.buffer |> Result.get_ok;
  let fd = Unix.openfile "test.data" [O_CREAT; O_TRUNC; O_RDWR] 0o600 in
  for i = 0 to 1 do
    let job = Uring.write_fixed uring fd ~file_offset:(Optint.Int63.of_int i) ~off:i ~len:1 () in
    assert (Option.is_some job);
    let x = Uring.submit uring in
    assert (x = 1);
    let result, () = wait_with_retry uring in
    assert (result = 1);
  done

Based on original report by @patricoferris at https://github.com/ocaml-multicore/eio/pull/715#issuecomment-2043925492.

pidstat -t 1 shows:

10:40:58      UID      TGID       TID    %usr %system  %guest   %wait    %CPU   CPU  Command
10:40:59     1000      1027         -    0.00   99.02    0.00    0.00   99.02     0  main.exe
10:40:59     1000         -      1048    0.00   98.04    0.00    0.98   98.04     0  |__iou-wrk-1027

perf record -g shows:

   - zpl_iter_write
      - 98.63% zfs_write
         + 27.21% dmu_tx_assign
         + 26.45% dmu_tx_commit
         + 19.34% dmu_write_uio_dbuf
         + 11.55% dmu_tx_hold_write_by_dnode
         + 5.20% dmu_tx_create
         + 3.83% dmu_tx_hold_sa
           0.82% zfs_clear_setid_bits_if_necessary

The process cannot be killed, even with kill -9.

talex5, Apr 25 '24 10:04