
trim CMD does not work during I/O replay

tmdrod opened this issue 6 years ago • 5 comments

Hi, the trim CMD does not work during I/O replay. I am checking which CMDs are supported for I/O replay in fio for my I/O simulation project. The read/write/fsync operations work well, but trim generates the following error during I/O replay:

Environment: Ubuntu 18.04, fio version 3.1
CMD: fio --numjobs=1 --buffered=0 --read_iolog=./trim1.log --name=ioreplay --iodepth=1

bad ddir: 2
fio: pid=7051, err=22/file:iolog.c:589, func=failed initializing iolog, error=Invalid argument

tmdrod avatar Apr 18 '19 08:04 tmdrod

@tmdrod can you reproduce in the latest fio (we only support the latest)? See https://github.com/axboe/fio/releases for which version we're up to.

sitsofe avatar Oct 15 '19 05:10 sitsofe

Closing due to lack of reply from reporter. If this issue is still happening with the latest fio (see https://github.com/axboe/fio/releases to find out which version that is) please reopen. Thanks!

sitsofe avatar Oct 28 '19 06:10 sitsofe

Hi, I got the same issue with the latest fio version. Environment: CentOS 7, fio version 3.29

the minimal trace file: test.iolog fio version 2 iolog /dev/nvme0n1 add /dev/nvme0n1 open /dev/nvme0n1 write 75689525248 16384 /dev/nvme0n1 sync 0 0 /dev/nvme0n1 trim 75689525248 16384 /dev/nvme0n1 close

CMD: sudo fio --name=global --ioengine=libaio --name=replay --read_iolog=test.iolog

Then I got output like:

replay: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.29
Starting 1 process
bad ddir: 2

replay: (groupid=0, jobs=1): err= 0: pid=8939: Fri Dec 31 17:46:06 2021
  write: IOPS=500, BW=8000KiB/s (8192kB/s)(16.0KiB/2msec); 0 zone resets
    slat (nsec): min=15580, max=15580, avg=15580.00, stdev= 0.00
    clat (nsec): min=1830, max=1830, avg=1830.00, stdev= 0.00
     lat (nsec): min=17730, max=17730, avg=17730.00, stdev= 0.00
    clat percentiles (nsec):
     |  1.00th=[ 1832],  5.00th=[ 1832], 10.00th=[ 1832], 20.00th=[ 1832],
     | 30.00th=[ 1832], 40.00th=[ 1832], 50.00th=[ 1832], 60.00th=[ 1832],
     | 70.00th=[ 1832], 80.00th=[ 1832], 90.00th=[ 1832], 95.00th=[ 1832],
     | 99.00th=[ 1832], 99.50th=[ 1832], 99.90th=[ 1832], 99.95th=[ 1832],
     | 99.99th=[ 1832]
   lat (usec)   : 2=100.00%
  fsync/fdatasync/sync_file_range:
    sync (nsec): min=190, max=190, avg=190.00, stdev= 0.00
    sync percentiles (nsec):
     |  1.00th=[  191],  5.00th=[  191], 10.00th=[  191], 20.00th=[  191],
     | 30.00th=[  191], 40.00th=[  191], 50.00th=[  191], 60.00th=[  191],
     | 70.00th=[  191], 80.00th=[  191], 90.00th=[  191], 95.00th=[  191],
     | 99.00th=[  191], 99.50th=[  191], 99.90th=[  191], 99.95th=[  191],
     | 99.99th=[  191]
  cpu          : usr=0.00%, sys=0.00%, ctx=4, majf=0, minf=9
  IO depths    : 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1,0,1 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=8000KiB/s (8192kB/s), 8000KiB/s-8000KiB/s (8192kB/s-8192kB/s), io=16.0KiB (16.4kB), run=2-2msec

the "bad ddir: 2" only appears when the tracefile has "trim" operation

Seanistolen avatar Dec 31 '21 09:12 Seanistolen

Sorry, the trace file above was pasted in the wrong format. Here it is again:

fio version 2 iolog
/dev/nvme0n1 add
/dev/nvme0n1 open
/dev/nvme0n1 write 75689525248 16384
/dev/nvme0n1 sync 0 0
/dev/nvme0n1 trim 75689525248 16384
/dev/nvme0n1 close

Seanistolen avatar Dec 31 '21 10:12 Seanistolen

It looks like read_iolog2() in iolog.c is missing the case for trim. It's a bit tricky because a replay need not match the workloads in https://github.com/axboe/fio/blob/a3e33e2fc06582e4170f90ae6e62d6225d52dc7c/io_ddir.h#L42-L43 . Probably the easiest thing is to treat trims similarly to writes (skip them if readonly is set).
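
(For reference, the "2" in "bad ddir: 2" is DDIR_TRIM in io_ddir.h.) A minimal standalone sketch of that idea — map "trim" to DDIR_TRIM and drop it on readonly jobs, the same way writes are dropped; this is assumed illustration code, not the actual fio patch:

#include <stdio.h>
#include <string.h>

/* Direction values as in fio's io_ddir.h; "bad ddir: 2" is DDIR_TRIM. */
enum fio_ddir { DDIR_READ = 0, DDIR_WRITE = 1, DDIR_TRIM = 2, DDIR_SYNC = 3, DDIR_INVAL = -1 };

/* Map an iolog action word to a direction; trims are dropped on
 * readonly jobs, mirroring how writes would be handled. */
static enum fio_ddir act_to_ddir(const char *act, int read_only, int *skip)
{
        *skip = 0;
        if (!strcmp(act, "read"))
                return DDIR_READ;
        if (!strcmp(act, "sync"))
                return DDIR_SYNC;
        if (!strcmp(act, "write") || !strcmp(act, "trim")) {
                if (read_only) {
                        *skip = 1;      /* skip data-modifying ops on readonly replay */
                        return DDIR_INVAL;
                }
                return !strcmp(act, "trim") ? DDIR_TRIM : DDIR_WRITE;
        }
        return DDIR_INVAL;              /* unknown action */
}

int main(void)
{
        const char *acts[] = { "read", "write", "trim", "sync" };
        for (int i = 0; i < 4; i++) {
                int skip;
                enum fio_ddir d = act_to_ddir(acts[i], 0, &skip);
                printf("%-5s -> ddir %d%s\n", acts[i], (int)d, skip ? " (skipped)" : "");
        }
        return 0;
}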

sitsofe avatar Jan 02 '22 20:01 sitsofe

I'm hitting this issue as well using fio 3.33 and iolog v3.

fio version 3 iolog
50 /dev/nvme0n1 add
100 /dev/nvme0n1 open
10000 /dev/nvme0n1 write 0 4096
20000 /dev/nvme0n1 sync 0 0
30000 /dev/nvme0n1 trim 0 4096
40000 /dev/nvme0n1 read 0 4096
50000 /dev/nvme0n1 close
$ sudo fio --name=replay --read_iolog=replay.iolog --direct=1
replay: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.33
Starting 1 process
bad ddir: 2

replay: (groupid=0, jobs=1): err= 0: pid=13168: Wed Feb  8 15:04:16 2023
  read: IOPS=24, BW=97.6KiB/s (99.9kB/s)(4096B/41msec)
    clat (nsec): min=126842, max=126842, avg=126842.00, stdev= 0.00
     lat (nsec): min=127533, max=127533, avg=127533.00, stdev= 0.00
    clat percentiles (usec):
     |  1.00th=[  127],  5.00th=[  127], 10.00th=[  127], 20.00th=[  127],
     | 30.00th=[  127], 40.00th=[  127], 50.00th=[  127], 60.00th=[  127],
     | 70.00th=[  127], 80.00th=[  127], 90.00th=[  127], 95.00th=[  127],
     | 99.00th=[  127], 99.50th=[  127], 99.90th=[  127], 99.95th=[  127],
     | 99.99th=[  127]
  write: IOPS=24, BW=97.6KiB/s (99.9kB/s)(4096B/41msec); 0 zone resets
    clat (nsec): min=5400.0k, max=5400.0k, avg=5400034.00, stdev= 0.00
     lat (nsec): min=5400.5k, max=5400.5k, avg=5400523.00, stdev= 0.00
    clat percentiles (usec):
     |  1.00th=[ 5407],  5.00th=[ 5407], 10.00th=[ 5407], 20.00th=[ 5407],
     | 30.00th=[ 5407], 40.00th=[ 5407], 50.00th=[ 5407], 60.00th=[ 5407],
     | 70.00th=[ 5407], 80.00th=[ 5407], 90.00th=[ 5407], 95.00th=[ 5407],
     | 99.00th=[ 5407], 99.50th=[ 5407], 99.90th=[ 5407], 99.95th=[ 5407],
     | 99.99th=[ 5407]
  lat (usec)   : 250=50.00%
  lat (msec)   : 10=50.00%
  fsync/fdatasync/sync_file_range:
    sync (nsec): min=1694.7k, max=1694.7k, avg=1694729.00, stdev= 0.00
    sync percentiles (usec):
     |  1.00th=[ 1696],  5.00th=[ 1696], 10.00th=[ 1696], 20.00th=[ 1696],
     | 30.00th=[ 1696], 40.00th=[ 1696], 50.00th=[ 1696], 60.00th=[ 1696],
     | 70.00th=[ 1696], 80.00th=[ 1696], 90.00th=[ 1696], 95.00th=[ 1696],
     | 99.00th=[ 1696], 99.50th=[ 1696], 99.90th=[ 1696], 99.95th=[ 1696],
     | 99.99th=[ 1696]
  cpu          : usr=0.00%, sys=0.00%, ctx=8, majf=0, minf=9
  IO depths    : 1=150.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1,1,0,1 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=97.6KiB/s (99.9kB/s), 97.6KiB/s-97.6KiB/s (99.9kB/s-99.9kB/s), io=4096B (4096B), run=41-41msec
  WRITE: bw=97.6KiB/s (99.9kB/s), 97.6KiB/s-97.6KiB/s (99.9kB/s-99.9kB/s), io=4096B (4096B), run=41-41msec
$ sudo btrace /dev/nvme0n1
259,0    4        1     0.000000000 13168  Q  WS 0 + 8 [fio]
259,0    4        2     0.000002433 13168  G  WS 0 + 8 [fio]
259,0    4        3     0.000002972 13168  P   N [fio]
259,0    4        4     0.000003443 13168 UT   N [fio] 1
259,0    4        5     0.000003751 13168  I  WS 0 + 8 [fio]
259,0    4        6     0.000009655   174  D  WS 0 + 8 [kworker/4:1H]
259,0    4        7     0.005376159     0  C  WS 0 + 8 [0]
259,0    4        8     0.010060805 13168  Q FWS [fio]
259,0    4        9     0.010063373 13168  G FWS [fio]
259,0    4       10     0.010069884   174  D  FN [kworker/4:1H]
259,0    4       11     0.011732456     0  C  FN 0 [0]
259,0    4       12     0.011734586     0  C  WS 0 [0]
259,0    4       13     0.020220173 13168  Q   R 0 + 8 [fio]
259,0    4       14     0.020222355 13168  G   R 0 + 8 [fio]
259,0    4       15     0.020225253 13168  D   R 0 + 8 [fio]
259,0    4       16     0.020332064     0  C   R 0 + 8 [0]
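
For reference, each line of the v3 iolog above carries a timestamp, a file name, an action, and (for data commands) an offset and length in bytes. A minimal sketch of parsing one such line, with the field layout inferred from the examples in this thread rather than taken from fio's actual parser:

#include <stdio.h>

int main(void)
{
        /* One trim line from the iolog above. */
        const char *line = "30000 /dev/nvme0n1 trim 0 4096";
        unsigned long long ts = 0, off = 0;
        unsigned int len = 0;
        char fname[256], act[16];

        /* Layout inferred from the examples: <timestamp> <file> <action>
         * [<offset> <length>]; sync lines carry "0 0", add/open/close
         * carry no offset/length at all. */
        int n = sscanf(line, "%llu %255s %15s %llu %u", &ts, fname, act, &off, &len);
        if (n >= 3)
                printf("t=%llu file=%s act=%s off=%llu len=%u (%d fields)\n",
                       ts, fname, act, off, len, n);
        return 0;
}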

mjekdi avatar Feb 08 '23 23:02 mjekdi

Please try this branch: https://github.com/vincentkfu/fio/tree/fio-769

vincentkfu avatar Feb 10 '23 19:02 vincentkfu

Hi Vincent,

That appears to have fixed it, although I got some unexpected results in btrace (see below).

$ sudo fio/fio --name=replay --read_iolog=replay.iolog --direct=1
replay: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.33-85-ge72e
Starting 1 process

replay: (groupid=0, jobs=1): err= 0: pid=8226: Fri Feb 10 12:44:42 2023
  read: IOPS=19, BW=78.4KiB/s (80.3kB/s)(4096B/51msec)
    clat (nsec): min=97322, max=97322, avg=97322.00, stdev= 0.00
     lat (nsec): min=97560, max=97560, avg=97560.00, stdev= 0.00
    clat percentiles (nsec):
     |  1.00th=[97792],  5.00th=[97792], 10.00th=[97792], 20.00th=[97792],
     | 30.00th=[97792], 40.00th=[97792], 50.00th=[97792], 60.00th=[97792],
     | 70.00th=[97792], 80.00th=[97792], 90.00th=[97792], 95.00th=[97792],
     | 99.00th=[97792], 99.50th=[97792], 99.90th=[97792], 99.95th=[97792],
     | 99.99th=[97792]
  write: IOPS=19, BW=78.4KiB/s (80.3kB/s)(4096B/51msec); 0 zone resets
    clat (nsec): min=12046, max=12046, avg=12046.00, stdev= 0.00
     lat (nsec): min=12530, max=12530, avg=12530.00, stdev= 0.00
    clat percentiles (nsec):
     |  1.00th=[12096],  5.00th=[12096], 10.00th=[12096], 20.00th=[12096],
     | 30.00th=[12096], 40.00th=[12096], 50.00th=[12096], 60.00th=[12096],
     | 70.00th=[12096], 80.00th=[12096], 90.00th=[12096], 95.00th=[12096],
     | 99.00th=[12096], 99.50th=[12096], 99.90th=[12096], 99.95th=[12096],
     | 99.99th=[12096]
  trim: IOPS=19, BW=78.4KiB/s (80.3kB/s)(4096B/51msec)
    clat (nsec): min=658654, max=658654, avg=658654.00, stdev= 0.00
     lat (nsec): min=658874, max=658874, avg=658874.00, stdev= 0.00
    clat percentiles (usec):
     |  1.00th=[  660],  5.00th=[  660], 10.00th=[  660], 20.00th=[  660],
     | 30.00th=[  660], 40.00th=[  660], 50.00th=[  660], 60.00th=[  660],
     | 70.00th=[  660], 80.00th=[  660], 90.00th=[  660], 95.00th=[  660],
     | 99.00th=[  660], 99.50th=[  660], 99.90th=[  660], 99.95th=[  660],
     | 99.99th=[  660]
  lat (usec)   : 20=33.33%, 100=33.33%, 750=33.33%
  fsync/fdatasync/sync_file_range:
    sync (nsec): min=8494.7k, max=8494.7k, avg=8494731.00, stdev= 0.00
    sync percentiles (usec):
     |  1.00th=[ 8455],  5.00th=[ 8455], 10.00th=[ 8455], 20.00th=[ 8455],
     | 30.00th=[ 8455], 40.00th=[ 8455], 50.00th=[ 8455], 60.00th=[ 8455],
     | 70.00th=[ 8455], 80.00th=[ 8455], 90.00th=[ 8455], 95.00th=[ 8455],
     | 99.00th=[ 8455], 99.50th=[ 8455], 99.90th=[ 8455], 99.95th=[ 8455],
     | 99.99th=[ 8455]
  cpu          : usr=0.00%, sys=0.00%, ctx=10, majf=0, minf=10
  IO depths    : 1=133.3%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1,1,1,1 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=78.4KiB/s (80.3kB/s), 78.4KiB/s-78.4KiB/s (80.3kB/s-80.3kB/s), io=4096B (4096B), run=51-51msec
  WRITE: bw=78.4KiB/s (80.3kB/s), 78.4KiB/s-78.4KiB/s (80.3kB/s-80.3kB/s), io=4096B (4096B), run=51-51msec
   TRIM: bw=78.4KiB/s (80.3kB/s), 78.4KiB/s-78.4KiB/s (80.3kB/s-80.3kB/s), io=4096B (4096B), run=51-51msec

I was expecting a read from LBA 0 + 8 at the end but got this instead:

$ sudo btrace /dev/nvme0n1
259,0    5        1     0.000000000  8226  Q  WS 0 + 8 [fio]
259,0    5        2     0.000002778  8226  G  WS 0 + 8 [fio]
259,0    5        3     0.000003269  8226  P   N [fio]
259,0    5        4     0.000004458  8226  U   N [fio] 1
259,0    5        5     0.000005946  8226  D  WS 0 + 8 [fio]
259,0    5        6     0.005464954     0  C  WS 0 + 8 [0]
259,0    5        7     0.005474291  8226  Q FWS [fio]
259,0    5        8     0.005475052  8226  G FWS [fio]
259,0    5        9     0.005480482   222  D  FN [kworker/5:1H]
259,0    5       10     0.008481747     0  C  FN 0 [0]
259,0    5       11     0.008482292     0  C  WS 0 [0]
259,0    5       12     0.010061588  8226  Q  DS 0 + 8 [fio]
259,0    5       13     0.010062483  8226  G  DS 0 + 8 [fio]
259,0    5       14     0.010062655  8226  P   N [fio]
259,0    5       15     0.010062933  8226 UT   N [fio] 1
259,0    5       16     0.010063124  8226  I  DS 0 + 8 [fio]
259,0    5       17     0.010067574   222  D  DS 0 + 8 [kworker/5:1H]
259,0    5       18     0.010710169     0  C  DS 0 + 8 [0]
259,0    5       19     0.020174367  8226  Q  RA 0 + 32 [fio]
259,0    5       20     0.020175270  8226  G  RA 0 + 32 [fio]
259,0    5       21     0.020175441  8226  P   N [fio]
259,0    5       22     0.020175623  8226  U   N [fio] 1
259,0    5       23     0.020177881  8226  D  RA 0 + 32 [fio]
259,0    5       24     0.020248865     0  C  RA 0 + 32 [0]
259,0    5       25     0.020260330  8226  Q  RA 32 + 64 [fio]
259,0    5       26     0.020261090  8226  G  RA 32 + 64 [fio]
259,0    5       27     0.020261276  8226  P   N [fio]
259,0    5       28     0.020261456  8226  U   N [fio] 1
259,0    5       29     0.020262513  8226  D  RA 32 + 64 [fio]
259,0    5       30     0.020323015     0  C  RA 32 + 64 [0]

mjekdi avatar Feb 10 '23 20:02 mjekdi

I'm running against null_blk and see similar btrace output to yours. I don't know why it deviates from what we expect. However, you can run fio with --debug=io to see what fio is doing and as far as I can tell it is issuing the appropriate commands based on the iolog.

vincentkfu avatar Feb 10 '23 22:02 vincentkfu

Thanks for your help Vincent. I tried with --debug=io and fio does appear to be issuing the correct commands.

$ sudo fio/fio --name=replay --read_iolog=replay.iolog --direct=1 --debug=io
fio: set debug option io
replay: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.33-85-ge72e
Starting 1 process
io       6978  iolog: name=replay.iolog
io       6978  drop page cache /dev/nvme0n1
io       6978  iolog: get 0/4096//dev/nvme0n1
io       6978  prep: io_u 0x55ef884e5800: off=0x0,len=0x1000,ddir=1,file=/dev/nvme0n1
io       6978  queue: io_u 0x55ef884e5800: off=0x0,len=0x1000,ddir=1,file=/dev/nvme0n1
io       6978  complete: io_u 0x55ef884e5800: off=0x0,len=0x1000,ddir=1,file=/dev/nvme0n1
io       6978  iolog: get 0/0//dev/nvme0n1
io       6978  prep: io_u 0x55ef884e5800: off=0x0,len=0x0,ddir=3,file=/dev/nvme0n1
io       6978  queue: io_u 0x55ef884e5800: off=0x0,len=0x0,ddir=3,file=/dev/nvme0n1
io       6978  complete: io_u 0x55ef884e5800: off=0x0,len=0x0,ddir=3,file=/dev/nvme0n1
io       6978  iolog: get 0/4096//dev/nvme0n1
io       6978  prep: io_u 0x55ef884e5800: off=0x0,len=0x1000,ddir=2,file=/dev/nvme0n1
io       6978  queue: io_u 0x55ef884e5800: off=0x0,len=0x1000,ddir=2,file=/dev/nvme0n1
io       6978  complete: io_u 0x55ef884e5800: off=0x0,len=0x1000,ddir=2,file=/dev/nvme0n1
io       6978  iolog: get 0/4096//dev/nvme0n1
io       6978  prep: io_u 0x55ef884e5800: off=0x0,len=0x1000,ddir=0,file=/dev/nvme0n1
io       6978  queue: io_u 0x55ef884e5800: off=0x0,len=0x1000,ddir=0,file=/dev/nvme0n1
io       6978  complete: io_u 0x55ef884e5800: off=0x0,len=0x1000,ddir=0,file=/dev/nvme0n1
io       6978  get_io_u failed
io       6978  close ioengine psync
io       6978  free ioengine psync

replay: (groupid=0, jobs=1): err= 0: pid=6978: Tue Feb 14 14:49:21 2023
  read: IOPS=19, BW=78.4KiB/s (80.3kB/s)(4096B/51msec)
    clat (nsec): min=119137, max=119137, avg=119137.00, stdev= 0.00
     lat (nsec): min=121647, max=121647, avg=121647.00, stdev= 0.00
    clat percentiles (usec):
     |  1.00th=[  120],  5.00th=[  120], 10.00th=[  120], 20.00th=[  120],
     | 30.00th=[  120], 40.00th=[  120], 50.00th=[  120], 60.00th=[  120],
     | 70.00th=[  120], 80.00th=[  120], 90.00th=[  120], 95.00th=[  120],
     | 99.00th=[  120], 99.50th=[  120], 99.90th=[  120], 99.95th=[  120],
     | 99.99th=[  120]
  write: IOPS=19, BW=78.4KiB/s (80.3kB/s)(4096B/51msec); 0 zone resets
    clat (nsec): min=123641, max=123641, avg=123641.00, stdev= 0.00
     lat (nsec): min=125896, max=125896, avg=125896.00, stdev= 0.00
    clat percentiles (usec):
     |  1.00th=[  124],  5.00th=[  124], 10.00th=[  124], 20.00th=[  124],
     | 30.00th=[  124], 40.00th=[  124], 50.00th=[  124], 60.00th=[  124],
     | 70.00th=[  124], 80.00th=[  124], 90.00th=[  124], 95.00th=[  124],
     | 99.00th=[  124], 99.50th=[  124], 99.90th=[  124], 99.95th=[  124],
     | 99.99th=[  124]
  trim: IOPS=19, BW=78.4KiB/s (80.3kB/s)(4096B/51msec)
    clat (nsec): min=577432, max=577432, avg=577432.00, stdev= 0.00
     lat (nsec): min=579611, max=579611, avg=579611.00, stdev= 0.00
    clat percentiles (usec):
     |  1.00th=[  578],  5.00th=[  578], 10.00th=[  578], 20.00th=[  578],
     | 30.00th=[  578], 40.00th=[  578], 50.00th=[  578], 60.00th=[  578],
     | 70.00th=[  578], 80.00th=[  578], 90.00th=[  578], 95.00th=[  578],
     | 99.00th=[  578], 99.50th=[  578], 99.90th=[  578], 99.95th=[  578],
     | 99.99th=[  578]
  lat (usec)   : 250=66.67%, 750=33.33%
  fsync/fdatasync/sync_file_range:
    sync (nsec): min=8652.1k, max=8652.1k, avg=8652099.00, stdev= 0.00
    sync percentiles (usec):
     |  1.00th=[ 8717],  5.00th=[ 8717], 10.00th=[ 8717], 20.00th=[ 8717],
     | 30.00th=[ 8717], 40.00th=[ 8717], 50.00th=[ 8717], 60.00th=[ 8717],
     | 70.00th=[ 8717], 80.00th=[ 8717], 90.00th=[ 8717], 95.00th=[ 8717],
     | 99.00th=[ 8717], 99.50th=[ 8717], 99.90th=[ 8717], 99.95th=[ 8717],
     | 99.99th=[ 8717]
  cpu          : usr=0.00%, sys=0.00%, ctx=9, majf=0, minf=10
  IO depths    : 1=133.3%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1,1,1,1 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=78.4KiB/s (80.3kB/s), 78.4KiB/s-78.4KiB/s (80.3kB/s-80.3kB/s), io=4096B (4096B), run=51-51msec
  WRITE: bw=78.4KiB/s (80.3kB/s), 78.4KiB/s-78.4KiB/s (80.3kB/s-80.3kB/s), io=4096B (4096B), run=51-51msec
   TRIM: bw=78.4KiB/s (80.3kB/s), 78.4KiB/s-78.4KiB/s (80.3kB/s-80.3kB/s), io=4096B (4096B), run=51-51msec
io       6970  ioengine libaio unregistered
io       6970  ioengine io_uring unregistered
io       6970  ioengine io_uring_cmd unregistered
io       6970  ioengine sg unregistered
io       6970  ioengine mtd unregistered
io       6970  ioengine splice unregistered
io       6970  ioengine e4defrag unregistered
io       6970  ioengine falloc unregistered
io       6970  ioengine posixaio unregistered
io       6970  ioengine exec unregistered
io       6970  ioengine filecreate unregistered
io       6970  ioengine filestat unregistered
io       6970  ioengine filedelete unregistered
io       6970  ioengine ftruncate unregistered
io       6970  ioengine net unregistered
io       6970  ioengine netsplice unregistered
io       6970  ioengine null unregistered
io       6970  ioengine sync unregistered
io       6970  ioengine psync unregistered
io       6970  ioengine vsync unregistered
io       6970  ioengine pvsync unregistered
io       6970  ioengine pvsync2 unregistered
io       6970  ioengine mmap unregistered
io       6970  ioengine cpuio unregistered

mjekdi avatar Feb 14 '23 23:02 mjekdi

FYI, we used a protocol analyzer and determined that the btrace output was correct, so I believe fio is doing something weird here. If we remove the trim command from the iolog then all of fio, btrace, and protocol analyzer are in agreement.

mjekdi avatar Feb 16 '23 18:02 mjekdi

With a workload that includes trim operations, the device is no longer opened with O_DIRECT, so there is no longer a direct correspondence between the requests that fio issues and the requests that the device handles.

int generic_open_file(struct thread_data *td, struct fio_file *f)
{
        int is_std = 0;
        int flags = 0;
        int from_hash = 0;

        dprint(FD_FILE, "fd open %s\n", f->file_name);
...
        if (td_trim(td))
                goto skip_flags;
        if (td->o.odirect)
                flags |= OS_O_DIRECT;
        if (td->o.oatomic) {
                if (!FIO_O_ATOMIC) {
                        td_verror(td, EINVAL, "OS does not support atomic IO");
                        return 1;
                }
                flags |= OS_O_DIRECT | FIO_O_ATOMIC;
        }
        flags |= td->o.sync_io;
        if (td->o.create_on_open && td->o.allow_create)
                flags |= O_CREAT;
skip_flags:
        if (f->filetype != FIO_TYPE_FILE)
                flags |= FIO_O_NOATIME;

@axboe Do you know why we skip these flags for trim workloads? git blame says that it has been this way since the 2012 patch adding trim support to fio.
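
To illustrate the difference: a buffered read of a block device goes through the page cache, so the kernel is free to turn a 4 KiB read into the readahead requests (RA 0 + 32, RA 32 + 64) seen in the btrace above, while with O_DIRECT each read maps 1:1 to a device request. A minimal standalone demo of the two open modes (illustrative only; the device path is just the one used in this thread, and it needs root):

#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        const char *dev = argc > 1 ? argv[1] : "/dev/nvme0n1";
        int direct = argc > 2;          /* any second argument enables O_DIRECT */
        int fd = open(dev, O_RDONLY | (direct ? O_DIRECT : 0));
        if (fd < 0) {
                perror("open");
                return 1;
        }

        void *buf;
        if (posix_memalign(&buf, 4096, 4096))   /* O_DIRECT requires aligned buffers */
                return 1;

        /* With O_DIRECT this pread becomes one 4 KiB device read;
         * buffered, the kernel may expand it via readahead (the RA
         * requests seen in btrace). */
        ssize_t n = pread(fd, buf, 4096, 0);
        printf("read %zd bytes (%s)\n", n, direct ? "O_DIRECT" : "buffered");
        free(buf);
        close(fd);
        return 0;
}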

vincentkfu avatar Feb 17 '23 16:02 vincentkfu

Yeah, I don't think we should; there's no good reason to do that.

axboe avatar Feb 17 '23 18:02 axboe

Let me just kill this O_ATOMIC stuff first, that never ended up upstream.

axboe avatar Feb 17 '23 18:02 axboe

Let me just kill this O_ATOMIC stuff first, that never ended up upstream.

Done

axboe avatar Feb 17 '23 18:02 axboe

Confirmed fix in fio git. Thanks all!!!

# btrace /dev/nvme0n1
259,0    0        1     0.000000000 42267  Q  WS 0 + 8 [fio]
259,0    0        2     0.000002483 42267  G  WS 0 + 8 [fio]
259,0    0        3     0.000003002 42267  P   N [fio]
259,0    0        4     0.000003336 42267 UT   N [fio] 1
259,0    0        5     0.000003660 42267  I  WS 0 + 8 [fio]
259,0    0        6     0.000009480   115  D  WS 0 + 8 [kworker/0:1H]
259,0    0        7     0.008010263     0  C  WS 0 + 8 [0]
259,0    0        8     0.010037622 42267  Q FWS [fio]
259,0    0        9     0.010039172 42267  G FWS [fio]
259,0    0       10     0.010043081   115  D  FN [kworker/0:1H]
259,0    0       11     0.010557907     0  C  FN 0 [0]
259,0    0       12     0.010558495     0  C  WS 0 [0]
259,0    0       13     0.020142784 42267  Q  DS 0 + 8 [fio]
259,0    0       14     0.020143668 42267  G  DS 0 + 8 [fio]
259,0    0       15     0.020143845 42267  P   N [fio]
259,0    0       16     0.020144104 42267 UT   N [fio] 1
259,0    0       17     0.020144238 42267  I  DS 0 + 8 [fio]
259,0    0       18     0.020148373   115  D  DS 0 + 8 [kworker/0:1H]
259,0    0       19     0.020619188     0  C  DS 0 + 8 [0]
259,0    0       20     0.030144175 42267  Q   R 0 + 8 [fio]
259,0    0       21     0.030144931 42267  G   R 0 + 8 [fio]
259,0    0       22     0.030145727 42267  D   R 0 + 8 [fio]
259,0    0       23     0.030191246     0  C   R 0 + 8 [0]

mjekdi avatar Feb 21 '23 22:02 mjekdi