
invalidate time is counted towards bandwidth calculation

flx42 opened this issue 3 years ago · 3 comments

Please acknowledge the following before creating a ticket

  • [x] I have read the GitHub issues section of REPORTING-BUGS.

Description of the bug: The invalidate flag is documented as follows:

 invalidate=bool

    Invalidate the buffer/page cache parts of the files to be used prior to starting I/O if the platform and file type support it. Defaults to true.

The "prior to starting I/O" seems to imply that the page cache would be dropped before fio starts measuring I/O performance, but it doesn't seem to be the case.

Environment: Ubuntu 22.04 LTS with Linux kernel 5.15.0-40-generic. This is a dual-socket AMD system with 4 NUMA nodes per socket and 2 TiB of RAM.

fio version:

$ fio --version
fio-3.30

Reproduction steps: Consider the following fio job file performing buffered I/O on a mountpoint that is an ext4 filesystem on a RAID 0 of 8 NVMe SSDs (PCIe Gen 4):

[global]
group_reporting=1
disable_lat=1
disable_clat=1
disable_slat=1
create_serialize=0

size=48G
directory=/raid/scratch/
blocksize=1M

[socket0]
numa_cpu_nodes=0-3
numa_mem_policy=interleave:0-3
numjobs=20

[socket1]
numa_cpu_nodes=4-7
numa_mem_policy=interleave:4-7
numjobs=20

The goal is to check the performance of buffered I/O, including the efficiency of the page cache. To that end, we use the invalidate flag to have fio drop the cache before the run.

If all the files are already present in the page cache, we would expect the two sequences of commands below to show the same performance, but they do not.

  • Manually drops all the page cache, but requires elevated privileges
$ sudo sysctl vm.drop_caches=3

$ fio raid.fio

Run status group 0 (all jobs):
   READ: bw=40.0GiB/s (43.0GB/s), 40.0GiB/s-40.0GiB/s (43.0GB/s-43.0GB/s), io=1920GiB (2062GB), run=47946-47946msec
  • Selectively drops the page cache for the files handled by fio, does not require elevated privileges
$ fio --invalidate=1 raid.fio

Run status group 0 (all jobs):
   READ: bw=22.7GiB/s (24.4GB/s), 22.7GiB/s-22.7GiB/s (24.4GB/s-24.4GB/s), io=1920GiB (2062GB), run=84578-84578msec

When running with --invalidate=1, we can verify that at the beginning of the job there is no disk I/O, but later there is ~42GB/s of I/O:

 $ iostat -d -h -p dm-1 1

[...]

      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
 42320.00        41.3G         0.0k         0.0k      41.3G       0.0k       0.0k dm-1
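One way to cross-check this is to sample the page-cache residency of one of the job files with mincore() while (or right after) the job starts. The sketch below is a hypothetical standalone helper, not part of fio; its default path is just one of the job files from this setup:

#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    /* Placeholder default: one of the fio job files from this report. */
    const char *path = argc > 1 ? argv[1] : "/raid/scratch/socket0.0.0";
    struct stat st;
    int fd = open(path, O_RDONLY);

    if (fd < 0 || fstat(fd, &st) != 0 || st.st_size == 0) {
        perror(path);
        return 1;
    }

    long psz = sysconf(_SC_PAGESIZE);
    size_t pages = (st.st_size + psz - 1) / psz;

    /* Map the file read-only and ask the kernel which pages are resident. */
    void *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    unsigned char *vec = malloc(pages);
    if (map == MAP_FAILED || vec == NULL || mincore(map, st.st_size, vec) != 0) {
        perror("mincore");
        return 1;
    }

    size_t resident = 0;
    for (size_t i = 0; i < pages; i++)
        resident += vec[i] & 1;

    printf("%s: %zu of %zu pages resident in page cache (%.1f%%)\n",
           path, resident, pages, 100.0 * resident / pages);

    munmap(map, st.st_size);
    free(vec);
    close(fd);
    return 0;
}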

Is that the expected/intended behavior for the invalidate flag? Or is that a bug?

flx42 commented Jul 15 '22 21:07

The default value for invalidate is 1, so more investigation is needed to understand what is going on. Consider running a small job with --debug=io,file to gain insight into what fio is doing.

vincentkfu commented Jul 22 '22 14:07

Yes, I know it's the default; I explicitly set --invalidate=1 in the example above to show that it is the source of the problem, and in case fio changed (or changes) the default behavior across releases. So with the job file above, fio --invalidate=1 raid.fio and fio raid.fio behave the same when the fio files are resident in the page cache.

flx42 commented Jul 22 '22 18:07

The fio debug log doesn't seem very helpful, particularly since it doesn't include timestamps:

file     224907 dup files: 0
io       224907 load ioengine psync
file     224907 add file socket0.0.0
file     224907 resize file array to 2 files
file     224907 file 0x7ffff461b110 "/raid/scratch//socket0.0.0" added at 0
socket0: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
file     224907 dup files: 1
io       224907 load ioengine psync
file     224907 add file socket0.19.0
file     224907 resize file array to 2 files
file     224907 file 0x7ffff461b2f0 "/raid/scratch//socket0.19.0" added at 0
file     224907 dup files: 1
io       224907 load ioengine psync
file     224907 add file socket0.18.0
file     224907 resize file array to 2 files
file     224907 file 0x7ffff461b4d0 "/raid/scratch//socket0.18.0" added at 0

[...]

file     224907 dup files: 1
io       224907 load ioengine psync
file     224907 add file socket1.1.0
file     224907 resize file array to 2 files
file     224907 file 0x7ffff461fa30 "/raid/scratch//socket1.1.0" added at 0
...
fio-3.28
Starting 40 processes
file     225202 setup files
file     225202 get file size for 0x7ffff461f670/0//raid/scratch//socket1.3.0
file     225202 trying file /raid/scratch//socket1.3.0 210
file     225202 fd open /raid/scratch//socket1.3.0
file     225202 file not found in hash /raid/scratch//socket1.3.0
file     225202 get file /raid/scratch//socket1.3.0, ref=0
io       225202 declare unneeded cache /raid/scratch//socket1.3.0: 0/51539607552
file     225202 goodf=1, badf=2, ff=231
file     225202 get_next_file_rr: 0x7ffff461f670
file     225202 get_next_file: 0x7ffff461f670 [/raid/scratch//socket1.3.0]
file     225202 get file /raid/scratch//socket1.3.0, ref=1
io       225202 fill: io_u 0x5555557ed300: off=0x0,len=0x100000,ddir=0,file=/raid/scratch//socket1.3.0
io       225202 prep: io_u 0x5555557ed300: off=0x0,len=0x100000,ddir=0,file=/raid/scratch//socket1.3.0
io       225202 queue: io_u 0x5555557ed300: off=0x0,len=0x100000,ddir=0,file=/raid/scratch//socket1.3.0
io       225202 complete: io_u 0x5555557ed300: off=0x0,len=0x100000,ddir=0,file=/raid/scratch//socket1.3.0
file     225202 put file /raid/scratch//socket1.3.0, ref=2
file     225202 trying file /raid/scratch//socket1.3.0 231

[...]

file     225184 trying file /raid/scratch//socket0.1.0 231
file     225184 goodf=1, badf=2, ff=231
file     225184 get_next_file_rr: 0x7ffff461d4b0
file     225184 get_next_file: 0x7ffff461d4b0 [/raid/scratch//socket0.1.0]
file     225184 get file /raid/scratch//socket0.1.0, ref=1
io       225184 fill: io_u 0x5555557ed300: off=0xbffe00000,len=0x100000,ddir=0,file=/raid/scratch//socket0.1.0
io       225184 prep: io_u 0x5555557ed300: off=0xbffe00000,len=0x100000,ddir=0,file=/raid/scratch//socket0.1.0
io       225184 queue: io_u 0x5555557ed300: off=0xbffe00000,len=0x100000,ddir=0,file=/raid/scratch//socket0.1.0
io       225184 complete: io_u 0x5555557ed300: off=0xbffe00000,len=0x100000,ddir=0,file=/raid/scratch//socket0.1.0
file     225184 put file /raid/scratch//socket0.1.0, ref=2
file     225184 trying file /raid/scratch//socket0.1.0 231
file     225184 goodf=1, badf=2, ff=231
file     225184 get_next_file_rr: 0x7ffff461d4b0
file     225184 get_next_file: 0x7ffff461d4b0 [/raid/scratch//socket0.1.0]
file     225184 get file /raid/scratch//socket0.1.0, ref=1
io       225184 fill: io_u 0x5555557ed300: off=0xbfff00000,len=0x100000,ddir=0,file=/raid/scratch//socket0.1.0
io       225184 prep: io_u 0x5555557ed300: off=0xbfff00000,len=0x100000,ddir=0,file=/raid/scratch//so

[...]

file     224907 resize file array to 2 files
file     224907 file 0x7ffff461f2b0 "/raid/scratch//socket1.5.0" added at 0
file     224907 dup files: 1
io       224907 load ioengine psync
file     224907 add file socket1.4.0
file     224907 resize file array to 2 files
file     224907 file 0x7ffff461f490 "/raid/scratch//socket1.4.0" added at 0
file     224907 dup files: 1
io       224907 load ioengine psync
file     224907 add file socket1.3.0
file     224907 resize file array to 2 files
file     224907 file 0x7ffff461f670 "/raid/scratch//socket1.3.0" added at 0
file     224907 dup files: 1
io       224907 load ioengine psync
file     224907 add file socket1.2.0
file     224907 resize file array to 2 files
file     224907 file 0x7ffff461f850 "/raid/scratch//socket1.2.0" added at 0
file     224907 dup files: 1
io       224907 load ioengine psync
file     224907 add file socket1.1.0
file     224907 resize file array to 2 files
file     224907 file 0x7ffff461fa30 "/raid/scratch//socket1.1.0" added at 0
...
fio-3.28
Starting 40 processes

socket0: (groupid=0, jobs=40): err= 0: pid=225165: Fri Jul 22 11:51:12 2022
  read: IOPS=24.5k, BW=24.0GiB/s (25.7GB/s)(1920GiB/80126msec)
   bw (  MiB/s): min=22924, max=46762, per=100.00%, avg=43358.39, stdev=61.16, samples=3589
   iops        : min=22924, max=46762, avg=43358.39, stdev=61.16, samples=3589
  cpu          : usr=0.49%, sys=99.42%, ctx=14339, majf=0, minf=14755
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1966080,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=24.0GiB/s (25.7GB/s), 24.0GiB/s-24.0GiB/s (25.7GB/s-25.7GB/s), io=1920GiB (2062GB), run=80126-80126msec

Disk stats (read/write):
    dm-1: ios=1966046/0, merge=0/0, ticks=1666548/0, in_queue=1666548, util=56.81%, aggrios=15728640/0, aggrmerge=0/0, aggrticks=10172112/0, aggrin_queue=10172112, aggrutil=56.81%
    md127: ios=15728640/0, merge=0/0, ticks=10172112/0, in_queue=10172112, util=56.81%, aggrios=0/0, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
  nvme0n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme6n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme9n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme5n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme8n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme1n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme4n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  nvme7n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

flx42 commented Jul 22 '22 19:07