invalidate time is counted towards bandwidth calculation
Please acknowledge the following before creating a ticket
- [x] I have read the GitHub issues section of REPORTING-BUGS.
Description of the bug:
The invalidate flag is documented as follows:
invalidate=bool
Invalidate the buffer/page cache parts of the files to be used prior to starting I/O if the platform and file type support it. Defaults to true.
The phrase "prior to starting I/O" seems to imply that the page cache is dropped before fio starts measuring I/O performance, but that does not appear to be the case.
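For context, my understanding (not a quote of fio's code) is that for a regular file on Linux this per-file invalidation boils down to a POSIX_FADV_DONTNEED hint covering the whole file, roughly like the sketch below (drop_file_cache is just an illustrative name):

/* Sketch of the mechanism as I understand it, not fio's actual code:
 * ask the kernel to drop the cached pages of one regular file. */
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <unistd.h>

int drop_file_cache(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    /* offset=0, len=0 covers the whole file */
    int ret = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return ret;
}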
Environment: Ubuntu 22.04 LTS with Linux kernel 5.15.0-40-generic. This is a dual-socket AMD system with 4 NUMA nodes per socket and 2 TiB of RAM.
fio version:
$ fio --version
fio-3.30
Reproduction steps:
Consider the following fio job file performing buffered I/O on a mountpoint that is an ext4 RAID 0 of 8 NVMe SSDs (PCIe Gen 4):
[global]
group_reporting=1
disable_lat=1
disable_clat=1
disable_slat=1
create_serialize=0
size=48G
directory=/raid/scratch/
blocksize=1M
[socket0]
numa_cpu_nodes=0-3
numa_mem_policy=interleave:0-3
numjobs=20
[socket1]
numa_cpu_nodes=4-7
numa_mem_policy=interleave:4-7
numjobs=20
The goal is to check the performance of buffered I/O, including the efficiency of the page cache, so we use the invalidate flag to have fio drop the cache before the run.
If all the files are already present in the page cache, we would have expected the two sequences of commands below to show the same performance, but they do not.
- Manually drops all the page cache, but requires elevated privileges
$ sudo sysctl vm.drop_caches=3
$ fio raid.fio
Run status group 0 (all jobs):
READ: bw=40.0GiB/s (43.0GB/s), 40.0GiB/s-40.0GiB/s (43.0GB/s-43.0GB/s), io=1920GiB (2062GB), run=47946-47946msec
- Selectively drops the page cache for the files handled by fio, does not require elevated privileges
$ fio --invalidate=1 raid.fio
Run status group 0 (all jobs):
READ: bw=22.7GiB/s (24.4GB/s), 22.7GiB/s-22.7GiB/s (24.4GB/s-24.4GB/s), io=1920GiB (2062GB), run=84578-84578msec
When running with --invalidate=1, we can verify that at the beginning of the job there is no disk I/O, but later there is ~42GB/s of I/O:
$ iostat -d -h -p dm-1 1
[...]
tps kB_read/s kB_wrtn/s kB_dscd/s kB_read kB_wrtn kB_dscd Device
42320.00 41.3G 0.0k 0.0k 41.3G 0.0k 0.0k dm-1
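If the invalidation itself is what consumes that initial quiet period, the ~37 s difference in run time between the two sequences (84578 ms vs 47946 ms) would roughly be the cost of dropping ~1.9 TiB of cached pages before any read reaches the disks. A way to sanity-check that outside of fio might be to time the equivalent invalidation by hand, e.g. with a hypothetical standalone program like this (the glob pattern is assumed to match the files created by the job file above):

/* Hypothetical check: time how long it takes to drop the page cache
 * of all the job's files via POSIX_FADV_DONTNEED. */
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <glob.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    glob_t g;
    struct timespec t0, t1;

    if (glob("/raid/scratch/socket*.*.0", 0, NULL, &g) != 0) {
        fprintf(stderr, "no files matched\n");
        return 1;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < g.gl_pathc; i++) {
        int fd = open(g.gl_pathv[i], O_RDONLY);
        if (fd < 0)
            continue;
        posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
        close(fd);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("invalidated %zu files in %.1f s\n", g.gl_pathc, secs);
    globfree(&g);
    return 0;
}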
Is that the expected/intended behavior for the invalidate flag? Or is that a bug?
The default value for invalidate is 1. So more investigation needs to happen to understand what is going on. Consider running a small job with --debug=io,file to gain insight into what fio is doing.
Yes, I know it's the default; I explicitly set --invalidate=1 in the example above to make it clear that this option is in play, and in case fio changed (or changes) the default behavior across releases. With the example above, fio --invalidate=1 raid.fio and fio raid.fio behave the same when the files are already resident in the page cache.
The fio debug log doesn't seem very helpful, particularly since it doesn't include timestamps:
file 224907 dup files: 0
io 224907 load ioengine psync
file 224907 add file socket0.0.0
file 224907 resize file array to 2 files
file 224907 file 0x7ffff461b110 "/raid/scratch//socket0.0.0" added at 0
socket0: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
file 224907 dup files: 1
io 224907 load ioengine psync
file 224907 add file socket0.19.0
file 224907 resize file array to 2 files
file 224907 file 0x7ffff461b2f0 "/raid/scratch//socket0.19.0" added at 0
file 224907 dup files: 1
io 224907 load ioengine psync
file 224907 add file socket0.18.0
file 224907 resize file array to 2 files
file 224907 file 0x7ffff461b4d0 "/raid/scratch//socket0.18.0" added at 0
[...]
file 224907 dup files: 1
io 224907 load ioengine psync
file 224907 add file socket1.1.0
file 224907 resize file array to 2 files
file 224907 file 0x7ffff461fa30 "/raid/scratch//socket1.1.0" added at 0
...
fio-3.28
Starting 40 processes
file 225202 setup files
file 225202 get file size for 0x7ffff461f670/0//raid/scratch//socket1.3.0
file 225202 trying file /raid/scratch//socket1.3.0 210
file 225202 fd open /raid/scratch//socket1.3.0
file 225202 file not found in hash /raid/scratch//socket1.3.0
file 225202 get file /raid/scratch//socket1.3.0, ref=0
io 225202 declare unneeded cache /raid/scratch//socket1.3.0: 0/51539607552
file 225202 goodf=1, badf=2, ff=231
file 225202 get_next_file_rr: 0x7ffff461f670
file 225202 get_next_file: 0x7ffff461f670 [/raid/scratch//socket1.3.0]
file 225202 get file /raid/scratch//socket1.3.0, ref=1
io 225202 fill: io_u 0x5555557ed300: off=0x0,len=0x100000,ddir=0,file=/raid/scratch//socket1.3.0
io 225202 prep: io_u 0x5555557ed300: off=0x0,len=0x100000,ddir=0,file=/raid/scratch//socket1.3.0
io 225202 queue: io_u 0x5555557ed300: off=0x0,len=0x100000,ddir=0,file=/raid/scratch//socket1.3.0
io 225202 complete: io_u 0x5555557ed300: off=0x0,len=0x100000,ddir=0,file=/raid/scratch//socket1.3.0
file 225202 put file /raid/scratch//socket1.3.0, ref=2
file 225202 trying file /raid/scratch//socket1.3.0 231
[...]
file 225184 trying file /raid/scratch//socket0.1.0 231
file 225184 goodf=1, badf=2, ff=231
file 225184 get_next_file_rr: 0x7ffff461d4b0
file 225184 get_next_file: 0x7ffff461d4b0 [/raid/scratch//socket0.1.0]
file 225184 get file /raid/scratch//socket0.1.0, ref=1
io 225184 fill: io_u 0x5555557ed300: off=0xbffe00000,len=0x100000,ddir=0,file=/raid/scratch//socket0.1.0
io 225184 prep: io_u 0x5555557ed300: off=0xbffe00000,len=0x100000,ddir=0,file=/raid/scratch//socket0.1.0
io 225184 queue: io_u 0x5555557ed300: off=0xbffe00000,len=0x100000,ddir=0,file=/raid/scratch//socket0.1.0
io 225184 complete: io_u 0x5555557ed300: off=0xbffe00000,len=0x100000,ddir=0,file=/raid/scratch//socket0.1.0
file 225184 put file /raid/scratch//socket0.1.0, ref=2
file 225184 trying file /raid/scratch//socket0.1.0 231
file 225184 goodf=1, badf=2, ff=231
file 225184 get_next_file_rr: 0x7ffff461d4b0
file 225184 get_next_file: 0x7ffff461d4b0 [/raid/scratch//socket0.1.0]
file 225184 get file /raid/scratch//socket0.1.0, ref=1
io 225184 fill: io_u 0x5555557ed300: off=0xbfff00000,len=0x100000,ddir=0,file=/raid/scratch//socket0.1.0
io 225184 prep: io_u 0x5555557ed300: off=0xbfff00000,len=0x100000,ddir=0,file=/raid/scratch//so
[...]
file 224907 resize file array to 2 files
file 224907 file 0x7ffff461f2b0 "/raid/scratch//socket1.5.0" added at 0
file 224907 dup files: 1
io 224907 load ioengine psync
file 224907 add file socket1.4.0
file 224907 resize file array to 2 files
file 224907 file 0x7ffff461f490 "/raid/scratch//socket1.4.0" added at 0
file 224907 dup files: 1
io 224907 load ioengine psync
file 224907 add file socket1.3.0
file 224907 resize file array to 2 files
file 224907 file 0x7ffff461f670 "/raid/scratch//socket1.3.0" added at 0
file 224907 dup files: 1
io 224907 load ioengine psync
file 224907 add file socket1.2.0
file 224907 resize file array to 2 files
file 224907 file 0x7ffff461f850 "/raid/scratch//socket1.2.0" added at 0
file 224907 dup files: 1
io 224907 load ioengine psync
file 224907 add file socket1.1.0
file 224907 resize file array to 2 files
file 224907 file 0x7ffff461fa30 "/raid/scratch//socket1.1.0" added at 0
...
fio-3.28
Starting 40 processes
socket0: (groupid=0, jobs=40): err= 0: pid=225165: Fri Jul 22 11:51:12 2022
read: IOPS=24.5k, BW=24.0GiB/s (25.7GB/s)(1920GiB/80126msec)
bw ( MiB/s): min=22924, max=46762, per=100.00%, avg=43358.39, stdev=61.16, samples=3589
iops : min=22924, max=46762, avg=43358.39, stdev=61.16, samples=3589
cpu : usr=0.49%, sys=99.42%, ctx=14339, majf=0, minf=14755
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=1966080,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=24.0GiB/s (25.7GB/s), 24.0GiB/s-24.0GiB/s (25.7GB/s-25.7GB/s), io=1920GiB (2062GB), run=80126-80126msec
Disk stats (read/write):
dm-1: ios=1966046/0, merge=0/0, ticks=1666548/0, in_queue=1666548, util=56.81%, aggrios=15728640/0, aggrmerge=0/0, aggrticks=10172112/0, aggrin_queue=10172112, aggrutil=56.81%
md127: ios=15728640/0, merge=0/0, ticks=10172112/0, in_queue=10172112, util=56.81%, aggrios=0/0, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
nvme0n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme6n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme9n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme5n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme8n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme1n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme4n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
nvme7n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%