fio: Random Write Block Size doesn't seem to be correct - Linux - Debian
Please acknowledge the following before creating a ticket
- [X] I have read the GitHub issues section of REPORTING-BUGS.
Description of the bug:
fio's random write block size with libaio doesn't seem to be correct (iostat reports half of the configured block size)
Environment: Debian 11.5
fio version: fio-3.25 and fio-3.32
Reproduction steps
I'm testing a Samsung MZILT3T8HBLS/007 which is rated at 50,000 random 4k write IOPs. When I run a test with fio, such as:
fio --name=/dev/sda --ioengine=libaio --direct=1 --fsync=1 --readwrite=randwrite --blocksize=4k --runtime=300 --iodepth=32
I get ~25K iops:
Jobs: 1 (f=1): [w(1)][67.3%][w=97.4MiB/s][w=24.9k IOPS][eta 01m:38s]
But when I watch the test running with iostat (iostat -x -p /dev/sda 1), I see that the write req size is 2k, and the write IOPs are ~50K:
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0.30   0.00     0.52     0.86    0.00  98.32
Device   r/s  rkB/s  rrqm/s  %rrqm  r_await  rareq-sz       w/s      wkB/s  wrqm/s  %wrqm  w_await  wareq-sz   d/s  dkB/s  drqm/s  %drqm  d_await  dareq-sz       f/s  f_await  aqu-sz   %util
sda     0.00   0.00    0.00   0.00     0.00      0.00  50604.00  101208.00    0.00   0.00     0.03      2.00  0.00   0.00    0.00   0.00     0.00      0.00  25302.00     0.03    2.25  100.00
Is this expected behavior? Am I misconfiguring the test?
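As a sanity check on the numbers above, here is a small sketch that redoes the iostat arithmetic. The values are copied from the sample in this report; the helper name avg_kb is made up for illustration. It only shows that the columns are internally consistent: wareq-sz is wkB/s divided by w/s, and w/s is exactly twice f/s. The iostat man page notes that flush operations are counted as writes before being merged, which is one possible reading of the doubled write count with --fsync=1, but nothing in this thread confirms that this is what happens here.

```shell
# Figures copied from the iostat sample in the report.
w_per_s=50604      # w/s
wkb_per_s=101208   # wkB/s
f_per_s=25302      # f/s

# Average kB per request: kB per second divided by requests per second.
# (avg_kb is a hypothetical helper, not part of fio or sysstat.)
avg_kb() {
    awk -v kb="$1" -v n="$2" 'BEGIN { printf "%.2f\n", kb / n }'
}

avg_kb "$wkb_per_s" "$w_per_s"   # kB per counted write (the wareq-sz column)
avg_kb "$wkb_per_s" "$f_per_s"   # kB per flush, i.e. per fio write when --fsync=1
awk -v w="$w_per_s" -v f="$f_per_s" 'BEGIN { printf "%.2f\n", w / f }'   # counted writes per flush
```

With the figures above this prints 2.00, 4.00 and 2.00: the same bytes spread over 25302 requests per second would give the 4k size fio was asked for.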
Why are you running with --fsync=1?
@vincentkfu I am most interested in testing synchronous writes. This is also the recommended test here:
https://docs.ceph.com/en/quincy/start/hardware-recommendations/
I would use sync=1, but that doesn’t seem to have an effect on libaio. If I run the test with the sync or psync engines and sync=1, I get the same write size behavior.
@vincentkfu would you happen to have any insight into this? Am I doing something incorrect/unexpected as far as fio is concerned?
@noahmehl try to turn caching off on the device (/sys/class/scsi_disk/device/cache_type = 'write through'): hdparm -W0 /dev/sda
I guess it's not distro-specific, as wareq-sz is 2 when caching is on in at least Debian, Ubuntu, and Fedora. I didn't check others, though.
@ooptimum when I check with hdparm, I get unsupported:
# hdparm -W /dev/sda
/dev/sda:
write-caching = not supported
@ooptimum actually, I was able to do this with sdparm:
# sdparm -c WCE /dev/sda
/dev/sda: SAMSUNG MZILT3T8HBLS/007 GXA0
# sdparm -g WCE /dev/sda
/dev/sda: SAMSUNG MZILT3T8HBLS/007 GXA0
WCE 0 [cha: y, def: 1, sav: 1]
However, that didn't have any effect on the original issue.
@ooptimum you were correct. I had to:
# echo 'write through' > /sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/host0/port-0:0/expander-0:0/port-0:0:0/end_device-0:0:0/target0:0:0/0:0:0:0/scsi_disk/0:0:0:0/cache_type
This increased the random 4k write performance to ~150K IOPs. And now fio and iostat match up 1:1 for IOPs and wareq-sz.
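For anyone who wants to check the current setting before changing it, a minimal sketch that prints the cache_type of each SCSI disk. It assumes the sd driver exposes the attribute under /sys/class/scsi_disk/ (on my understanding these entries are links to the long device path used above); the function name list_cache_types and its optional root argument are made up for illustration.

```shell
# Print "<H:C:T:L> <cache_type>" for every SCSI disk found under a sysfs root.
# The root defaults to /sys; the argument exists only to make the sketch
# easy to try against a scratch directory. (Hypothetical helper.)
list_cache_types() {
    root="${1:-/sys}"
    for f in "$root"/class/scsi_disk/*/cache_type; do
        # Skip the unexpanded glob when no SCSI disks are present.
        [ -r "$f" ] || continue
        printf '%s %s\n' "$(basename "$(dirname "$f")")" "$(cat "$f")"
    done
}
```

Running list_cache_types with no argument reads the live system. A disk that still shows "write back" is the case where the 2K wareq-sz was observed in this thread.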
@ooptimum can you help me with a source to understand why write back causes a wareq-sz of 2K?
@noahmehl Sorry for the delay in replying, I have had a very busy time. I wish I could explain why caching affects block size, but unfortunately I can't.