Skip offline zones
Please acknowledge the following before creating a ticket
- [x] I have read the GitHub issues section of REPORTING-BUGS.
Description of the bug: Running fio on a ZNS drive with offline zones fails the sanity check `z->wp <= z->start + zone_size`, because the write pointers reported for offline zones contain random values.
Environment: Tried on multiple Linux distributions; Ubuntu 22.04 is one of them.
fio version: 3.33
Reproduction steps: Run fio on a ZNS drive that has offline zones at the end.
FIO Params: time_based=1 max_latency=20000000 initial_zone_reset=0 thread=1 zonemode=zbd readwrite=randrw blocksize=32768 numjobs=1 direct=1 read_beyond_wp=0 percentage_random=0 rwmixread=100 ramp_time=5 max_open_zones=16 offset=0 ioengine=FioSpdk runtime=120 iodepth=1
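Transcribed as a job file, the parameter list above reads roughly as follows (the job name is illustrative; `ioengine=FioSpdk` refers to the external SPDK plugin):

```ini
[zns-randrw]
time_based=1
max_latency=20000000
initial_zone_reset=0
thread=1
zonemode=zbd
readwrite=randrw
blocksize=32768
numjobs=1
direct=1
read_beyond_wp=0
percentage_random=0
rwmixread=100
ramp_time=5
max_open_zones=16
offset=0
ioengine=FioSpdk
runtime=120
iodepth=1
```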
CC @damien-lemoal
Does this reproduce without fiospdk? This isn't an engine we ship with fio, hence it's not a supported configuration.
Engine used here is from spdk: ~/Documents/spdk/build/fio/spdk_nvme.
I know where it's from, the point is that we cannot help debug issues with external engines that aren't part of the fio repository. So please ensure that the issue exists with an engine that ships with fio.
I'm trying to use libzbc, but fio is not accepting it. I'll keep trying. Please let me know which other engine you'd prefer for ZNS, as not all of them work with it.
libzbc is for SMR HDDs, i.e. SCSI or ATA drives; it will not help you with ZNS. Your kernel on Ubuntu should support ZNS out of the box, so simply try with libaio or io_uring.
cc @kawasaki
Shin'ichiro,
Can you have a look? We may be missing some checks in zbd.c to skip offline zones. We may also need to check for read-only zones for write operations.
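A minimal sketch of the kind of check being discussed (illustrative names and values mirroring the `BLK_ZONE_COND_*` constants in `<linux/blkzoned.h>`; this is not fio's actual zbd.c code): an offline zone carries no valid write pointer, so the `wp <= start + zone_size` sanity check must be gated on the zone condition, and read-only zones must additionally be excluded from write workloads.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative zone-condition values, mirroring <linux/blkzoned.h>. */
enum zone_cond {
    ZCOND_EMPTY   = 0x1,
    ZCOND_RDONLY  = 0xd,
    ZCOND_OFFLINE = 0xf,
};

struct zone_info {
    uint64_t start;        /* zone start offset, in bytes */
    uint64_t wp;           /* write pointer; undefined when offline */
    enum zone_cond cond;   /* condition from the zone report */
};

/*
 * The write pointer of an offline zone is undefined, so the
 * wp <= start + zone_size sanity check must not be applied to it.
 */
static bool zone_wp_valid(const struct zone_info *z, uint64_t zone_size)
{
    if (z->cond == ZCOND_OFFLINE)
        return true;   /* nothing to validate: the zone is skipped anyway */
    return z->wp >= z->start && z->wp <= z->start + zone_size;
}

/* Skip offline zones entirely, and read-only zones for write workloads. */
static bool zone_usable(const struct zone_info *z, bool for_write)
{
    if (z->cond == ZCOND_OFFLINE)
        return false;
    if (for_write && z->cond == ZCOND_RDONLY)
        return false;
    return true;
}
```

With this ordering, a zone like the one in the report above (wp of `0xFFFFFFFFFFFFF000`, condition offline) is rejected by `zone_usable()` before its garbage write pointer is ever compared against the zone bounds.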
@SRK4ever I tweaked null_blk to have offline zones. I have tried to reproduce the failure with this tweaked null_blk, but I saw no failure. Could you share the failure message? I wonder if I/O error happened or any other zone status handling error happened.
The failure happens only when the wp is purely random and doesn't get assigned a value for offline zones. In my failing case, the wp value was greater than `z->start + zone_size`: it jumped from 0x1b738000000 to 0xFFFFFFFFFFFFF000 when the offline transition happened.
Are you saying that the problem happened when a zone transitioned to the offline state DURING your fio run? If yes, then it is normal to see I/O errors: fio does a zone report on startup only (which will detect and skip offline zones). A zone report is not done before every I/O to check that the zone is OK; that would be a crazy overhead and make any testing useless. fio also does not do error recovery by issuing another zone report when an I/O error happens. That is by design.
Please confirm the exact condition that triggers the problem. If it triggers with an offline zone already present when you start fio, then it is a bug that we can fix (but Shin'ichiro checked that and everything seems OK). If the problem is due to an offline transition while fio is running, then getting the I/O errors is normal and expected.
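The startup-only behaviour described here can be sketched as a single pass over the zone report that records which zones to skip; nothing re-reports on later I/O errors. This is illustrative code under the assumptions of the example above, not fio's actual implementation:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative zone-condition values, mirroring <linux/blkzoned.h>. */
enum zone_cond { ZCOND_EMPTY = 0x1, ZCOND_RDONLY = 0xd, ZCOND_OFFLINE = 0xf };

struct zone {
    uint64_t start;
    uint64_t wp;          /* undefined for offline zones */
    enum zone_cond cond;
};

/*
 * One-time scan at startup: mark zones that should never be issued I/O.
 * Returns the number of skipped zones. Zones that go offline AFTER this
 * scan are not detected; subsequent I/O to them simply fails.
 */
static size_t mark_skipped(const struct zone *zones, size_t nr, bool skip[])
{
    size_t skipped = 0;

    for (size_t i = 0; i < nr; i++) {
        skip[i] = (zones[i].cond == ZCOND_OFFLINE);
        if (skip[i])
            skipped++;
    }
    return skipped;
}
```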
It happens at the start of the test, when the drive already has offline zones. fio doesn't even start any I/O at the point of failure; it happens during the pre-check, I guess.
@SRK4ever Could you share the failure message?