Skip offline zones

Open SRK4ever opened this issue 3 years ago • 12 comments

Please acknowledge the following before creating a ticket

  • [x] I have read the GitHub issues section of REPORTING-BUGS.

Description of the bug: Running fio on a ZNS drive with offline zones fails at the check z->wp <= z->start + zone_size, because offline zones report random wp (write pointer) values.

Environment: Tried on multiple Linux distributions; Ubuntu 22.04 is one of them.

fio version: 3.33

Reproduction steps: Run fio on a ZNS drive that has offline zones at the end.

FIO Params:
time_based=1
max_latency=20000000
initial_zone_reset=0
thread=1
zonemode=zbd
readwrite=randrw
blocksize=32768
numjobs=1
direct=1
read_beyond_wp=0
percentage_random=0
rwmixread=100
ramp_time=5
max_open_zones=16
offset=0
ioengine=FioSpdk
runtime=120
iodepth=1

SRK4ever avatar Nov 18 '22 04:11 SRK4ever

CC @damien-lemoal

axboe avatar Nov 18 '22 14:11 axboe

Does this reproduce without fiospdk? This isn't an engine we ship with fio, hence it's not a supported configuration.

axboe avatar Nov 18 '22 14:11 axboe

Engine used here is from spdk: ~/Documents/spdk/build/fio/spdk_nvme.

SRK4ever avatar Nov 18 '22 18:11 SRK4ever

I know where it's from. The point is that we cannot help debug issues with external engines that aren't part of the fio repository. So please ensure that the issue exists with an engine that ships with fio.

axboe avatar Nov 18 '22 18:11 axboe

I'm trying to use libzbc, but fio is not accepting it. I'll keep trying. Please let me know which other engine you'd prefer for ZNS, as not all of them work with it.

SRK4ever avatar Nov 18 '22 21:11 SRK4ever

libzbc is for SMR HDDs, i.e. SCSI or ATA drives. That will not help you with ZNS. Your kernel on Ubuntu should support ZNS out of the box, so simply try with libaio or io_uring.

damien-lemoal avatar Nov 19 '22 00:11 damien-lemoal

cc @kawasaki

Shin'ichiro,

Can you have a look? We may be missing some checks in zbd.c to skip offline zones. We may also need to check for read-only zones on write operations.
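
The two checks suggested here could look roughly like the following. This is only a minimal sketch, not the code in zbd.c: the enum `zone_cond`, its members, and the helper `zone_usable()` are hypothetical names chosen for illustration.

```c
#include <stdbool.h>

/* Hypothetical zone conditions; stand-ins for what a zone report returns. */
enum zone_cond {
    ZONE_COND_EMPTY,
    ZONE_COND_OPEN,
    ZONE_COND_FULL,
    ZONE_COND_READONLY,
    ZONE_COND_OFFLINE,
};

/*
 * Return true if a zone in condition `cond` may be targeted by an I/O.
 * Offline zones are skipped for all I/O; read-only zones only for writes.
 */
bool zone_usable(enum zone_cond cond, bool is_write)
{
    if (cond == ZONE_COND_OFFLINE)
        return false;
    if (cond == ZONE_COND_READONLY && is_write)
        return false;
    return true;
}
```

For example, `zone_usable(ZONE_COND_READONLY, false)` is true while `zone_usable(ZONE_COND_READONLY, true)` is false, so a read-only zone would still be eligible for reads.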

damien-lemoal avatar Nov 19 '22 00:11 damien-lemoal

@SRK4ever I tweaked null_blk to have offline zones and tried to reproduce the failure with it, but I saw no failure. Could you share the failure message? I wonder whether an I/O error occurred or some other zone status handling error.

kawasaki avatar Nov 25 '22 05:11 kawasaki

The failure happens only when the wp of an offline zone is purely random, i.e. it is never assigned a defined value. In my failure case, the wp value was greater than z_start + zone_size. It jumped from 1b738000000 to FFFFFFFFFFFFF000 when the offline transition happened.
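
To make the failing comparison concrete, here is a small sketch that evaluates `wp <= start + zone_size` for the reported write pointer. Only the wp values 1b738000000 and FFFFFFFFFFFFF000 come from this report; the zone start and zone size in the usage note are made up, and `struct zone_info`, `wp_in_range()` and `wp_check_ok()` are hypothetical names, not fio's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified zone descriptor (all values in bytes). */
struct zone_info {
    uint64_t start;    /* first byte of the zone */
    uint64_t wp;       /* write pointer as reported by the device */
    bool     offline;  /* true if the zone is reported offline */
};

/* The comparison quoted in the issue: wp must not pass the zone end. */
bool wp_in_range(const struct zone_info *z, uint64_t zone_size)
{
    return z->wp <= z->start + zone_size;
}

/* Variant that does not look at the wp of an offline zone at all. */
bool wp_check_ok(const struct zone_info *z, uint64_t zone_size)
{
    if (z->offline)
        return true;
    return wp_in_range(z, zone_size);
}
```

Assuming, purely for illustration, a zone start of 0x1b700000000 and a zone size of 0x80000000, a wp of 0x1b738000000 passes `wp_in_range()`, while 0xFFFFFFFFFFFFF000 fails it. `wp_check_ok()` accepts the second case only because the zone is flagged offline.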

SRK4ever avatar Nov 28 '22 18:11 SRK4ever

Are you saying that the problem happened when a zone transitioned to the offline state DURING your fio run? If yes, then it is normal to see IO errors: fio does a zone report on startup only (which will detect and skip offline zones). A zone report is not done before every IO to check that the zone is OK. That would be a crazy overhead and make any testing useless. fio does not do error recovery by doing a zone report again when an IO error happens. That is by design.
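
The startup-only behaviour described here can be pictured with a toy model. This is a sketch under stated assumptions, not fio's implementation: `struct device`, `struct zone_cache`, `zone_cache_init()` and `do_io()` are all hypothetical, and the "device" is just an array of flags.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_ZONES 4  /* hypothetical, tiny device for illustration */

/* Live device state; may change at any time (e.g. a zone going offline). */
struct device {
    bool offline[NR_ZONES];
};

/* Snapshot taken once at startup, standing in for the initial zone report. */
struct zone_cache {
    bool usable[NR_ZONES];
};

/* One-time scan: record which zones are usable when the run starts. */
void zone_cache_init(struct zone_cache *c, const struct device *d)
{
    for (size_t i = 0; i < NR_ZONES; i++)
        c->usable[i] = !d->offline[i];
}

/*
 * Issue an I/O to zone `i`. Target selection trusts the snapshot and never
 * re-queries the device, so a zone that went offline after startup produces
 * an I/O error. Returns 0 on success, -1 on I/O error, 1 if the zone was
 * skipped because the snapshot marked it unusable.
 */
int do_io(const struct zone_cache *c, const struct device *d, size_t i)
{
    if (!c->usable[i])
        return 1;
    return d->offline[i] ? -1 : 0;
}
```

In this model, a zone that is offline when `zone_cache_init()` runs is skipped, whereas a zone that goes offline afterwards still looks usable in the snapshot and the I/O fails.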

Please confirm the exact condition of the problem trigger. If it triggers with an offline zone already present when you start fio, then it is a bug that we can fix (but Shin'ichiro checked that and everything seems OK). If the problem is due to an offline transition while fio is running, then getting the IO errors is normal and expected.

damien-lemoal avatar Nov 28 '22 23:11 damien-lemoal

It happens at the start of the test, when the drive already has offline zones. fio hasn't even started any IO at the point of failure; I guess it happens during the pre-check.

SRK4ever avatar Nov 30 '22 22:11 SRK4ever

@SRK4ever Could you share the failure message?

kawasaki avatar Dec 01 '22 02:12 kawasaki