fio
Verification failures writing to small range with bs_unaligned=1
Seems like there's an issue when issuing I/Os to a relatively small range while verifying headers.
Using fio version:
[ec2-user@ip-172-31-30-241 ~]$ /usr/local/bin/fio --version
fio-3.23
The following job file:
[test]
direct=0
ioengine=libaio
#do_verify=1
verify=crc32c
verify_fatal=1
verify_dump=1
bwavgtime=1000
iopsavgtime=1000
rw=randrw
iodepth=2
bs_unaligned=1
bsrange=512b-1k
time_based=1
size=10m
verify_backlog=10
runtime=6h
serialize_overlap=1
filename=/test/barak.test
Debug log:
verify 19888 fill crc32c io_u 0x2631780, len 854
io 19888 queue: io_u 0x2631780: off=0x9bf600,len=0x356,ddir=1,file=/test/barak.test
io 19888 calling ->commit(), depth 2
io 19888 io_u_queued_complete: min=1
io 19888 getevents: 1
io 19888 complete: io_u 0x26325c0: off=0x31b200,len=0x200,ddir=0,file=/test/barak.test
io 19888 fill: io_u 0x26325c0: off=0x9bf600,len=0x243,ddir=1,file=/test/barak.test
io 19888 prep: io_u 0x26325c0: off=0x9bf600,len=0x243,ddir=1,file=/test/barak.test
io 19888 prep: io_u 0x26325c0: ret=0
verify 19888 fill random bytes len=579
verify 19888 fill crc32c io_u 0x26325c0, len 579
io 19888 queue: io_u 0x26325c0: off=0x9bf600,len=0x243,ddir=1,file=/test/barak.test
io 19888 complete: io_u 0x26325c0: off=0x9bf600,len=0x243,ddir=1,file=/test/barak.test
io 19888 prep: io_u 0x2631780: off=0x9bf600,len=0x356,ddir=0,file=/test/barak.test
io 19888 prep: io_u 0x2631780: ret=0
io 19888 queue: io_u 0x2631780: off=0x9bf600,len=0x356,ddir=0,file=/test/barak.test
io 19888 complete: io_u 0x2631780: off=0x9bf600,len=0x356,ddir=0,file=/test/barak.test
verify: bad header length 579, wanted 854 at file /dev/nvme6n1 offset 10221056, length 854 (requested block: offset=10221056, length=854)
hdr_fail data dumped as nvme6n1.10221056.hdr_fail
As can be seen in the debug log, fio issues two unaligned writes to the same offset (0x9bf600), one with len 854 and the other with len 579.
This means the header len on disk is populated with the most recent write's buflen (579), due to the following code when td->o.bs_unaligned=1:
static unsigned int get_hdr_inc(struct thread_data *td, struct io_u *io_u)
{
	unsigned int hdr_inc;

	/*
	 * If we use bs_unaligned, buflen can be larger than the verify
	 * interval (which just defaults to the smallest blocksize possible).
	 */
	hdr_inc = io_u->buflen;
	if (td->o.verify_interval && td->o.verify_interval <= io_u->buflen &&
	    !td->o.bs_unaligned)
		hdr_inc = td->o.verify_interval;

	return hdr_inc;
}
When bs_unaligned=1, the header len is set to the submitted buflen. However, on an overwrite of the same block the buflen may differ. This problem does not exist with bs_unaligned=0, as the header len is fixed to the verify_interval the user sets in the config.
I believe the straightforward way to fix this would be to not verify the header when bs_unaligned=1. Please let me know if I'm missing something (e.g. my job file isn't valid). I've verified that the same workload works when bs_unaligned isn't set.
I see you're doing a verify job with rw=randrw in the above - does this issue happen even without that (to rule out this being another case of #792 )?
I'm skeptical that skipping the header verification is the right thing to do. fio needs to know when a write has "damaged" a previous block (or blocks...), remove it (or them) from what can be verified, and put the new block on the list of verifiable things. When things aren't aligned this can get complicated: you can't depend on even the start being aligned to a previously written block, so you have to do more complicated partial-overlap checking.
Yes, it also happens with rw=randwrite. Perhaps in_flight_overlap should check not only in-flight I/Os but also I/Os that are pending verification?
From a quick glance in_flight_overlap() should be serializing the verify I/O correctly against the regular I/O because both types have to go through io_u_submit() (and the I/O is being submitted from the same thread). I think the problem stems from what @bardavid described in https://github.com/axboe/fio/issues/792#issuecomment-687662239 :
- when fio uses randommap then it will append each io_piece to a linked list, not caring for overwrites of unverified I/Os.
- get_next_verify could get an obsolete io_piece (offset already overwritten).
The second I/O needed to tell fio to drop verification of the (potentially) bigger first I/O from the "verifiable list" because it has now been damaged, but it fails to do so when a randommap is being used (see https://github.com/axboe/fio/blob/7064f8942a3b8070acf60b8e5fabc16f8221ae40/iolog.c#L251 ). Checking whether we need to remove "damaged" pieces from an unsorted linked list would require searching the whole list for potential overlaps, which is O(N) in the worst case (and the worst case will likely be the typical case!), so @bardavid's suggestion to abandon the linked list and always use the rbtree seems sensible. If we don't want to do that, we have to add more exceptions to when the linked list can be used (you seem happy digging into the code so I'd start here to verify (no pun intended) we're on the right track).
I think the origin of this is that when verify was added to fio, it assumed you only do one "pass" of writes before any of the I/O is verified. With a randommap in use, this means you can't damage a block before it is verified, because within a pass there are guaranteed to be no overlaps. In modern fio this is no longer the case: I/O can "loop around" on the second or later passes (the randommap applies per pass) and damage blocks before verification has had a chance to run.
(if after doing a quick modification like:
diff --git a/iolog.c b/iolog.c
index fa40c857..9e67d0bf 100644
--- a/iolog.c
+++ b/iolog.c
@@ -248,7 +248,7 @@ void log_io_piece(struct thread_data *td, struct io_u *io_u)
* to check for duplicate blocks and drop the old one, which we rely on
* the rb insert/lookup for handling.
*/
- if (file_randommap(td, ipo->file)) {
+ if (file_randommap(td, ipo->file) && false) {
INIT_FLIST_HEAD(&ipo->list);
flist_add_tail(&ipo->list, &td->io_hist_list);
ipo->flags |= IP_F_ONLIST;
your job runs without error you can use --debug=io and then pipe the output to grep 'iolog: overlap' and see if the rbtree overlap detection is being triggered)
Indeed, simply forcing the rbtree overlap check seems to do the trick; the job didn't fail after applying the above diff.
io 8668 iolog: overlap 8178176/11264, 8167936/67809
io 8668 iolog: overlap 8189440/15971, 8167936/67809
io 8668 iolog: overlap 8205824/16732, 8167936/67809
io 8668 iolog: overlap 8222720/1853, 8167936/67809
io 8668 iolog: overlap 8224768/3892, 8167936/67809
io 8668 iolog: overlap 8228864/13824, 8167936/67809
io 8668 iolog: overlap 8073216/96695, 8167936/67809
io 8668 iolog: overlap 6684672/100160, 6694912/79101
io 8668 iolog: overlap 5807104/53760, 5809664/11433
io 8668 iolog: overlap 5198336/40619, 5204480/124779
io 8668 iolog: overlap 5239296/63488, 5204480/124779
io 8668 iolog: overlap 5302784/123963, 5204480/124779
io 8668 in-flight overlap: 5204480/124779, 5198336/40619
io 8668 in-flight overlap: 5204480/124779, 5198336/40619
io 8668 in-flight overlap: 5204480/124779, 5198336/40619
io 8668 in-flight overlap: 5204480/124779, 5198336/40619
io 8668 in-flight overlap: 5204480/124779, 5198336/40619
io 8668 in-flight overlap: 5204480/124779, 5198336/40619
io 8668 in-flight overlap: 5204480/124779, 5198336/40619
io 8668 in-flight overlap: 5204480/124779, 5198336/40619
io 8668 in-flight overlap: 5204480/124779, 5198336/40619
io 8668 in-flight overlap: 5204480/124779, 5198336/40619
io 8668 in-flight overlap: 5204480/124779, 5198336/40619
btw - running with norandommap also solves this. I wonder if we could just update the fio docs to make clear that using verification without norandommap can cause verification errors when ranges overlap.
We should fix this as you've found a legitimate bug. I think there are a few subtleties:
- You were using a depth greater than 1 and knew that you had to use serialize_overlap=1. It's not clear how you knew you had to use that option, but you were correct to do so! What happens to all the poor souls who don't know this is essential when a configuration may issue multiple in-flight reads or writes to overlapping blocks? Maybe verification that may run in parallel with writes (so verify_backlog and/or verifying when using rw=randrw) should just force it on?
- Even when the above is done, many won't see another problem because all their blocks are the same size. The damaged region is an exactly sized overwrite, so unless you were verifying by "content" (as opposed to checking the header of what is now on the disk) fio will never notice...
Here's another job that quickly generates a spurious failure:
[verify_test_2]
randseed=1
ioengine=posixaio
verify=pattern
verify_pattern=%o
verify_fatal=1
verify_backlog=13
rw=randrw
iodepth=1
bssplit=512b/50:64k/50
runtime=30s
time_based=1
size=1m
serialize_overlap=1
@sitsofe
In reference to my comment on https://github.com/axboe/fio/issues/792#issuecomment-699542167 : as a temporary workaround I agree serialize_overlap and rbtree usage can be forced for specific configurations.