FIO Reports Wrong Offset and Illegal Address Access During Pattern Verification
Please acknowledge the following before creating a ticket
- [x] I have read the GitHub issues section of REPORTING-BUGS.
Description of the bug:
When using FIO for sequential read verification, I found a logical error in FIO's calculation of the offset for verification failures. The specific issue is as follows:
- When `verify=pattern` is used and a mismatch is detected, FIO reports an error and dumps the wrong data location.
- The actual data error is at offset=1048576, and the IO that triggers the read verification failure is at offset=1040896 with len=16384. However, FIO reports an incorrect error location and encounters an illegal address access when dumping data. The dump information is as follows:
```
devsdb: (g=0): rw=read, bs=(R) 512B-16.0KiB, (W) 512B-16.0KiB, (T) 512B-16.0KiB, ioengine=libaio, iodepth=100
fio-3.1
Starting 1 thread
fio: got pattern '31', wanted '32'. Bad bits 2
fio: bad pattern block offset 0
pattern: verify failed at file /dev/sdb offset 3790717919, length 825307441
received data dumped as sdb.1286656.received
write verify buf file: Bad address
expected data dumped as sdb.1286656.expected
fio: verify type mismatch (12593 media, 18 given)
fio: pid=6071, err=84/file:io_u.c:2030, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character
```
- According to FIO's IO trace information, `io_offset=1040896` and `io_len=16384`. FIO calculates the error offset using the formula:

```
io_offset + hdr_num * io_buf_size = 1040896 + 15 * 16384 = 1286656
```

However, the actual error location should be:

```
io_offset + hdr_num * verify_interval = 1040896 + 15 * 512 = 1048576
```
- This indicates that FIO uses an incorrect formula to calculate the error offset, resulting in a mismatch between the reported offset and the actual error location.
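The arithmetic above can be checked with a short sketch. The values are taken from the IO trace in this report; the 512-byte verify interval is assumed to come from the 512-byte pattern file:

```python
# Values from the fio IO trace in the report above.
io_offset = 1040896        # offset of the read that hits the mismatch
io_buf_size = 16384        # length of that read
verify_interval = 512      # one pattern-sized chunk per verify header
hdr_num = 15               # index of the mismatching verify header

# Formula fio currently uses (wrong):
reported = io_offset + hdr_num * io_buf_size
# Formula matching the on-disk error location:
actual = io_offset + hdr_num * verify_interval

print(reported)  # 1286656 -- the offset in the sdb.1286656.* dump file names
print(actual)    # 1048576 -- the offset actually modified with dd
```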
Additional Testing and Code Review:
- I have tested this issue on both FIO 3.1 and FIO 3.38, and the problem exists in both versions.
- I also reviewed the source code of FIO 3.9, and it appears that this issue has not been resolved in that version either.
Environment:
- Operating System: BigCloud Enterprise Linux 8
- Kernel Version: GNU/Linux 4.19.0-240.23.11.el8_2.bclinux.x86_64
- FIO Version: 3.38
Reproduction steps:
1. Generate a pattern file:

```shell
dd if=/dev/zero bs=1 count=512 | tr -c \0 2 | dd of=./test_just_for_fio_dcrtf bs=1 count=512
```
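For reference, a Python sketch of what this pipeline produces: every NUL byte from /dev/zero is translated to the ASCII character '2' (0x32), so the pattern file is 512 bytes of '2' (consistent with the later "got pattern '31', wanted '32'" message):

```python
# The dd | tr pipeline above emits 512 bytes of the ASCII character '2'.
pattern = b"2" * 512

assert len(pattern) == 512
assert set(pattern) == {0x32}   # every byte is '2' (hex 0x32)
print(pattern[:8])  # b'22222222'
```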
2. Perform a sequential write using FIO:

```shell
fio -name=global \
    -ioengine=libaio \
    -thread=1 \
    -direct=1 \
    -rw=write \
    -serialize_overlap=1 \
    -bssplit=256k \
    -size=10485760 \
    -offset=0 \
    -offset_increment=21474836480 \
    -verify=pattern \
    -verify_pattern=\'./test_just_for_fio_dcrtf\' \
    -numjobs=1 \
    -exitall_on_error=1 \
    -do_verify=0 \
    -iodepth=128 \
    -group_reporting=1 \
    -ramp_time=0 \
    -name=devsdb -filename=/dev/sdb -write_iolog=1st_devsdb.log
```
3. Modify the data at offset 1048576 and confirm with hexdump:

```shell
dd if=/dev/zero bs=1 count=512 | tr -c \0 1 | dd of=/dev/sdb bs=1 count=512 seek=1048576
dd if=/dev/sdb bs=1 count=512 skip=1048576 status=none | hexdump -C
```
4. Perform sequential read verification using FIO:

```shell
fio -ioengine=libaio \
    -thread=1 \
    -direct=1 \
    -rw=read \
    -serialize_overlap=1 \
    -bssplit=16384:512 \
    -size=104857600 \
    -offset=0 \
    -offset_increment=21474836480 \
    -verify=pattern \
    -verify_pattern=\'test_just_for_fio_dcrtf\' \
    -numjobs=1 \
    -iodepth=100 \
    -group_reporting=1 \
    -verify_dump=1 \
    -verify_backlog=100 \
    -exitall_on_error=1 \
    -verify_fatal=1 \
    -name=devsdb -filename=/dev/sdb -write_iolog=2nd_devsdb.log
```
5. Observe the FIO error output:

```
fio: got pattern '31', wanted '32'. Bad bits 2
fio: bad pattern block offset 0
pattern: verify failed at file /dev/sdb offset 3790717919, length 825307441
received data dumped as sdb.1286656.received
write verify buf file: Bad address
expected data dumped as sdb.1286656.expected
fio: verify type mismatch (12593 media, 18 given)
fio: pid=6071, err=84/file:io_u.c:2030, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character
```
Expected behavior:
FIO should correctly calculate the error offset using the formula:

```
io_offset + hdr_num * verify_interval
```

instead of:

```
io_offset + hdr_num * io_buf_size
```
Additional Information:
- This issue may prevent users from accurately locating data errors, affecting debugging and troubleshooting.
- It is recommended to fix the offset calculation logic in the `verify` feature.
Proposed Fix:
The calculation of the error offset should be updated to use verify_interval instead of io_buf_size in the relevant part of the FIO source code.
Hello @AshCrismon,
I think the crux of the problem is that fio has always assumed that if you want to do a verify of a previous job, it must be done with the same parameters as the job that wrote the file in the first place. Doing a quick diff against a lightly reformatted version of your jobfiles, to reduce the number of changes, shows this:
```diff
% diff -u 1 2
--- 1	2025-02-23 07:50:04
+++ 2	2025-02-23 07:50:48
@@ -1,21 +1,22 @@
-fio -name=global \
+fio \
  -ioengine=libaio \
  -thread=1 \
  -direct=1 \
- -rw=write \
+ -rw=read \
  -serialize_overlap=1 \
- -bssplit=256k \
- -size=10485760 \
+ -bssplit=16384:512 \
+ -size=104857600 \
  -offset=0 \
  -offset_increment=21474836480 \
  -verify=pattern \
- -verify_pattern=\'./test_just_for_fio_dcrtf\' \
+ -verify_pattern=\'test_just_for_fio_dcrtf\' \
  -numjobs=1 \
- -exitall_on_error=1 \
- -do_verify=0 \
- -iodepth=128 \
+ -iodepth=100 \
  -group_reporting=1 \
- -ramp_time=0 \
+ -verify_dump=1 \
+ -verify_backlog=100 \
+ -exitall_on_error=1 \
+ -verify_fatal=1 \
  -name=devsdb
  -filename=/dev/sdb
- -write_iolog=1st_devsdb.log
+ -write_iolog=2nd_devsdb.log
```
You have changed the bssplit, the iodepth and the size between your original jobfile and your verifying jobfile, and these parameters mean the I/O done by the verify will have sizes and orders different from those of the original write job. If you leave them the same between jobs, do you get a correct offset and address?
As you're doing sequential I/O with a fixed pattern you can probably get away with changing the iodepth (assuming the job ran to completion and was not interrupted). Changing the size means you may end up verifying data the write job never wrote, so that's clearly problematic. Depending on the pattern, changing the bssplit is going to cause problems because the pattern won't necessarily have the same start and end across different block sizes. As a thought experiment, imagine I have a non-repeating pattern that is 256 Kbytes big. I will get a different result if I write that pattern in one 256 Kbyte chunk to a block than if I write the first 16 Kbytes of that pattern 16 times to the same block.
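The thought experiment can be sketched in a few lines of Python, using a hypothetical pattern whose 251-byte period does not divide 16 Kbytes, so it stands in for a non-repeating pattern:

```python
# A pattern whose 251-byte period does not line up with 16 KiB boundaries.
pattern = bytes(i % 251 for i in range(256 * 1024))

one_chunk = pattern                     # one 256 KiB write of the full pattern
repeated = pattern[:16 * 1024] * 16     # sixteen 16 KiB writes of its prefix

assert len(one_chunk) == len(repeated)  # same amount of data written...
print(one_chunk == repeated)            # False -- ...but different contents
```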
With regards to the patch I'm afraid at first glance it doesn't look correct given the above. Further I think it would do the wrong thing when the user sets a verify_interval of half that of a fixed block size and expects more granular verification.
One additional question: is your fio patched in some form? In standard fio the verify_pattern option is not a filename - it is a string expression. This would mean your write job would be writing the hex pattern 0x2e 0x2f 0x74 0x65 0x73 0x74 0x5f 0x6a 0x75 0x73 0x74 0x5f 0x66 0x6f 0x72 0x5f 0x66 0x69 0x6f 0x5f 0x64 0x63 0x72 0x74 0x66 and your verification step would be trying to verify the hex pattern 0x74 0x65 0x73 0x74 0x5f 0x6a 0x75 0x73 0x74 0x5f 0x66 0x6f 0x72 0x5f 0x66 0x69 0x6f 0x5f 0x64 0x63 0x72 0x74 0x66, which is shorter and thus different, and would lead to verification errors with stock fio...
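The length difference between the two literal strings is easy to confirm, treating each verify_pattern value as a plain byte string the way stock fio would:

```python
write_pattern = b"./test_just_for_fio_dcrtf"  # verify_pattern of the write job
read_pattern = b"test_just_for_fio_dcrtf"     # verify_pattern of the read job

print(len(write_pattern), len(read_pattern))  # 25 23
print(write_pattern == read_pattern)          # False
```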
If you can follow up with answers to the above questions that would allow us to progress this. Thanks!
Hi @AshCrismon,
I'm still interested to hear your feedback to my assessment - if you have the time please feel free to follow up!
Forlornly closing due to lack of reply from reporter. If this issue is still happening with the latest fio (see https://github.com/axboe/fio/releases to find out which version that is) please (please!) reopen. Thanks!