fio icon indicating copy to clipboard operation
fio copied to clipboard

FIO report pattern: verify failed for --verify_pattern=0xff%o\"abcd\"-12

Open aepotapov opened this issue 1 year ago • 2 comments

fio --verison
fio-3.38
cat /etc/os-release 
PRETTY_NAME="Ubuntu 22.04.5 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.5 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
uname -a
Linux NB-1828 5.15.145-0515145-generic #202312231032 SMP Sat Dec 23 10:39:58 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

I get verify errors with following two runs:

1. fio --name=pat01-04 --ioengine=libaio --thread --direct=1 --iodepth=4 --rw=write --verify=pattern
--verify_pattern=0xff%o\"abcd\"-12 --do_verify=0 --verify_state_save=0 --end_fsync=1 --group_reporting=1
--directory=./dir01:./dir02 --size=1Gi --blocksize=1Mi --numjobs=2                                                                                          
pat01-04: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=4                                                                                            
...                                                                                                                                                                                                                
fio-3.38                                                                                                                                                                                                           
Starting 2 threads                                                                                                                                                                                                 
pat01-04: Laying out IO file (1 file / 1024MiB)                                                                                                                                                                    
pat01-04: Laying out IO file (1 file / 1024MiB)                                                                                                                                                                    
Jobs: 2 (f=2)                                                                                                                                                                                                      
pat01-04: (groupid=0, jobs=2): err= 0: pid=40040: Tue Dec  3 17:28:34 2024                                                                                                                                         
  write: IOPS=1935, BW=1936MiB/s (2030MB/s)(2048MiB/1058msec); 0 zone resets                                                                                                                                       
    slat (usec): min=55, max=729, avg=180.58, stdev=91.44                                                                                                                                                          
    clat (usec): min=2181, max=13064, avg=3856.78, stdev=1566.86                                                                                                                                                   
     lat (usec): min=2677, max=13364, avg=4037.37, stdev=1566.46                                                                                                                                                   
    clat percentiles (usec):                                                                                                                                                                                       
     |  1.00th=[ 2900],  5.00th=[ 3195], 10.00th=[ 3294], 20.00th=[ 3359],                                                                                                                                         
     | 30.00th=[ 3425], 40.00th=[ 3458], 50.00th=[ 3490], 60.00th=[ 3523],                                                                                                                                         
     | 70.00th=[ 3556], 80.00th=[ 3589], 90.00th=[ 4228], 95.00th=[ 5014],                                                                                                                                         
     | 99.00th=[12125], 99.50th=[12387], 99.90th=[12780], 99.95th=[12780],                                                                                                                                         
     | 99.99th=[13042]                                                                                                                                                                                             
   bw (  MiB/s): min= 1920, max= 1976, per=100.00%, avg=1948.00, stdev=16.21, samples=4                                                                                                                            
   iops        : min= 1920, max= 1976, avg=1948.00, stdev=16.21, samples=4                                                                                                                                         
  lat (msec)   : 4=85.99%, 10=10.84%, 20=3.17%                                                                                                                                                                     
  cpu          : usr=16.94%, sys=8.94%, ctx=2040, majf=0, minf=2                                                                                                                                                   
  IO depths    : 1=0.1%, 2=0.2%, 4=99.7%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%                                                                                                                                      
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,2048,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
  WRITE: bw=1936MiB/s (2030MB/s), 1936MiB/s-1936MiB/s (2030MB/s-2030MB/s), io=2048MiB (2147MB), run=1058-1058msec

Disk stats (read/write):
  nvme1n1: ios=0/4351, sectors=0/3560464, merge=0/0, ticks=0/16269, in_queue=16272, util=90.05%
3. fio --name=pat01-04 --ioengine=libaio --thread --direct=1 --iodepth=4 --rw=read --verify=pattern
--verify_pattern=0xff%o\"abcd\"-12 --do_verify=1 --verify_only --verify_state_load=0 --verify_dump=1
--group_reporting=1 --directory=./dir01:./dir02 --size=1Gi --blocksize=1Mi --numjobs=2
pat01-04: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=4
...
fio-3.38
Starting 2 threads
fio: got pattern '30', wanted 'f0'. Bad bits 2
fio: bad pattern block offset 3
pattern: verify failed at file ./dir02/pat01-04.1.0 offset 422576128, length 1048576 (requested block: offset=422576128, length=1048576, flags=8)
       received data dumped as pat01-04.1.0.422576128.received
       expected data dumped as pat01-04.1.0.422576128.expected
fio: verify type mismatch (12288 media, 18 given)
fio: got pattern '50', wanted 'a0'. Bad bits 4
fio: bad pattern block offset 3
pattern: verify failed at file ./dir02/pat01-04.1.0 offset 424673280, length 1048576 (requested block: offset=424673280, length=1048576, flags=8)
       received data dumped as pat01-04.1.0.424673280.received
       expected data dumped as pat01-04.1.0.424673280.expected
fio: verify type mismatch (20480 media, 18 given)
fio: pid=40092, err=84/file:io_u.c:2290, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character

pat01-04: (groupid=0, jobs=2): err=84 (file:io_u.c:2290, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character): pid=40091: Tue Dec  3 17:28:50 2024
  read: IOPS=2260, BW=2261MiB/s (2370MB/s)(1431MiB/633msec)
    slat (usec): min=34, max=439, avg=92.45, stdev=41.31
    clat (usec): min=766, max=14295, avg=2591.98, stdev=1209.22
     lat (usec): min=891, max=14591, avg=2684.42, stdev=1206.50
    clat percentiles (usec):
     |  1.00th=[ 1090],  5.00th=[ 1287], 10.00th=[ 1401], 20.00th=[ 1500],
     | 30.00th=[ 1598], 40.00th=[ 1876], 50.00th=[ 2966], 60.00th=[ 3130],
     | 70.00th=[ 3261], 80.00th=[ 3425], 90.00th=[ 3621], 95.00th=[ 3785],
     | 99.00th=[ 4228], 99.50th=[10814], 99.90th=[14091], 99.95th=[14353],
     | 99.99th=[14353]
   bw (  MiB/s): min= 1458, max= 1458, per=64.49%, avg=1458.00, stdev= 0.00, samples=1
   iops        : min= 1458, max= 1458, avg=1458.00, stdev= 0.00, samples=1
  lat (usec)   : 1000=0.21%
  lat (msec)   : 2=41.79%, 4=55.90%, 10=1.54%, 20=0.56%
  cpu          : usr=11.98%, sys=17.76%, ctx=1366, majf=0, minf=2563
  IO depths    : 1=0.1%, 2=0.3%, 4=99.6%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1431,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
   READ: bw=2261MiB/s (2370MB/s), 2261MiB/s-2261MiB/s (2370MB/s-2370MB/s), io=1431MiB (1501MB), run=633-633msec

Disk stats (read/write):
  nvme1n1: ios=2670/0, sectors=1822720/0, merge=0/0, ticks=8022/0, in_queue=8023, util=80.81%

At the same time comparing dump files says they are the same.

aepotapov avatar Dec 03 '24 14:12 aepotapov

I was able to trigger this issue once but subsequent attempts did not produce verification errors. Do you see this issue every time?

Can you simplify your job file to include only options that are required to produce this issue? https://stackoverflow.com/help/mcve

vincentkfu avatar Dec 05 '24 19:12 vincentkfu

Thank you, I'll try

aepotapov avatar Dec 06 '24 11:12 aepotapov

@aepotapov were you able to simplify the job file down?

sitsofe avatar Jul 21 '25 20:07 sitsofe

Closing because we need more information from the reporter to take this further.

@aepotapov - please re-open this issue if you're able to supply the requested information. Thank you!

sitsofe avatar Aug 08 '25 19:08 sitsofe

It looks like the "--thread" parameter makes difference. I was able to reproduce with following list of parameters:

fio --thread --name=pat01-04 --ioengine=libaio --rw=write --verify=pattern --verify_pattern=0xff%o\"abcd\"-12 --do_verify=0 --verify_state_save=0 --directory=./dir01:./dir02 --size=10Mi --numjobs=2
pat01-04: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
...
fio-3.38
Starting 2 threads

pat01-04: (groupid=0, jobs=1): err= 0: pid=16606: Fri Aug 22 13:08:34 2025
  write: IOPS=320k, BW=1250MiB/s (1311MB/s)(10.0MiB/8msec); 0 zone resets
    slat (nsec): min=1573, max=82105, avg=2106.01, stdev=2096.12
    clat (nsec): min=310, max=14627, avg=340.81, stdev=323.05
     lat (nsec): min=1913, max=82556, avg=2446.82, stdev=2124.69
    clat percentiles (nsec):
     |  1.00th=[  322],  5.00th=[  322], 10.00th=[  322], 20.00th=[  322],
     | 30.00th=[  330], 40.00th=[  330], 50.00th=[  330], 60.00th=[  330],
     | 70.00th=[  330], 80.00th=[  342], 90.00th=[  342], 95.00th=[  342],
     | 99.00th=[  350], 99.50th=[  422], 99.90th=[ 1048], 99.95th=[ 8256],
     | 99.99th=[14656]
  lat (nsec)   : 500=99.88%
  lat (usec)   : 2=0.04%, 10=0.04%, 20=0.04%
  cpu          : usr=0.00%, sys=100.00%, ctx=1, majf=0, minf=1
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,2560,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
pat01-04: (groupid=0, jobs=1): err= 0: pid=16607: Fri Aug 22 13:08:34 2025
  write: IOPS=151k, BW=588MiB/s (617MB/s)(10.0MiB/17msec); 0 zone resets
    slat (usec): min=3, max=259, avg= 4.29, stdev= 5.45
    clat (nsec): min=711, max=2715, avg=810.59, stdev=122.50
     lat (usec): min=4, max=259, avg= 5.10, stdev= 5.47
    clat percentiles (nsec):
     |  1.00th=[  724],  5.00th=[  732], 10.00th=[  732], 20.00th=[  740],
     | 30.00th=[  740], 40.00th=[  740], 50.00th=[  748], 60.00th=[  756],
     | 70.00th=[  764], 80.00th=[  956], 90.00th=[  972], 95.00th=[ 1080],
     | 99.00th=[ 1128], 99.50th=[ 1160], 99.90th=[ 1432], 99.95th=[ 1576],
     | 99.99th=[ 2704]
  lat (nsec)   : 750=45.39%, 1000=49.06%
  lat (usec)   : 2=5.51%, 4=0.04%
  cpu          : usr=18.75%, sys=68.75%, ctx=1, majf=0, minf=1
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,2560,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=1176MiB/s (1234MB/s), 588MiB/s-1250MiB/s (617MB/s-1311MB/s), io=20.0MiB (21.0MB), run=8-17msec

Disk stats (read/write):
  nvme0n1: ios=0/0, sectors=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

verify job:

fio --thread --name=pat01-04 --ioengine=libaio --rw=read --verify=pattern --verify_pattern=0xff%o\"abcd\"-12 --do_verify=1 --verify_only --verify_state_load=0 --directory=./dir01:./dir02 --size=10Mi --numjobs=2
pat01-04: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
...
fio-3.38
Starting 2 threads
fio: got pattern '40', wanted '50'. Bad bits 1
fio: bad pattern block offset 2
pattern: verify failed at file ./dir02/pat01-04.1.0 offset 671744, length 4096 (requested block: offset=671744, length=4096, flags=8)
fio: verify type mismatch (2624 media, 18 given)
fio: pid=16625, err=84/file:io_u.c:2290, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character

pat01-04: (groupid=0, jobs=1): err= 0: pid=16624: Fri Aug 22 13:08:39 2025
  read: IOPS=69.2k, BW=270MiB/s (283MB/s)(10.0MiB/37msec)
    slat (nsec): min=781, max=301290, avg=4018.36, stdev=16295.73
    clat (nsec): min=300, max=13676, avg=896.73, stdev=547.27
     lat (nsec): min=1102, max=305337, avg=4915.09, stdev=16401.53
    clat percentiles (nsec):
     |  1.00th=[  310],  5.00th=[  310], 10.00th=[  868], 20.00th=[  884],
     | 30.00th=[  884], 40.00th=[  892], 50.00th=[  892], 60.00th=[  892],
     | 70.00th=[  900], 80.00th=[  900], 90.00th=[  924], 95.00th=[ 1176],
     | 99.00th=[ 1336], 99.50th=[ 1864], 99.90th=[10944], 99.95th=[12096],
     | 99.99th=[13632]
  lat (nsec)   : 500=7.85%, 750=0.55%, 1000=84.96%
  lat (usec)   : 2=6.29%, 10=0.16%, 20=0.20%
  cpu          : usr=16.67%, sys=33.33%, ctx=19, majf=0, minf=1
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=2560,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
pat01-04: (groupid=0, jobs=1): err=84 (file:io_u.c:2290, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character): pid=16625: Fri Aug 22 13:08:39 2025
  read: IOPS=7173, BW=28.0MiB/s (29.4MB/s)(660KiB/23msec)
    slat (nsec): min=1884, max=217771, avg=7208.07, stdev=27615.47
    clat (nsec): min=871, max=4599, avg=952.81, stdev=387.74
     lat (usec): min=2, max=222, avg= 8.16, stdev=27.94
    clat percentiles (nsec):
     |  1.00th=[  876],  5.00th=[  884], 10.00th=[  884], 20.00th=[  892],
     | 30.00th=[  892], 40.00th=[  892], 50.00th=[  900], 60.00th=[  900],
     | 70.00th=[  900], 80.00th=[  908], 90.00th=[  916], 95.00th=[  972],
     | 99.00th=[ 4080], 99.50th=[ 4576], 99.90th=[ 4576], 99.95th=[ 4576],
     | 99.99th=[ 4576]
  lat (nsec)   : 1000=95.76%
  lat (usec)   : 2=3.03%, 10=1.21%
  cpu          : usr=0.00%, sys=22.73%, ctx=13, majf=0, minf=2
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.6%, 4=99.4%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=165,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=288MiB/s (302MB/s), 28.0MiB/s-270MiB/s (29.4MB/s-283MB/s), io=10.6MiB (11.2MB), run=23-37msec

Disk stats (read/write):
  nvme0n1: ios=0/0, sectors=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

aepotapov avatar Aug 22 '25 10:08 aepotapov

Hello @aepotapov: As requested, can you reduce the job file further? For example can you do it with the default ioengine by removing --ioengine=libaio? Can you remove --directory=... and still make it happen? Does it happen with numjobs=1? And so on.

sitsofe avatar Aug 22 '25 10:08 sitsofe

OK I've reproduced this locally with the following:

fio --thread --name=pat01-04 --size=128k --numjobs=2 --verify=pattern --verify_pattern=%o --do_verify=0 --rw=write 
fio --thread --name=pat01-04 --size=128k --numjobs=2 --verify=pattern --verify_pattern=%o --verify_only=1 --rw=read

--thread doesn't matter on the first (write) job but using --thread on the second (verifying) job leads to non deterministic failures if you re-run it over and over:

# fio --thread --name=pat01-04 --rw=read --verify=pattern --verify_pattern=%o --verify_only=1 --size=128k --numjobs=2 > /dev/null
fio: got pattern 'd0', wanted '20'. Bad bits 4
fio: bad pattern block offset 1
pattern: verify failed at file pat01-04.0.0 offset 53248, length 0 (requested block: offset=53248, length=4096, flags=8)
fio: verify type mismatch (0 media, 18 given)
# fio --thread --name=pat01-04 --rw=read --verify=pattern --verify_pattern=%o --verify_only=1 --size=128k --numjobs=2 > /dev/null
# fio --thread --name=pat01-04 --rw=read --verify=pattern --verify_pattern=%o --verify_only=1 --size=128k --numjobs=2 > /dev/null
fio: got pattern '30', wanted '20'. Bad bits 1
fio: bad pattern block offset 1
pattern: verify failed at file pat01-04.0.0 offset 12288, length 0 (requested block: offset=12288, length=4096, flags=8)
fio: verify type mismatch (0 media, 18 given)

sitsofe avatar Aug 22 '25 10:08 sitsofe

Ok. The parameter --ioengine=libaio and --directory is not necessary, but minimum 2 jobs and --thread are required.

fio --thread --name=pat01-04 --rw=write --verify=pattern --verify_pattern=0xff%o\"abcd\"-12 --do_verify=0 --verify_state_save=0 --size=10Mi --numjobs=2
pat01-04: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
...
fio-3.38
Starting 2 threads

pat01-04: (groupid=0, jobs=1): err= 0: pid=21022: Fri Aug 22 14:01:54 2025
  write: IOPS=366k, BW=1429MiB/s (1498MB/s)(10.0MiB/7msec); 0 zone resets
    clat (nsec): min=1272, max=143091, avg=1920.11, stdev=3699.60
     lat (nsec): min=1393, max=143211, avg=2046.01, stdev=3703.90
    clat percentiles (nsec):
     |  1.00th=[  1304],  5.00th=[  1336], 10.00th=[  1384], 20.00th=[  1416],
     | 30.00th=[  1480], 40.00th=[  1544], 50.00th=[  1624], 60.00th=[  1672],
     | 70.00th=[  1752], 80.00th=[  1896], 90.00th=[  2192], 95.00th=[  2480],
     | 99.00th=[  8768], 99.50th=[  9536], 99.90th=[ 33536], 99.95th=[104960],
     | 99.99th=[142336]
  lat (usec)   : 2=84.34%, 4=13.87%, 10=1.37%, 20=0.27%, 50=0.08%
  lat (usec)   : 250=0.08%
  cpu          : usr=0.00%, sys=66.67%, ctx=0, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,2560,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
pat01-04: (groupid=0, jobs=1): err= 0: pid=21023: Fri Aug 22 14:01:54 2025
  write: IOPS=320k, BW=1250MiB/s (1311MB/s)(10.0MiB/8msec); 0 zone resets
    clat (nsec): min=1302, max=19968, avg=1805.14, stdev=1059.33
     lat (nsec): min=1423, max=20088, avg=1926.36, stdev=1060.95
    clat percentiles (nsec):
     |  1.00th=[ 1320],  5.00th=[ 1352], 10.00th=[ 1384], 20.00th=[ 1416],
     | 30.00th=[ 1480], 40.00th=[ 1560], 50.00th=[ 1640], 60.00th=[ 1688],
     | 70.00th=[ 1768], 80.00th=[ 1880], 90.00th=[ 2160], 95.00th=[ 2480],
     | 99.00th=[ 8384], 99.50th=[ 9280], 99.90th=[15936], 99.95th=[19328],
     | 99.99th=[20096]
  lat (usec)   : 2=85.12%, 4=13.48%, 10=1.09%, 20=0.31%
  cpu          : usr=0.00%, sys=57.14%, ctx=0, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,2560,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=2500MiB/s (2621MB/s), 1250MiB/s-1429MiB/s (1311MB/s-1498MB/s), io=20.0MiB (21.0MB), run=7-8msec

Disk stats (read/write):
  nvme0n1: ios=0/0, sectors=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

verify job:

fio --thread --name=pat01-04 --rw=read --verify=pattern --verify_pattern=0xff%o\"abcd\"-12 --do_verify=1 --verify_only --verify_state_load=0 --size=10Mi --numjobs=2
pat01-04: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
...
fio-3.38
Starting 2 threads
fio: got pattern '00', wanted '70'. Bad bits 3
fio: bad pattern block offset 2
pattern: verify failed at file pat01-04.1.0 offset 131072, length 4096 (requested block: offset=131072, length=4096, flags=8)
fio: verify type mismatch (512 media, 18 given)
fio: pid=21042, err=84/file:io_u.c:2252, func=io_u_sync_complete, error=Invalid or incomplete multibyte or wide character

pat01-04: (groupid=0, jobs=1): err= 0: pid=21041: Fri Aug 22 14:02:05 2025
  read: IOPS=320k, BW=1250MiB/s (1311MB/s)(10.0MiB/8msec)
    clat (nsec): min=571, max=610144, avg=2383.56, stdev=18439.42
     lat (nsec): min=601, max=610605, avg=2415.12, stdev=18445.35
    clat percentiles (nsec):
     |  1.00th=[   580],  5.00th=[   588], 10.00th=[   588], 20.00th=[   604],
     | 30.00th=[   628], 40.00th=[   652], 50.00th=[   660], 60.00th=[   668],
     | 70.00th=[   676], 80.00th=[   692], 90.00th=[   732], 95.00th=[   924],
     | 99.00th=[ 61696], 99.50th=[ 79360], 99.90th=[261120], 99.95th=[403456],
     | 99.99th=[610304]
  lat (nsec)   : 750=90.90%, 1000=4.84%
  lat (usec)   : 2=2.42%, 10=0.12%, 50=0.04%, 100=1.29%, 250=0.27%
  lat (usec)   : 500=0.08%, 750=0.04%
  cpu          : usr=71.43%, sys=0.00%, ctx=38, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=2560,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
pat01-04: (groupid=0, jobs=1): err=84 (file:io_u.c:2252, func=io_u_sync_complete, error=Invalid or incomplete multibyte or wide character): pid=21042: Fri Aug 22 14:02:05 2025
  read: IOPS=11.0k, BW=43.0MiB/s (45.1MB/s)(132KiB/3msec)
    clat (nsec): min=581, max=338570, avg=32140.67, stdev=89078.47
     lat (nsec): min=611, max=338600, avg=32186.91, stdev=89089.98
    clat percentiles (nsec):
     |  1.00th=[   580],  5.00th=[   612], 10.00th=[   652], 20.00th=[   660],
     | 30.00th=[   668], 40.00th=[   668], 50.00th=[   676], 60.00th=[   684],
     | 70.00th=[   740], 80.00th=[  1208], 90.00th=[130560], 95.00th=[292864],
     | 99.00th=[337920], 99.50th=[337920], 99.90th=[337920], 99.95th=[337920],
     | 99.99th=[337920]
  lat (nsec)   : 750=72.73%, 1000=3.03%
  lat (usec)   : 2=9.09%, 20=3.03%, 250=3.03%, 500=9.09%
  cpu          : usr=50.00%, sys=0.00%, ctx=4, majf=0, minf=1
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=2.9%, 4=97.1%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=33,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=1266MiB/s (1328MB/s), 43.0MiB/s-1250MiB/s (45.1MB/s-1311MB/s), io=10.1MiB (10.6MB), run=3-8msec

Disk stats (read/write):
  nvme0n1: ios=0/0, sectors=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

aepotapov avatar Aug 22 '25 11:08 aepotapov

Without looking further it could be that some locking is missing based on what valgrind says:

==4856== Possible data race during write of size 8 at 0xA185600 by thread #16
==4856== Locks held: none
==4856==    at 0x4897870: __GI_memcpy (in /usr/libexec/valgrind/vgpreload_helgrind-arm64-linux.so)
==4856==    by 0x14A11F: paste_blockoff (in /usr/bin/fio)
==4856==    by 0x12E64B: paste_format_inplace (in /usr/bin/fio)
==4856==    by 0x150FB7: ??? (in /usr/bin/fio)
==4856==    by 0x151A5B: verify_io_u (in /usr/bin/fio)
==4856==    by 0x15FEF3: ??? (in /usr/bin/fio)
==4856==    by 0x1601E3: io_u_sync_complete (in /usr/bin/fio)
==4856==    by 0x175573: io_queue_event (in /usr/bin/fio)
==4856==    by 0x17D8A7: ??? (in /usr/bin/fio)
==4856==    by 0x4892B93: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-arm64-linux.so)
==4856==    by 0x56F595B: start_thread (pthread_create.c:447)
==4856==    by 0x575B7DB: thread_start (clone.S:79)
==4856== 
==4856== This conflicts with a previous write of size 8 by thread #15
==4856== Locks held: none
==4856==    at 0x4897870: __GI_memcpy (in /usr/libexec/valgrind/vgpreload_helgrind-arm64-linux.so)
==4856==    by 0x14A11F: paste_blockoff (in /usr/bin/fio)
==4856==    by 0x12E64B: paste_format_inplace (in /usr/bin/fio)
==4856==    by 0x150FB7: ??? (in /usr/bin/fio)
==4856==    by 0x151A5B: verify_io_u (in /usr/bin/fio)
==4856==    by 0x15FEF3: ??? (in /usr/bin/fio)
==4856==    by 0x1601E3: io_u_sync_complete (in /usr/bin/fio)
==4856==    by 0x175573: io_queue_event (in /usr/bin/fio)
==4856==  Address 0xa185600 is 0 bytes inside a block of size 8 alloc'd
==4856==    at 0x48873E0: malloc (in /usr/libexec/valgrind/vgpreload_helgrind-arm64-linux.so)
==4856==    by 0x1305BB: parse_and_fill_pattern_alloc (in /usr/bin/fio)
==4856==    by 0x15F3CF: ??? (in /usr/bin/fio)
==4856==    by 0x15A687: ??? (in /usr/bin/fio)
==4856==    by 0x15B3AB: ??? (in /usr/bin/fio)
==4856==    by 0x15B45B: parse_cmd_option (in /usr/bin/fio)
==4856==    by 0x15B51B: fio_cmd_option_parse (in /usr/bin/fio)
==4856==    by 0x13CF2F: parse_cmd_line (in /usr/bin/fio)
==4856==    by 0x13D35B: parse_options (in /usr/bin/fio)
==4856==    by 0x126D9F: main (in /usr/bin/fio)
==4856==  Block was alloc'd by thread #1

sitsofe avatar Aug 22 '25 11:08 sitsofe

I believe that the problem is that the buffer used to store the expansion of the %o pattern is shared between threads. So there can be verify failures if one thread tries to compare what is on the device using the other thread's expansion of %o.

This seems to resolve the issue for me with a few runs of @sitsofe 's test case.

diff --git a/verify.c b/verify.c
index c7f43c06..20c49a94 100644
--- a/verify.c
+++ b/verify.c
@@ -431,7 +431,7 @@ done:
  */
 static inline bool pattern_need_buffer(struct thread_data *td)
 {
-       return td->o.verify_async &&
+       return (td->o.verify_async || td->o.use_thread) &&
                td->o.verify_fmt_sz &&
                td->o.verify_fmt[0].desc->paste == paste_blockoff;
 }

We've also seen some verify failures in our Windows automated testing which may be due to the same issue since Windows requires threads.

I fixed this issue for verify_async but did not consider the thread=1 case back then:

https://github.com/axboe/fio/commit/7431e1549e49166663ce6625eb9a31ce1833b491

vincentkfu avatar Aug 22 '25 12:08 vincentkfu

Instead of doing a malloc/free for every verify we should probably fix this to make a new buffer at init time.

vincentkfu avatar Aug 22 '25 12:08 vincentkfu

@vincentkfu so are you thinking reserving a fixed thread count + verify_async count array of buffers and somehow you index into one of the slots?

sitsofe avatar Aug 22 '25 13:08 sitsofe

Yes, I think that's the right approach.

vincentkfu avatar Aug 22 '25 16:08 vincentkfu