fio
Continuous increasing memory consumption for FIO when using a verify job
We are facing an issue where, while the job is running, fio's memory consumption keeps increasing until at some point we hit OOM on CentOS 7.x. We were writing I/O to a 20T LUN with compression enabled. We have observed the same issue in VMs with any memory configuration.
# fio --version
fio-3.13-22-gd9c50
JOB:
[global]
ioengine=libaio
exitall_on_error=1
invalidate=1
direct=1
allow_file_create=0
refill_buffers=1
bs=8k
rw=randrw
rwmixread=50
rwmixwrite=50
group_reporting
verify=crc32c
do_verify=1
verify_fatal=1
verify_dump=1
iodepth_batch_submit=2
iodepth_low=16
iodepth=32
[mpatha-20T]
filename=/dev/mapper/mpatha
size=20T
buffer_pattern=0x44e5bbac
buffer_compress_percentage=70
buffer_compress_chunk=3k
PMAP:
# while true; do pmap -x 11139 | tail -1; sleep 5; done
total kB 1270484 596572 596168
total kB 1273652 599684 599280
total kB 1276952 602944 602540
total kB 1280912 606904 606500
total kB 1284080 610136 609732
total kB 1287248 613316 612912
total kB 1290680 616672 616268
total kB 1294244 620228 619824
total kB 1297544 623624 623220
total kB 1300976 627048 626644
total kB 1304276 630352 629948
total kB 1307708 633684 633280
total kB 1311140 637140 636736
total kB 1314704 640688 640284
total kB 1317872 643976 643572
total kB 1321436 647472 647068
total kB 1324868 650876 650472
total kB 1328168 654212 653808
total kB 1331600 657576 657172
total kB 1334900 660940 660536
total kB 1338596 664576 664172
total kB 1342028 668084 667680
total kB 1345460 671532 671128
total kB 1348760 674788 674384
total kB 1352588 678596 678192
total kB 1356152 682152 681748
TOP:
top - 05:38:30 up 22 min, 4 users, load average: 0.96, 0.83, 0.63
Tasks: 282 total, 3 running, 279 sleeping, 0 stopped, 0 zombie
%Cpu(s): 10.0 us, 18.4 sy, 0.0 ni, 57.7 id, 0.0 wa, 0.0 hi, 13.8 si, 0.0 st
KiB Mem : 1863224 total, 66852 free, 1230260 used, 566112 buff/cache
KiB Swap: 1679356 total, 1345020 free, 334336 used. 31024 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11139 root 20 0 1466764 792628 468 R 19.9 42.5 2:12.17 fio
11330 root 20 0 0 0 0 S 6.6 0.0 0:09.77 kworker/u256:2
@chamarthy Great report! Can you try the experimental_verify option?
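For reference, a minimal sketch of how that option would slot into the reported job file (the other settings shown are taken from the original job; experimental_verify is a boolean fio option, so =1 enables it):

```ini
[global]
ioengine=libaio
direct=1
bs=8k
rw=randrw
verify=crc32c
verify_fatal=1
; Regenerate the expected data at verification time instead of
; queuing every completed I/O on an in-memory verify list.
experimental_verify=1
```

With this set, fio recomputes what each block should contain when it verifies, so memory usage should stay flat regardless of how much data is written.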
Will verify and let you know.
@chamarthy any news?
The first run, without experimental_verify, grew to around 16GB of memory usage on a 3TB file (I canceled it with 9 minutes estimated left).
The run with experimental_verify is still running (1h 24m remaining), but so far I'm not seeing any growth in memory usage, with fio using just around 51MB.
The run with experimental_verify finished, and I didn't see the memory usage go above 51MB.
OK, the results reported by @bcran were kind of what we would expect. The non-experimental verify actually extends a data structure with I/Os to verify, whereas experimental verify just regenerates what is required at verification time. Off the top of my head I can only see experimental verify going wrong when one I/O collides with another I/O for the same region while both are in-flight, and I'd guess that can only happen if you aren't using a random map, or at wraparound time.
I just encountered the same issue; it can actually be avoided by using the "verify_backlog" option.
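For anyone hitting this: verify_backlog=N makes fio stop after every N writes, verify those blocks, and free their entries, so the in-memory verify list stays bounded instead of growing for the whole run. A sketch against the original job (the value 1024 is illustrative, not from this report):

```ini
[mpatha-20T]
filename=/dev/mapper/mpatha
size=20T
verify=crc32c
; Verify after every 1024 written blocks, then release those
; verify-list entries; bounds memory at roughly 1024 entries.
verify_backlog=1024
```

Note this changes the I/O pattern slightly (writes are periodically interleaved with verification reads), which is the trade-off for the bounded memory.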