Fio "stuck" when testing on 4TB usb4 on m1 pro
As topic, it been 1 week and fio not finish executing same scripts.
fio version: 3.39

fio settings:
```
ioengine=posixaio
iodepth=32
size=3600g
direct=1
runtime=7200s
ramp_time=5s
thread=1
nrfiles=1
log_avg_msec=1000
bs=1024k
rw=read
```
Command line:
```
fio /Users/test/scripts/seq-read_1024k_qd32_4000g.fiocfg --directory=/Volumes/DiskTest --output-format=json --output=output/seq-read_1024k_qd32_4000g.json --eta=always --write_bw_log=output/bw_log --debug=all
```
Hello @killer02354,
I'm afraid there isn't enough information in this bug report to diagnose the issue. Some things that would be useful to know:
- What version of macOS are you using?
- Where did you get your version of fio from?
- Do you know if the job even starts doing reads, or is the problem happening at initial layout?
- How long does it take before the issue occurs? 1 minute? 1 hour? More? You may be able to work this out by looking in your bandwidth log.
- Were you able to still use other programs to read files in `/Volumes/DiskTest` after the issue occurred?
- What filesystem is `/Volumes/DiskTest` using?
- Does the problem happen when you use a smaller `size`? For example, start at 1G, then test 100G, then test 1T.
- Can you minimise the job file and command line options such that you have the smallest amount that still reproduces the issue? Don't stop at the first option that is required: put it back and then try to remove the next option. For example, does the problem happen without all of `--output-format=json --output=output/seq-read_1024k_qd32_4000g.json --eta=always --write_bw_log=output/bw_log --debug=all`? Does it happen without `ramp_time` etc.? Does it happen without `log_avg_msec`? Please remove as many options as possible (see the sketch after this list).
- Does the problem happen with a smaller `bs`? What happens with 4k? 64k? 128k?
- Does the problem happen with the `psync` ioengine?
- Does the problem happen doing I/O to your main internal macOS volume?
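To make the minimisation step concrete, here is one possible sequence (just a sketch reusing the paths from the report; none of these runs is known in advance to reproduce or avoid the hang):

```
# 1. Strip everything optional from the command line first.
fio /Users/test/scripts/seq-read_1024k_qd32_4000g.fiocfg --directory=/Volumes/DiskTest

# 2. If that runs cleanly, add the removed options back one at a time
#    until the hang reappears, e.g.:
fio /Users/test/scripts/seq-read_1024k_qd32_4000g.fiocfg --directory=/Volumes/DiskTest \
    --output-format=json --output=output/seq-read_1024k_qd32_4000g.json
fio /Users/test/scripts/seq-read_1024k_qd32_4000g.fiocfg --directory=/Volumes/DiskTest \
    --write_bw_log=output/bw_log

# 3. Repeat the same process inside the job file itself
#    (e.g. drop ramp_time, then log_avg_msec).
```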
Looking through your log shows this:
```
[...]
helperthread 4611 clk_tck = 100
mutex 259 done waiting on startup_sem
file 259 setup files
process 259 pid=0: runstate NOT_CREATED -> SETTING_UP
file 259 get file size for 0x1052ba130/0//Volumes/DiskTest/seq-read_1024k_qd32_4000g.0.0
file 259 layout unlink /Volumes/DiskTest/seq-read_1024k_qd32_4000g.0.0
file 259 open file /Volumes/DiskTest/seq-read_1024k_qd32_4000g.0.0, flags 601
file 259 native fallocate of file /Volumes/DiskTest/seq-read_1024k_qd32_4000g.0.0 size 3865470566400 was unsuccessful
file 259 truncate file /Volumes/DiskTest/seq-read_1024k_qd32_4000g.0.0, size 3865470566400
helperthread 4611 next_log: 500, msec_to_next_event: 244
helperthread 4611 next_log: 500, msec_to_next_event: 245
[...]
```
It's a bit strange that preallocation failed. It then looks like fio fell back to truncation, but after that we never see any activity other than the helperthread's (I would expect to see io, process, mutex, and file operations).
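One quick way to narrow this down (a suggestion on my part, not something the log proves) would be to check whether the native preallocation failure also happens for a small file on the same volume:

```
# Hypothetical small-size check (the job name prealloc-test is made up):
# does "native fallocate ... was unsuccessful" also appear in the file
# debug output for a 1G file?
fio --name=prealloc-test --directory=/Volumes/DiskTest \
    --ioengine=posixaio --rw=read --bs=1024k --size=1g --debug=file
```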
Hi @sitsofe, kindly refer to my replies below.

> What version of macOS are you using?

macOS Monterey 12.7.4

> Where did you get your version of fio from?

Homebrew

> Do you know if the job even starts doing reads, or is the problem happening at initial layout?

I think it starts reading, because I can see a file created on the test drive.

> How long does it take before the issue occurs? 1 minute? 1 hour? More? You may be able to work this out by looking in your bandwidth log.

I tried the same command line and script on an M2 and an M3 Pro and they executed successfully.

> Were you able to still use other programs to read files in `/Volumes/DiskTest` after the issue occurred?

I can try later.

> What filesystem is `/Volumes/DiskTest` using?

ExFAT

> Does the problem happen when you use a smaller `size`? For example, start at 1G, then test 100G, then test 1T.

No, it only happens when testing 4T.

> Can you minimise the job file and command line options such that you have the smallest amount that still reproduces the issue? [...] Please remove as many options as possible.

It runs successfully when I remove `--output-format=json --output=output/seq-read_1024k_qd32_4000g.json`.

> Does the problem happen with a smaller `bs`? What happens with 4k? 64k? 128k?

> Does the problem happen with the `psync` ioengine?

> Does the problem happen doing I/O to your main internal macOS volume?

I will provide the test results later.
> It runs successfully when I remove `--output-format=json --output=output/seq-read_1024k_qd32_4000g.json`.

Please remove as many parameters as possible: don't stop at the first option that is required, put it back and then try to remove the next option, and so on.
However, given that this problem doesn't happen on an M2 or M3 Pro, my best guess is that you are hitting a bug in macOS or a quirk of your hardware and that this is not a bug in fio...
Hi @sitsofe, thanks for the info. I will seek assistance from Apple support.
Also, may I know what "helperthread" means in the log files?
@killer02354: I think the helper_thread does various periodic tasks (such as collecting disk stats, displaying stats, or checking if the steady state has been reached) to avoid blocking the main thread on those tasks.
Closing due to lack of reply from reporter. If this issue is still happening with the latest fio (see https://github.com/axboe/fio/releases to find out which version that is) please reopen. Thanks!