Fio "stuck" when testing on 4TB usb4 on m1 pro

Open killer02354 opened this issue 7 months ago • 5 comments

As the title says, it has been 1 week and fio has still not finished executing the same script.

fio version: 3.39

fio settings:

ioengine=posixaio
iodepth=32
size=3600g
direct=1
runtime=7200s
ramp_time=5s
thread=1
nrfiles=1
log_avg_msec=1000
bs=1024k
rw=read

cmd line

fio /Users/test/scripts/seq-read_1024k_qd32_4000g.fiocfg --directory=/Volumes/DiskTest --output-format=json --output=output/seq-read_1024k_qd32_4000g.json --eta=always --write_bw_log=output/bw_log --debug=all

seq-read_1024k_qd32_4000g.20250520.150708.json

killer02354 avatar May 28 '25 06:05 killer02354

Hello @killer02354,

I'm afraid there is not enough in this bug report to be able to diagnose the issue. Some things that would be useful to know:

  • What version of macOS are you using?
  • Where did you get your version of fio from?
  • Do you know if the job even starts doing reads or is the problem happening at initial layout?
  • How long does it take before the issue occurs? 1 minute? 1 hour? More? You may be able to calculate this by looking in your bandwidth log
  • Were you still able to use other programs to read from files in /Volumes/DiskTest after the issue occurred?
  • What filesystem is /Volumes/DiskTest using?
  • Does the problem happen when you use a smaller size? For example start at 1G, then test 100G, then test 1T
  • Can you minimise the job file and command line options such that you have the smallest set that still reproduces the issue (don't stop at the first option that is required, put it back and then try to remove the next option). For example, does the problem happen without all of --output-format=json --output=output/seq-read_1024k_qd32_4000g.json --eta=always --write_bw_log=output/bw_log --debug=all? Does it happen without ramp_time etc? Does it happen without log_avg_msec? Please remove as many options as possible.
  • Does the problem happen with a smaller bs? What happens with 4k? 64k? 128k?
  • Does the problem happen with the psync ioengine?
  • Does the problem happen doing I/O to your main internal macOS volume?
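For the "how long before the issue occurs" question, the bandwidth log can be summarised with a few lines of Python. This is only a sketch: it assumes the log uses fio's documented comma-separated layout with the time in milliseconds as the first field and the bandwidth value as the second, and the file name in the usage note (output/bw_log_bw.1.log) is a guess at what --write_bw_log=output/bw_log produces. The sample lines are made up for illustration and are not taken from the reporter's run.

```python
def summarise_bw_log(log_text):
    """Return (last_timestamp_msec, last_nonzero_timestamp_msec) from bw log text.

    Each usable line is expected to look like "time_msec, value, ...".
    Lines that do not start with two integer fields are skipped.
    """
    last_ts = None
    last_nonzero_ts = None
    for line in log_text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 2 or not fields[0].isdigit() or not fields[1].isdigit():
            continue
        ts, value = int(fields[0]), int(fields[1])
        last_ts = ts
        if value > 0:
            last_nonzero_ts = ts
    return last_ts, last_nonzero_ts


def summarise_bw_log_file(path):
    """Convenience wrapper that reads the log from a file path."""
    with open(path, encoding="utf-8") as handle:
        return summarise_bw_log(handle.read())


# Made-up sample: three one-second samples, the last one with zero bandwidth.
SAMPLE = "1000, 1048576, 0, 1048576, 0\n2000, 1040000, 0, 1048576, 0\n3000, 0, 0, 1048576, 0\n"
print(summarise_bw_log(SAMPLE))
```

On a real run, something like summarise_bw_log_file("output/bw_log_bw.1.log") would give the time of the last logged sample and the last sample that still showed I/O. An empty or missing log would suggest the job never got past file layout.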

Looking through your log shows this:

[...]
helperthread 4611  clk_tck = 100 
mutex    259   done waiting on startup_sem
file     259   setup files 
process  259   pid=0: runstate NOT_CREATED -> SETTING_UP
file     259   get file size for 0x1052ba130/0//Volumes/DiskTest/seq-read_1024k_qd32_4000g.0.0
file     259   layout unlink /Volumes/DiskTest/seq-read_1024k_qd32_4000g.0.0
file     259   open file /Volumes/DiskTest/seq-read_1024k_qd32_4000g.0.0, flags 601
file     259   native fallocate of file /Volumes/DiskTest/seq-read_1024k_qd32_4000g.0.0 size 3865470566400 was unsuccessful
file     259   truncate file /Volumes/DiskTest/seq-read_1024k_qd32_4000g.0.0, size 3865470566400
helperthread 4611  next_log: 500, msec_to_next_event: 244
helperthread 4611  next_log: 500, msec_to_next_event: 245
[...]

It's a bit strange that preallocation failed. It then looks like fio fell back to truncation but then we never see any other actions outside that of the helperthread (I would expect to see io, process, mutex and file operations).
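Two numbers in that log excerpt can be cross-checked against the job file. The sketch below assumes fio's default of 1024-based units, so size=3600g means 3600 GiB, and it assumes the "flags 601" value is printed in hexadecimal and uses the macOS open(2) flag values (hard-coded here as an assumption, since they differ on other platforms).

```python
GIB = 1024 ** 3  # assumption: default 1024-based units, so "g" means GiB

# size=3600g from the job file, expressed in bytes
size_bytes = 3600 * GIB
print(size_bytes)  # should match the size in the fallocate/truncate log lines

# Assumed macOS values for the open(2) flags involved in file layout.
MACOS_OPEN_FLAGS = {"O_WRONLY": 0x0001, "O_CREAT": 0x0200, "O_TRUNC": 0x0400}


def decode_flags(hex_text):
    """Return the names of the assumed macOS flags set in a hex flags value."""
    value = int(hex_text, 16)
    return [name for name, bit in MACOS_OPEN_FLAGS.items() if value & bit]


print(decode_flags("601"))
```

If those assumptions hold, the size matches the 3865470566400 bytes in the log, and the flags decode to a write-only open with create and truncate, which is consistent with fio still laying out a fresh 3600 GiB file rather than issuing reads.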

sitsofe avatar May 28 '25 06:05 sitsofe

Hi @sitsofe, please see my replies below.

  • What version of macOS are you using?
    macOS Monterey 12.7.4

  • Where did you get your version of fio from?
    Homebrew

  • Do you know if the job even starts doing reads or is the problem happening at initial layout?
    I think it starts reading, because I can see that a file has been created on the test drive.

  • How long does it take before the issue occurs? 1 minute? 1 hour? More? You may be able to calculate this by looking in your bandwidth log
    I tried the same command line and script on an M2 and an M3 Pro, and it executed successfully.

  • Were you still able to use other programs to read from files in /Volumes/DiskTest after the issue occurred?
    I can try later.

  • What filesystem is /Volumes/DiskTest using?
    exFAT

  • Does the problem happen when you use a smaller size? For example start at 1G, then test 100G, then test 1T
    No, it only happens when testing 4T.

  • Can you minimise the job file and command line options such that you have the smallest set that still reproduces the issue (don't stop at the first option that is required, put it back and then try to remove the next option). For example, does the problem happen without all of --output-format=json --output=output/seq-read_1024k_qd32_4000g.json --eta=always --write_bw_log=output/bw_log --debug=all? Does it happen without ramp_time etc? Does it happen without log_avg_msec? Please remove as many options as possible.
    It runs successfully when I remove --output-format=json --output=output/seq-read_1024k_qd32_4000g.json

  • Does the problem happen with a smaller bs? What happens with 4k? 64k? 128k?

  • Does the problem happen with the psync ioengine?

  • Does the problem happen doing I/O to your main internal macOS volume?
    I will provide test results later.

killer02354 avatar May 29 '25 08:05 killer02354

It runs successfully when I remove --output-format=json --output=output/seq-read_1024k_qd32_4000g.json

Please remove as many parameters as possible: don't stop at the first option that is required, put it back and then try to remove the next option and so on.

However given that this problem doesn't happen on an M2 or M3 pro my best guess is that you are hitting a bug in macOS or a quirk of your hardware and that this is not a bug in fio...
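The remove-one-option-at-a-time procedure described above can be sketched as a small greedy loop. This is an illustration only: the reproduces callback is hypothetical and stands in for "run fio with these options and see whether it hangs", and the stand-in predicate in the example is made up, not a finding from this issue.

```python
def minimise(options, reproduces):
    """Greedily drop options one at a time, keeping only those that are required.

    `reproduces(trial)` is a hypothetical callback that returns True when the
    problem still occurs with the given list of options.
    """
    kept = list(options)
    for opt in list(options):
        trial = [o for o in kept if o != opt]
        if reproduces(trial):
            # Still reproduces without this option, so leave it out for good.
            kept = trial
        # Otherwise the option is required: "put it back" by leaving kept
        # unchanged, then carry on with the next option instead of stopping.
    return kept


# Command line options taken from the report.
OPTIONS = [
    "--output-format=json",
    "--output=output/seq-read_1024k_qd32_4000g.json",
    "--eta=always",
    "--write_bw_log=output/bw_log",
    "--debug=all",
]


def fake_reproduces(opts):
    """Made-up stand-in: pretend the hang needs only --output-format=json."""
    return "--output-format=json" in opts


print(minimise(OPTIONS, fake_reproduces))
```

The same loop applies to job file options such as ramp_time and log_avg_msec. In practice each call to the predicate is a manual fio run with a timeout, so the value of the loop is mainly in not stopping at the first option that turns out to matter.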

sitsofe avatar May 29 '25 08:05 sitsofe

Hi sitsofe, thanks for the info. I will seek assistance from Mac support.

Also, may I ask what helperthread means in the log files?

killer02354 avatar Jun 04 '25 02:06 killer02354

@killer02354 : I think the helper_thread does various periodic tasks (such as collecting disk stats, displaying stats, or checking if the steady state has been reached) to avoid blocking the main thread on those tasks.
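As a rough illustration of that pattern (not fio's actual implementation, which is written in C), a helper thread that wakes up at a fixed interval to run a periodic task while the main thread does the real work could look like the sketch below. The names start_helper, task and stop_event are made up for this example.

```python
import threading


def start_helper(interval_sec, task, stop_event):
    """Run `task` every `interval_sec` seconds on a background thread.

    The loop ends once `stop_event` is set by the main thread.
    """
    def loop():
        # Event.wait returns False on timeout and True once the event is set,
        # so this sleeps for one interval, runs the task, and repeats.
        while not stop_event.wait(interval_sec):
            task()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

This would also fit with the log excerpt earlier in the thread, where only helperthread lines keep appearing: a periodic thread like this keeps ticking even when the main thread is blocked somewhere else.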

sitsofe avatar Jun 12 '25 15:06 sitsofe

Closing due to lack of reply from reporter. If this issue is still happening with the latest fio (see https://github.com/axboe/fio/releases to find out which version that is) please reopen. Thanks!

sitsofe avatar Jul 21 '25 21:07 sitsofe