Infinite 'msec_to_next_event' when executing fio on 4TB device on MacOS 12.7.4
As the title says, fio does not finish executing even after 3 days. It seems fio gets stuck after truncating the file on the device?
Environment
Device: M1 MacPro
OS: macOS Monterey 12.7.4
FIO Version
fio 3.39
FIO Command
fio script/seq-read_1024k_qd32_4000g.fiocfg
--directory=/Volumes/DiskTest
--output-format=json
--output=output/xxx.json
--eta=always
--write_bw_log=output/xxx
--debug=all
FIO Setting
[seq-read_1024k_qd32_4000g]
ioengine=${IOENGINE}
iodepth=32
size=3600g
direct=1
runtime=7200s
ramp_time=5s
thread=1
nrfiles=1
log_avg_msec=1000
bs=1024k
rw=read
Log seq-read_1024k_qd32_4000g.20250915.170221.json
ps aux | grep fio
Hello @yh-yong,
1. What is /Volumes/DiskTest? Is it some sort of network share such that the preallocation fails and fio falls back to truncation?
2. Does this happen on the Mac's internal disk?
3. Does this problem happen every time the file has to be made?
4. Does the problem happen with slightly smaller file sizes (e.g. 3500g, 3400g, 3000g etc)?
5. Which ioengine is being used? Does the problem also happen with other ioengines (e.g. psync)?
6. Can you minimise the job file and command line options (it's important to know them all) such that you have the smallest amount that still reproduces the issue? Don't stop at the first option that is required, put it back and then try to remove the next option and so on.
7. Does the problem happen on the latest stable version of macOS (15.6.1 at the time of writing)?
This ticket looks to be identical to https://github.com/axboe/fio/issues/1905 so in addition to everything asked above, every question in that other issue will also need answering here.
Hi @sitsofe ,
1. /Volumes/DiskTest is a removable USB4 drive.
2. It seems not.
3. Yes, this issue happens every time we execute.
4. We can try lowering the file size and will provide feedback later.
5. The posixaio ioengine is being used in this job, and we have never tried other ioengines.
6. Sometimes the job executes successfully (with debug logging enabled) and sometimes it does not. We can try removing some options to see whether the job is able to execute successfully.
7. We will try the latest macOS and provide feedback later.
Hi @yh-yong:
- And I'm guessing formatted with exFAT or was it some other filesystem?
- Can you try with psync ioengine and report if the problem is still reproducible.
- As stated, we are looking for the Minimal Reproducible Example (see https://stackoverflow.com/help/mcve for a code focussed description). Knowing all the options that can be removed can help to reach a conclusion.
Don't forget we're also looking for answers to the questions mentioned in https://github.com/axboe/fio/issues/1905#issuecomment-2915190442 too.
Hi @sitsofe,
- Yes, the device is formatted as exFAT.
- I tried psync and posixaio without the command-line options and both executed successfully. The test with the command-line options is ongoing; I will update the execution status later.
Hi @sitsofe,
Below are my observations based on the tests done.
- The 'stuck' issue seems to happen when the native fallocate is unsuccessful on macOS 12.7.4.
- Using the same job cfg file (posixaio or psync as ioengine, with/without command line), I'm getting different fallocate results on different macOS versions: native fallocate fails on macOS 12.7.4 but succeeds on macOS 15.5. For more information, kindly refer to the attached debug logs.
macOS_15.5_seq-read_1024k_qd32_4000g.20250919.135805.json
macOS_12.7.4_seq-read_1024k_qd32_4000gfiocfg.20250919.135052.json
@yh-yong: Since both the posixaio and psync ioengines are file based, all fallocate preallocation happens before the engines are started so I would expect that part to behave the same way on both. The fact the later macOS can do a successful fallocate suggests that some behaviour was fixed in macOS along the way. In terms of further things to check and extra questions to answer:
8. Does macOS 15.5 go on to successfully start reading the file after some period of waiting or does it too hang for some long period of time?
9. You can try and test the theory that it's linked to fallocate failing by deleting any existing files and then setting fallocate=none and seeing if the outcome changes. Perhaps what you're seeing has something to do with sparse files?
10. You could also try finding a filesystem that macOS can't preallocate on (perhaps an SMB network share?) and seeing if the problem reproduces there?
11. It may also be interesting to know if you get the same behaviour if you pre-create a sparse file by running truncate -s 3600G /Volumes/DiskTest/fio.tmp and then using --filename=/Volumes/DiskTest/fio.tmp instead of --directory in your fio run.
12. Roughly how long does it take to fully run the following job on both macOS versions and what bandwidth does it settle down to?
fio --name=fill --rw=write --bs=1m --filename=/Volumes/DiskTest/fio.tmp --size=3600g
(Please don't forget we're waiting for answers to all the other questions previously asked and those in the other ticket too)
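The sparse-file idea raised in the questions above can be illustrated with a small Python sketch (Python is not used elsewhere in this thread; it is chosen here only because it behaves the same on Linux and macOS). It creates a file by truncation, much as `truncate -s` would, and compares the apparent size with the space the filesystem reports as allocated. The helper names and the 64 MiB size are illustrative assumptions, and whether the allocated figure stays small depends entirely on the filesystem.

```python
import os
import tempfile


def make_truncated_file(path, size_bytes):
    """Create a file of size_bytes without writing data (existing content is discarded)."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)


def allocation_report(path):
    """Return (apparent_bytes, allocated_bytes) for path."""
    st = os.stat(path)
    # On Linux and macOS, st_blocks is counted in 512-byte units.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "fio.tmp")
        make_truncated_file(target, 64 * 1024 * 1024)
        apparent, allocated = allocation_report(target)
        print(f"apparent={apparent} bytes, allocated={allocated} bytes")
```

If the allocated figure is much smaller than the apparent one, the file is being stored sparsely; if the two are roughly equal, the filesystem reserved the space up front. Pointing `target` at the volume under test instead of a temporary directory would show which case applies there.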
@sitsofe
8. Yes, macOS 15.5 starts reading the file after some period of waiting.
9. Still getting the 'stuck' result even when setting fallocate=none with the same fiocfg file and command line.
10. Will provide the test result later.
11. The job starts reading instantly, without any waiting period.
12. Will provide the test result later.
Response for #1905:
- Does the problem happen with a smaller bs? What happens with 4k? 64k? 128k?
  Getting the same 'stuck' result.
- Does the problem happen doing I/O to your main internal macOS volume?
  The internal macOS volume only has 240GB, so we are unable to replicate this issue there.
@yh-yong OK I think we can safely say the problem is unrelated to fallocate/preallocation. I think the answer to 11. is pointing to what the "problem" is but let's see what the answer is to 12. Can you run 12. with and without --direct=1 added to it?
- Can you run the following job and report the results:
rm -f /Volumes/DiskTest/fio.tmp
fio --name=readtest --filename=/Volumes/DiskTest/fio.tmp --size=3600g --direct=1 --rw=read
@sitsofe
- The job took around 30 min to finish. Please refer to the test results below.
macOS 15.5 (without direct = 1 )
macOS 12.7.4 (without direct = 1 )
macOS 15.5 (with direct = 1)
macOS 12.7.4 (with direct = 1)
- job still getting 'stuck' result on macOS 12.7.4 macOS 15.5
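As a rough sanity check on the "around 30 min" figure, the implied average bandwidth can be worked out as below. This is only arithmetic under stated assumptions: that the whole 3600g was transferred in that time, and that fio's default binary units apply (so 3600g is read as 3600 GiB). The function name is illustrative.

```python
def average_bandwidth_mib_s(size_gib, elapsed_s):
    """Average throughput in MiB/s for size_gib GiB transferred in elapsed_s seconds."""
    return size_gib * 1024 / elapsed_s


if __name__ == "__main__":
    # 3600 GiB in about 30 minutes (1800 s), per the figures quoted in this thread.
    print(average_bandwidth_mib_s(3600, 30 * 60))  # 2048.0
```

Under those assumptions the average works out to about 2 GiB/s. The actual per-run numbers are in the attached results, which this estimate does not replace.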
@yh-yong:
Hmm OK that's unexpected - it looks like the disk is fast enough and what's stranger is the problem didn't happen with the cut down line I asked you to try. The only thing left that I can think to progress this is my original request:
Can you minimise the job file and command line options (it's important to know them all) such that you have the smallest amount that still reproduce the issue. Don't stop at the first option that is required, put it back and then try to remove the next option and so on.
Without that work I'm afraid this isn't going to go any further. At best at the moment all I can say if the problem doesn't happen on later macOS is "perhaps it's a bug that Apple fixed in a macOS after 12.7".
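The one-option-at-a-time minimisation requested above could be automated roughly as follows. This is a minimal sketch and not something provided by fio: `minimise`, `hangs`, and the timeout-based notion of "reproduces" are assumptions made for illustration, and a real check would need a timeout comfortably longer than any legitimate run of the job.

```python
import subprocess


def hangs(cmd, timeout_s):
    """Return True if cmd is still running after timeout_s seconds (hypothetical 'stuck' test)."""
    try:
        subprocess.run(cmd, timeout=timeout_s,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired.
        return True
    return False


def minimise(options, reproduces):
    """Drop each option in turn; leave it out only if the issue still reproduces.

    Assumes the options are distinct. Passes repeat until no option can be removed.
    """
    kept = list(options)
    changed = True
    while changed:
        changed = False
        for opt in list(kept):
            trial = [o for o in kept if o != opt]
            if reproduces(trial):
                kept = trial  # the option was not needed
                changed = True
            # Otherwise the option is required: put it back and try the next one.
    return kept


if __name__ == "__main__":
    # Stand-in predicate with placeholder option names, purely for demonstration:
    # it pretends the issue needs both "B" and "D". This is not a finding from the thread.
    print(minimise(["A", "B", "C", "D"], lambda trial: "B" in trial and "D" in trial))
```

In real use the predicate might wrap `hangs` around an fio command line built from the job file plus the trial options. The result is only as trustworthy as the predicate, so an intermittent hang would need several repeats per trial.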