fio
`--fsync` (and `--fdatasync`) sync the wrong file when multiple files are in use
Please acknowledge the following before creating a ticket
- [x] I have read the GitHub issues section of REPORTING-BUGS.
Description of the bug:
When --fsync or --fdatasync is combined with --nrfiles=2 (or any value larger than 1), fio writes to file X and then issues the sync to file X+1, which is not what is intended.
This results in a wildly different performance profile between --nrfiles=1 and --nrfiles=2.
Environment: Ubuntu 24.04 / Linux 6.8.0
fio version:
ubuntu@ip-172-31-44-44:~/fio$ ./fio --version
fio-3.40
ubuntu@ip-172-31-44-44:~/fio$ git rev-parse HEAD
ff930c4653ae3952d6b09ab3ec89671aeabf2cbe
Reproduction steps
Example reproducer:
./fio --name=write_iops --directory=/mnt/xfs --size=5G --time_based --runtime=1s --ramp_time=0s --ioengine=libaio --direct=1 --verify=0 --bs=8K --iodepth=1 --nrfiles=2 --rw=write --fdatasync=1
To see what's going on we can trace the sync backend:
sudo perf trace -- ./fio --name=write_iops --directory=/mnt/xfs --size=5G --time_based --runtime=1s --ramp_time=0s --ioengine=sync --direct=1 --verify=0 --bs=8K --iodepth=1 --nrfiles=2 --rw=write --fdatasync=1 --rate=10M
and we will see lots of output like this:
720.442 ( 0.023 ms): fio/1451 write(fd: 6</mnt/xfs/write_iops.0.0>, buf: 0xb2e3b946b000, count: 8192) = 8192
720.467 ( 0.002 ms): fio/1451 fdatasync(fd: 7</mnt/xfs/write_iops.0.1>) = 0
720.471 ( 0.748 ms): fio/1451 clock_nanosleep(rqtp: 0xffffd97b9298) = 0
721.224 ( 0.026 ms): fio/1451 write(fd: 6</mnt/xfs/write_iops.0.0>, buf: 0xb2e3b946b000, count: 8192) = 8192
721.252 ( 0.002 ms): fio/1451 fdatasync(fd: 7</mnt/xfs/write_iops.0.1>) = 0
721.255 ( 0.745 ms): fio/1451 clock_nanosleep(rqtp: 0xffffd97b9298) = 0
722.005 ( 0.024 ms): fio/1451 write(fd: 6</mnt/xfs/write_iops.0.0>, buf: 0xb2e3b946b000, count: 8192) = 8192
722.031 ( 0.002 ms): fio/1451 fdatasync(fd: 7</mnt/xfs/write_iops.0.1>) = 0
which clearly shows the problem: the writes go to fd 6 (write_iops.0.0) and the fdatasync calls go to fd 7 (write_iops.0.1).
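A quick way to check a longer trace for this pattern is to pair each sync with the write that preceded it. The sketch below is illustrative only: it assumes the `perf trace` line format shown above, the function name `find_mismatched_syncs` is made up for this example, and it only recognises the `write`, `fsync` and `fdatasync` syscalls.

```python
import re

# Matches e.g. "write(fd: 6</mnt/xfs/write_iops.0.0>, ..." in perf trace output.
# Only the syscalls seen in the trace above are handled; others would need adding.
SYSCALL_RE = re.compile(r"\b(write|fsync|fdatasync)\(fd: (\d+)(?:<([^>]*)>)?")


def find_mismatched_syncs(lines):
    """Return (last_write, sync) pairs where the sync hit a different fd."""
    last_write = None  # (fd, path) of the most recent write
    mismatches = []
    for line in lines:
        m = SYSCALL_RE.search(line)
        if not m:
            continue  # e.g. clock_nanosleep lines
        name, fd, path = m.group(1), int(m.group(2)), m.group(3)
        if name == "write":
            last_write = (fd, path)
        elif last_write is not None and fd != last_write[0]:
            mismatches.append((last_write, (fd, path)))
    return mismatches


# Sample lines copied from the trace above.
sample = [
    "720.442 ( 0.023 ms): fio/1451 write(fd: 6</mnt/xfs/write_iops.0.0>, buf: 0xb2e3b946b000, count: 8192) = 8192",
    "720.467 ( 0.002 ms): fio/1451 fdatasync(fd: 7</mnt/xfs/write_iops.0.1>) = 0",
    "720.471 ( 0.748 ms): fio/1451 clock_nanosleep(rqtp: 0xffffd97b9298) = 0",
]

for write, sync in find_mismatched_syncs(sample):
    print(f"write to fd {write[0]} ({write[1]}) was followed by sync of fd {sync[0]} ({sync[1]})")
```

On a run that behaves as expected, the function should return an empty list.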
The issue can be worked around by specifying --file_service_type=roundrobin:2.
I can reproduce the issue here and you can see it happening using --debug=io:
$ ./fio --debug=io --name=write_iops --directory=/tmp --size=8k --nrfiles=2 --rw=write --fdatasync=1
fio: set debug option io
io 2199941 load ioengine psync
write_iops: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.39
Starting 1 process
io 2199959 declare unneeded cache /tmp/write_iops.0.0: 0/4096
io 2199959 fill: io_u 0x5f514fc9a440: off=0x0,len=0x1000,ddir=1,file=/tmp/write_iops.0.0
io 2199959 prep: io_u 0x5f514fc9a440: off=0x0,len=0x1000,ddir=1,file=/tmp/write_iops.0.0
io 2199959 queue: io_u 0x5f514fc9a440: off=0x0,len=0x1000,ddir=1,file=/tmp/write_iops.0.0
io 2199959 complete: io_u 0x5f514fc9a440: off=0x0,len=0x1000,ddir=1,file=/tmp/write_iops.0.0
io 2199959 declare unneeded cache /tmp/write_iops.0.1: 0/4096
io 2199959 fill: io_u 0x5f514fc9a440: off=0x0,len=0x0,ddir=4,file=/tmp/write_iops.0.1
io 2199959 prep: io_u 0x5f514fc9a440: off=0x0,len=0x0,ddir=4,file=/tmp/write_iops.0.1
io 2199959 queue: io_u 0x5f514fc9a440: off=0x0,len=0x0,ddir=4,file=/tmp/write_iops.0.1
io 2199959 complete: io_u 0x5f514fc9a440: off=0x0,len=0x0,ddir=4,file=/tmp/write_iops.0.1
io 2199959 io_u 0x5f514fc9a440, failed getting offset
io 2199959 fill: io_u 0x5f514fc9a440: off=0x0,len=0x1000,ddir=1,file=/tmp/write_iops.0.1
io 2199959 prep: io_u 0x5f514fc9a440: off=0x0,len=0x1000,ddir=1,file=/tmp/write_iops.0.1
io 2199959 queue: io_u 0x5f514fc9a440: off=0x0,len=0x1000,ddir=1,file=/tmp/write_iops.0.1
io 2199959 complete: io_u 0x5f514fc9a440: off=0x0,len=0x1000,ddir=1,file=/tmp/write_iops.0.1
io 2199959 close ioengine psync
io 2199959 free ioengine psync
[...]
(writes are ddir=1, the fdatasync is ddir=4)
The problem is that fio treats the fdatasync as a new I/O in its own right rather than as something to be done for the previous I/O. You can even see that fio doesn't bother to fdatasync after the final write...
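The behaviour can be reproduced with a toy model. This is a sketch under stated assumptions, not fio's actual file selection code: it assumes a round-robin selector that is consulted for every operation, syncs included, and that stays on each file for `ops_per_file` operations (mirroring the `roundrobin:N` form of `--file_service_type`). The names `simulate` and `count_mismatches` are hypothetical.

```python
def simulate(nrfiles, ops_per_file, num_writes, sync_every=1):
    """Toy model: every operation, write or sync, asks the selector for a file."""
    trace = []
    issued = 0  # operations handed to the round-robin selector so far

    def next_file():
        nonlocal issued
        idx = (issued // ops_per_file) % nrfiles
        issued += 1
        return idx

    for w in range(1, num_writes + 1):
        trace.append(("write", next_file()))
        if w % sync_every == 0:
            # Assumption being illustrated: the sync is treated as a new I/O,
            # so it advances the selector just like a write does.
            trace.append(("fdatasync", next_file()))
    return trace


def count_mismatches(trace):
    """Count syncs that target a different file than the preceding write."""
    bad = 0
    last_write = None
    for op, f in trace:
        if op == "write":
            last_write = f
        elif f != last_write:
            bad += 1
    return bad


print(simulate(2, 1, 3))                     # plain round robin, --nrfiles=2
print(count_mismatches(simulate(2, 1, 3)))   # every sync misses its write
print(count_mismatches(simulate(2, 2, 3)))   # roundrobin:2 style selection
```

In this model, plain round robin with two files sends every write to file 0 and every sync to file 1, which matches the trace earlier in the thread. Staying on each file for two operations keeps each write and its sync together, which would explain why the `roundrobin:2` workaround helps when fsync=1.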
I think this raises the question "What should be fsync'd?". Imagine fsync=2 and due to round robin I've written to two separate files: do I now generate two fsync() calls? Tricky.
> I think this raises the question "What should be fsync'd?". Imagine fsync=2 and due to round robin I've written to two separate files: do I now generate two fsync() calls? Tricky.
As a baseline, with fsync=1 I would expect (and did expect) the fsync to always go to the file that received the preceding write. I think it's reasonable for fsync=N to mean that every Nth write is a "write+fsync" to the same file (consider it one op, really). This gives the expected semantics for fsync=1 and reasonable semantics for other values of N. I don't think it makes sense to fsync every file when N > 1: this would lead to an unexpectedly higher number of fsync operations.
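The proposed semantics can be sketched as follows. This is only an illustration of the suggestion above, not a patch and not how fio is implemented: it assumes writes are spread across files in simple round-robin order, and `proposed_schedule` is a hypothetical name.

```python
def proposed_schedule(nrfiles, num_writes, sync_every):
    """Every Nth write becomes a 'write+fsync' pair on the same file."""
    trace = []
    for w in range(num_writes):
        f = w % nrfiles  # only writes advance the file selection
        trace.append(("write", f))
        if (w + 1) % sync_every == 0:
            trace.append(("fsync", f))  # sync the file that was just written
    return trace


for op, f in proposed_schedule(nrfiles=2, num_writes=4, sync_every=1):
    print(op, f)
```

With this rule the number of fsync calls is always `num_writes // sync_every`, regardless of how many files are in use. One consequence visible in the sketch: with `nrfiles=2` and `sync_every=2` the fsync always lands on file 1, so whether that is acceptable is part of the same design question.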
Whichever option is chosen, a note in the documentation about this interaction would help too.