io_submit seems to be a blocking call
We run into issues because io_submit is slow. In rare cases it can be so slow that a process gets dropped because it stops heartbeating for too long.
@kaomakino created the small test program below that reproduces the issue. The results will depend on the speed of your disk (which in itself suggests that io_submit is not truly asynchronous).
The program does the following:
- The main thread will write 4KB blocks into random locations within a 4GB file.
- A second thread will call fdatasync every 3 seconds (this roughly simulates commits in fdb).
Calls to io_submit will then take milliseconds to complete on a fast disk. On a very slow disk, one io_submit can take several seconds (we saw up to 13 seconds).
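For illustration, here is a minimal sketch of how the io_submit latency can be measured. This is not the original test program. It assumes Linux and calls the raw AIO syscalls directly so that libaio is not needed. The function name max_submit_latency_ns, the MAX_DEPTH cap and the parameter names are made up for this example. The fdatasync thread is left out for brevity, and the caller decides how to open the file (for example with O_DIRECT).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#define BLOCK_SIZE 4096
#define MAX_DEPTH 64 /* arbitrary cap chosen for this sketch */

static int64_t elapsed_ns(const struct timespec *a, const struct timespec *b) {
    return (int64_t)(b->tv_sec - a->tv_sec) * 1000000000LL + (b->tv_nsec - a->tv_nsec);
}

/* Submits `writes` random 4KB writes to fd, keeping at most `depth` requests
 * in flight. Returns the slowest io_submit call in nanoseconds, or -1 on error.
 * Per-request results (io_event.res) are not checked in this sketch. */
int64_t max_submit_latency_ns(int fd, uint64_t file_blocks, int writes, unsigned depth) {
    struct io_event events[MAX_DEPTH];
    aio_context_t ctx = 0;
    void *buf = NULL;
    int64_t worst = 0;
    unsigned in_flight = 0;

    if (file_blocks == 0 || depth == 0 || depth > MAX_DEPTH) return -1;
    /* An aligned buffer is needed if the caller opened fd with O_DIRECT. */
    if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE) != 0) return -1;
    memset(buf, 0xab, BLOCK_SIZE);
    if (syscall(SYS_io_setup, (long)depth, &ctx) < 0) { free(buf); return -1; }

    for (int i = 0; i < writes; i++) {
        if (in_flight == depth) { /* queue is full: wait for a completion */
            long n = syscall(SYS_io_getevents, ctx, 1L, (long)depth, events, NULL);
            if (n < 0) { worst = -1; break; }
            in_flight -= (unsigned)n;
        }
        assert(in_flight < depth);

        struct iocb cb;
        memset(&cb, 0, sizeof cb);
        cb.aio_fildes = (uint32_t)fd;
        cb.aio_lio_opcode = IOCB_CMD_PWRITE;
        cb.aio_buf = (uint64_t)(uintptr_t)buf;
        cb.aio_nbytes = BLOCK_SIZE;
        cb.aio_offset = (int64_t)(((uint64_t)rand() % file_blocks) * BLOCK_SIZE);
        struct iocb *list[1] = { &cb };

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        long rc = syscall(SYS_io_submit, ctx, 1L, list);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (rc != 1) { worst = -1; break; }
        in_flight++;
        if (elapsed_ns(&t0, &t1) > worst) worst = elapsed_ns(&t0, &t1);
    }

    while (in_flight > 0) { /* drain the remaining completions */
        long n = syscall(SYS_io_getevents, ctx, 1L, (long)in_flight, events, NULL);
        if (n <= 0) break;
        in_flight -= (unsigned)n;
    }
    syscall(SYS_io_destroy, ctx); /* waits for anything still outstanding */
    free(buf);
    return worst;
}
```

Calling this with different values of depth while another thread or process calls fdatasync on the same file should make it possible to compare submit latencies across queue sizes and file systems.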
This can be made better by doing the following:
- Create a 4GiB file (for example with dd; content doesn't matter).
- Change the attached C program so that it doesn't create the file when it opens it.
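The two steps above could look roughly like the sketch below. It is an assumption about how to do this in C, not the change made to the attached program. prefill_file writes real zero bytes, similar to what dd from /dev/zero would do, and open_existing opens the file without O_CREAT. Both function names and the extra_flags parameter are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_SIZE (1 << 20) /* write in 1MiB chunks */

/* Creates (or truncates) path and fills it with `size` zero bytes.
 * Returns 0 on success, -1 on error. */
int prefill_file(const char *path, off_t size) {
    static char zeros[CHUNK_SIZE];
    assert(size >= 0);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;

    off_t done = 0;
    while (done < size) {
        size_t want = (size - done) < CHUNK_SIZE ? (size_t)(size - done) : CHUNK_SIZE;
        ssize_t n = write(fd, zeros, want);
        if (n < 0) {
            if (errno == EINTR) continue; /* interrupted: retry */
            close(fd);
            return -1;
        }
        done += n;
    }
    /* Flush data and metadata so the test starts from a settled file. */
    if (fsync(fd) != 0) { close(fd); return -1; }
    return close(fd);
}

/* Opens an already existing file for writing. There is no O_CREAT, so this
 * fails with ENOENT if the file is missing. extra_flags could be O_DIRECT. */
int open_existing(const char *path, int extra_flags) {
    return open(path, O_WRONLY | extra_flags);
}
```

For the test described here, the file would be created once with prefill_file(path, 4LL << 30) and then opened with open_existing(path, O_DIRECT) on each run.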
However, even then io_submit can take a very long time (~1ms on a fast disk), so this seems to be only part of the solution. Additionally, io_submit latencies seem to improve the smaller we make the write queue. With only 1 outstanding request at a time, a call seems to take 10-60 microseconds.
Further testing seems to indicate that this gets better if we use XFS instead of ext4, but it is still not great.
@mpilman have you seen https://stackoverflow.com/a/46377629 ?
We saw this happen in TLogs (8.6s slow task), and it resulted in a recovery.
Hello, how did you fix this io_submit issue?
We have not "solved" the issue yet. We have monitoring of slow disks and replace them if we see disk problems.