Pl3xMap
Networked disk writes can cause server TPS dips
I'm storing the web tiles on a Ceph cluster over the network. It mostly works well, but every once in a while there's a delay when writing or renaming a file. Because these operations run synchronously on the main thread, a slow file write can cause severe TPS dips while the server waits for it to finish. I'm not sure if there's a clean way to decouple the IO operations from the main thread, but I would definitely appreciate it if that could be done.
I was able to capture it here by filtering for slow ticks. As you can see in this profile, the java.nio.file.Files.move() operation took 1116ms, even though it only takes 0.2ms during normal operation.
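For illustration, here is a rough sketch of what decoupling could look like. This is not Pl3xMap's actual code: AsyncTileWriter, its write method, and the "tile-io" thread name are hypothetical names I made up, and I'm assuming the tile bytes are already encoded by the time they reach this point. The idea is to hand the write and the rename to a dedicated single-threaded executor so that a slow Files.move() stalls that thread instead of the server tick.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper, not part of Pl3xMap: runs tile writes off the calling thread.
final class AsyncTileWriter implements AutoCloseable {
    // A single worker thread keeps writes in submission order.
    private final ExecutorService ioExecutor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "tile-io");
        t.setDaemon(true); // do not block JVM shutdown
        return t;
    });

    // Returns immediately; the future completes with the target path once the rename is done.
    CompletableFuture<Path> write(Path target, byte[] data) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Path parent = target.toAbsolutePath().getParent();
                Files.createDirectories(parent);
                // Write to a temp file in the same directory, then rename over the target.
                Path tmp = Files.createTempFile(parent, "tile-", ".tmp");
                Files.write(tmp, data);
                return Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, ioExecutor);
    }

    @Override
    public void close() {
        // Already submitted writes still run; no new ones are accepted.
        ioExecutor.shutdown();
    }
}
```

The caller would invoke write(...) and carry on without waiting for the returned future, so a 1116ms rename would only delay later tile writes rather than the tick. The single worker thread is a deliberate choice in this sketch: it keeps two writes to the same tile from racing each other, at the cost of letting the queue grow if the storage stays slow for a long time.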