tantivy icon indicating copy to clipboard operation
tantivy copied to clipboard

MacOS MmapDirectory sync_all (F_FULLFSYNC) performance

Open appaquet opened this issue 2 years ago • 9 comments

I recently moved my workspace to a new mac with a M1 processor, and we have a few integration tests that require creating Tantivy indices on disk. This ended up being very very slow on MacOS... I traced down the issue to the fact that MmapDirectory calls a sync_all on each file flush, which is unreasonably slow on MacOS. This seems to be a common issue on MacOS and the reason why some softwares expose a flag to disable it altogether (ex: Docker for mac)

I made changes to expose a MmapDirectorySettings (see here) which improved the performance. You can see this gist for details, but creating an index with 100 documents is about 2x slower with sync_all on MacOS. Every call to sync_all takes ~20-30ms on MacOS, while taking ~5ms on a slower Linux machine.

How would you go about that? On our side, we only use MacOS for our dev flow and don't mind the decrease in safety. But I think it would make sense for Tantivy to such settings for the MmapDirectory. Would you like me to open a PR with my changes?

Thanks !

appaquet avatar Nov 30 '21 16:11 appaquet

Generally speaking, I am a tad scared of adding this feature.

With your change the sync should still occur when the File is dropped as believe. The reason why we call sync_all manually is because Drop does not help us detect errors. Do we pay the sync twice today?

One thing that would be quite awesome would be to make a BundleDirectory that works as follows. When a file is opened, we write into an in memory buffer for the first 2MB or so.

If we exceed 10MB, change the mode and write to disk in a normal file. If we call .terminate() on our hybrid FileWriter at a point where the file was still in mem, we stack its data in a .segbundle file.

The segbundle file would get flushed upon commit.

This approach is very interesting but requires to twist the Directory a bit, as we need to leak the concept of segment and commit to the Directory.

Deleting files should also work in that world.

With such a settings, we would end up with 2 flushes for small commit. (one for the segbundle file and one for the meta.json)

fulmicoton avatar Dec 01 '21 06:12 fulmicoton

Just to confirm my understanding. We would stack all small files in the .segbundle and large files are always on its own?

PSeitz avatar Dec 01 '21 11:12 PSeitz

@PSeitz Correct

fulmicoton avatar Dec 01 '21 12:12 fulmicoton

Generally speaking, I am a tad scared of adding this feature.

With your change the sync should still occur when the File is dropped as believe. The reason why we call sync_all manually is because Drop does not help us detect errors.

At the same time, the way I see it, sync on flush would always be enabled, unless explicitly disabled. But I understand that as a library owner, you want to minimize the possibilities of a user shooting himself in the foot and then calling for help. Unfortunately, MacOS fsync performance is just so slow that we hit timeouts in our integration tests (which can't run in memory).

Do we pay the sync twice today?

I would expect so. I tried calling sync_all many times in a row, and the subsequent calls take the same time as the first... This feels counterintuitive since I would expect the OS to keep some kind of dirty flag, and know when a sync really needs to be done.

One thing that would be quite awesome would be to make a BundleDirectory that works as follows. When a file is opened, we write into an in memory buffer for the first 2MB or so.

If we exceed 10MB, change the mode and write to disk in a normal file. If we call .terminate() on our hybrid FileWriter at a point where the file was still in mem, we stack its data in a .segbundle file.

The segbundle file would get flushed upon commit.

This approach is very interesting but requires to twist the Directory a bit, as we need to leak the concept of segment and commit to the Directory.

Deleting files should also work in that world.

With such a settings, we would end up with 2 flushes for small commit. (one for the segbundle file and one for the meta.json)

I do agree it's a better approach long term since it would probably improve performance across the board for small indices. We'll have to live on our branch meantime then...

appaquet avatar Dec 01 '21 16:12 appaquet

I would expect so. I tried calling sync_all many times in a row, and the subsequent calls take the same time as the first... This feels counterintuitive since I would expect the OS to keep some kind of dirty flag, and know when a sync really needs to be done.

Let me open an issue to investigate how many sync we do.

I do agree it's a better approach long term since it would probably improve performance across the board for small indices. We'll have to live on our branch meantime then...

In the meanwhile, you do no really need to maintain a fork. You can just have a custom mmap directory that does not flush and plug it to normal tantivy.

fulmicoton avatar Dec 02 '21 00:12 fulmicoton

Alright. Let me know if you need my help testing anything on mac. I'm on Discord too if you want to ping me there.

appaquet avatar Dec 02 '21 01:12 appaquet

Some work was done in c3cc934. It:

the fixes the bug by adding some sync replaces all fsync by fdatasync removes some fsyncs. On linux on a in tiny commit, it does not make much difference, but maybe it helps on MacOS.

I opened a thread on discord.

fulmicoton avatar Dec 02 '21 04:12 fulmicoton

Good thread on the fsync problem on MacOS here: https://twitter.com/marcan42/status/1494213855387734019?s=21

appaquet avatar Feb 17 '22 13:02 appaquet

Oh yes I spotted that too!

fulmicoton avatar Feb 17 '22 15:02 fulmicoton