Too many commits

Open fulmicoton opened this issue 5 years ago • 2 comments

Tantivy is not meant to commit after every document.

It will get a tad better in the next version of tantivy, but it is still not a viable solution. Ideally, pallet should accept a short lag (100ms) between the moment a document is inserted and the moment it becomes available for search.
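A minimal sketch of the short-lag idea, assuming writes arrive on a channel and a single background thread owns the writer. `StubWriter` and `run_committer` are hypothetical names that only stand in for an index writer; they are not tantivy's API. The thread buffers documents and commits at most once per `max_lag` (100ms here), plus once on shutdown, so a burst of inserts costs a handful of commits instead of one per document.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a search-index writer (NOT tantivy's API).
#[derive(Default)]
struct StubWriter {
    pending: Vec<String>,
    committed: Vec<String>,
    commits: usize,
}

impl StubWriter {
    fn add_document(&mut self, doc: String) {
        self.pending.push(doc);
    }
    /// Makes all pending documents "searchable" in one step.
    fn commit(&mut self) {
        self.committed.append(&mut self.pending);
        self.commits += 1;
    }
}

/// Drains `rx`, committing at most once per `max_lag`, and once more
/// when the channel is closed so nothing is left pending.
fn run_committer(rx: Receiver<String>, max_lag: Duration) -> StubWriter {
    let mut writer = StubWriter::default();
    // When the oldest not-yet-committed document arrived.
    let mut first_pending: Option<Instant> = None;
    loop {
        // Wait no longer than the time left before the oldest pending doc exceeds the lag.
        let timeout = match first_pending {
            Some(t) => max_lag.saturating_sub(t.elapsed()),
            None => max_lag,
        };
        match rx.recv_timeout(timeout) {
            Ok(doc) => {
                writer.add_document(doc);
                if first_pending.is_none() {
                    first_pending = Some(Instant::now());
                }
            }
            Err(RecvTimeoutError::Timeout) => {}
            Err(RecvTimeoutError::Disconnected) => {
                if !writer.pending.is_empty() {
                    writer.commit();
                }
                return writer;
            }
        }
        if let Some(t) = first_pending {
            if t.elapsed() >= max_lag {
                writer.commit();
                first_pending = None;
            }
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || run_committer(rx, Duration::from_millis(100)));
    for i in 0..1000 {
        tx.send(format!("doc {i}")).unwrap();
    }
    drop(tx); // closing the channel flushes the last batch and stops the committer
    let writer = handle.join().unwrap();
    println!("{} docs, {} commits", writer.committed.len(), writer.commits);
}
```

In this sketch a document becomes visible only after the next commit, so the worst-case lag is roughly `max_lag` plus the time the commit itself takes.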

fulmicoton, Mar 08 '20 07:03

@fulmicoton Good to know; thanks for the issue!

Is the issue committing frequently, or not reusing the IndexWriter? Calling commit itself doesn't look too costly.

For example, if I were to put a global IndexWriter into a Mutex<IndexWriter> on the pallet::search::Index object, and then used it in the database operations (e.g. create/update), calling commit at the end of each, would that resolve the issue?
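The shared-writer design proposed above can be sketched as follows, using a hypothetical `StubWriter` in place of tantivy's `IndexWriter` and a hypothetical `Index` type in place of `pallet::search::Index`. Sharing one writer avoids re-creating it per operation, but because every `create` still ends in `commit`, the number of commits stays equal to the number of operations.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for an index writer; not tantivy's real type.
#[derive(Default)]
struct StubWriter {
    pending: usize,
    docs: usize,
    commits: usize,
}

impl StubWriter {
    fn add_document(&mut self, _doc: &str) {
        self.pending += 1;
    }
    fn commit(&mut self) {
        self.docs += self.pending;
        self.pending = 0;
        self.commits += 1;
    }
}

/// Hypothetical index object holding one shared writer behind a Mutex.
struct Index {
    writer: Mutex<StubWriter>,
}

impl Index {
    /// A database "create" operation that indexes one document and commits.
    fn create(&self, doc: &str) {
        let mut w = self.writer.lock().unwrap();
        w.add_document(doc);
        w.commit(); // still one commit per operation
    }
}

fn main() {
    let index = Arc::new(Index { writer: Mutex::new(StubWriter::default()) });
    let handles: Vec<_> = (0..4)
        .map(|t| {
            let index = Arc::clone(&index);
            thread::spawn(move || {
                for i in 0..25 {
                    index.create(&format!("t{t}-doc{i}"));
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let w = index.writer.lock().unwrap();
    println!("{} docs, {} commits", w.docs, w.commits); // prints "100 docs, 100 commits"
}
```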

Also, it looks like calling IndexWriter::commit joins on all the worker threads, so it shouldn't be necessary to wait for a document to be available, should it?

I'd like to be able to provide some level of consistency between the database and the index, by calling commit inside the sled transactions in the database operations, but I'll need to look into this more.

Thanks for your help!

kardeiz, Mar 10 '20 18:03

> by calling commit inside the sled transactions in the database operations

That means you've reduced sled write performance to the level of Tantivy, which is designed for large batch updates, not quick commits. That's not ideal; it'd be better to run Tantivy in "catch-up mode", where in practice newly added data is found by search, but updates are not delayed to make that a guarantee.
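A minimal sketch of that "catch-up mode", assuming the primary store can be read as an append-only log of writes. `Store` stands in for sled and `CatchUpIndexer` for the search side; both are hypothetical types, not the real sled or tantivy APIs. The write path only appends, while the indexer periodically indexes everything past its watermark and commits once per batch.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the primary store (e.g. sled): an append-only
/// log of writes, each identified by its position. Not sled's real API.
#[derive(Default)]
struct Store {
    log: Mutex<Vec<String>>,
}

impl Store {
    /// The write path: append and return immediately, without touching the index.
    fn insert(&self, value: &str) {
        self.log.lock().unwrap().push(value.to_string());
    }
}

/// Hypothetical search side that is allowed to lag behind the store.
#[derive(Default)]
struct CatchUpIndexer {
    indexed_upto: usize, // watermark: log position already indexed
    searchable: Vec<String>,
    commits: usize,
}

impl CatchUpIndexer {
    /// Indexes everything written since the last pass, then commits once.
    /// Returns the number of newly indexed entries.
    fn catch_up(&mut self, store: &Store) -> usize {
        let new: Vec<String> = {
            let log = store.log.lock().unwrap();
            log[self.indexed_upto..].to_vec()
        }; // lock released here, so writers are not blocked while indexing
        if new.is_empty() {
            return 0;
        }
        let n = new.len();
        self.searchable.extend(new); // stands in for per-document add calls
        self.commits += 1; // one commit per batch, not per write
        self.indexed_upto += n;
        n
    }
}

fn main() {
    let store = Store::default();
    let mut indexer = CatchUpIndexer::default();
    for i in 0..500 {
        store.insert(&format!("doc {i}"));
    }
    indexer.catch_up(&store);
    store.insert("late doc"); // written, but not yet searchable: the index lags
    println!("searchable={} commits={}", indexer.searchable.len(), indexer.commits); // prints "searchable=500 commits=1"
    indexer.catch_up(&store);
    println!("searchable={} commits={}", indexer.searchable.len(), indexer.commits); // prints "searchable=501 commits=2"
}
```

In practice `catch_up` would run on a background thread or timer. The watermark here lives only in memory; a real implementation would need to persist it alongside the index so that a restart resumes from the right log position.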

Here's a blog post from the author of Tantivy (EDIT: who I now realize is the person who created this issue!) that talks about what happens with small commits: https://fulmicoton.com/posts/behold-tantivy-part2/

tv42, Jul 15 '21 20:07