tantivy
Improve SSTable format... or find another dictionary format?
- inverse vint implementation
- ~~remove serde_cbor~~ (removed in #1943)
- add multilevel indexing?
Hi @fulmicoton!
May I naively ask why the sstable couldn't be implemented as `Vec<Fst>` rather than `Vec<Block>`?
It seems the fst works great for a local tantivy but is a problem for quickwit, since you need to download the entire dictionary.
So theoretically, could the sstable be built from multiple fsts, each containing a subset of the key range?
Yes, you are correct. Having a bunch of fst blocks would solve the IO problem too.
Another reason we picked sstable is that iterating through it is much faster, and we originally wanted to build analytics on top of this.
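For readers following along, here is a minimal sketch of the "many small blocks over sub-ranges of the keys" idea from the exchange above. It is not tantivy's sstable code: `PartitionedDict`, `Block`, `build`, `locate_block`, and `block_len` are hypothetical names, keys are assumed to be unique, and a sorted `Vec` stands in for whatever each block would really be (an fst or an sstable block). The point is only that a small index of first keys lets a `get` touch a single block, while iteration simply walks the blocks in order.

```rust
/// One block of the dictionary: a sorted run of (key, value) pairs.
/// Stand-in only: in the design discussed above this would be an fst
/// or an sstable block covering a sub-range of the keys.
struct Block {
    entries: Vec<(String, u64)>,
}

/// Hypothetical partitioned dictionary: the first key of every block,
/// plus the blocks themselves (which could live in remote storage).
struct PartitionedDict {
    first_keys: Vec<String>,
    blocks: Vec<Block>,
}

impl PartitionedDict {
    /// Sorts the entries by key and cuts them into blocks of at most
    /// `block_len` entries. Assumes keys are unique.
    fn build(mut entries: Vec<(String, u64)>, block_len: usize) -> Self {
        assert!(block_len > 0, "block_len must be positive");
        entries.sort_by(|a, b| a.0.cmp(&b.0));
        let blocks: Vec<Block> = entries
            .chunks(block_len)
            .map(|chunk| Block { entries: chunk.to_vec() })
            .collect();
        let first_keys: Vec<String> = blocks
            .iter()
            .map(|block| block.entries[0].0.clone())
            .collect();
        PartitionedDict { first_keys, blocks }
    }

    /// Index of the only block whose key range could contain `key`,
    /// or `None` if `key` sorts before every stored key.
    fn locate_block(&self, key: &str) -> Option<usize> {
        // Number of blocks whose first key is <= `key`.
        let n = self.first_keys.partition_point(|first| first.as_str() <= key);
        n.checked_sub(1)
    }

    /// Point lookup: consult the small index, then search one block.
    fn get(&self, key: &str) -> Option<u64> {
        let block = &self.blocks[self.locate_block(key)?];
        block
            .entries
            .binary_search_by(|(k, _)| k.as_str().cmp(key))
            .ok()
            .map(|i| block.entries[i].1)
    }

    /// Ordered scan over all entries, block after block.
    fn iter(&self) -> impl Iterator<Item = (&str, u64)> + '_ {
        self.blocks
            .iter()
            .flat_map(|block| block.entries.iter().map(|(k, v)| (k.as_str(), *v)))
    }
}
```

With five keys and `block_len = 2`, for example, `get("echo")` only needs the third block; in a remote-storage setting that would be the only block to download, assuming the index of first keys is already available locally.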
I see. So fsts would have better get performance, while sstable blocks would have better iteration performance, with get used for the usual search queries and iteration for analytics?
In that case, may I ask how much worse get performs with sstable blocks compared to fst? Would it really affect queries that much?