Improve SSTable format... or find another dictionary format?

Open fulmicoton opened this issue 2 years ago • 3 comments

- inverse vint implementation.
- ~~remove serde_cbor.~~ removed in #1943
- add multilevel indexing?

fulmicoton, Dec 23 '22 00:12

Hi @fulmicoton! May I naively ask why the sstable couldn't be implemented as `Vec<Fst>` rather than `Vec<Block>`? It seems the fst was great for a local tantivy but is a problem with quickwit, since you need to download the entire dictionary. So theoretically, could the sstable be built from multiple fsts, each containing a subset of the key range?

oronsh, Dec 23 '22 16:12
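
To make the idea in the question concrete, here is a minimal sketch of a dictionary split into blocks that each cover a disjoint key range. It is not tantivy's or quickwit's implementation: `Block`, `PartitionedDict`, `build`, and `locate` are hypothetical names, and each block is a plain sorted `Vec` standing in for whatever per-block structure is used (an fst or an sstable block). The point is only that a lookup first picks one block from a small index of last keys, so only that block would need to be fetched.

```rust
/// Hypothetical stand-in for one dictionary block (an fst or an sstable block).
/// In this sketch it is just a sorted list of (key, value) pairs.
struct Block {
    entries: Vec<(Vec<u8>, u64)>,
}

impl Block {
    /// Largest key stored in this block.
    fn last_key(&self) -> &[u8] {
        &self.entries.last().expect("blocks are never empty").0
    }

    /// Exact-match lookup inside the block.
    fn get(&self, key: &[u8]) -> Option<u64> {
        self.entries
            .binary_search_by(|(k, _)| k.as_slice().cmp(key))
            .ok()
            .map(|i| self.entries[i].1)
    }
}

/// A dictionary split into blocks covering disjoint, increasing key ranges.
struct PartitionedDict {
    blocks: Vec<Block>,
}

impl PartitionedDict {
    /// Builds blocks of at most `block_len` entries.
    /// Assumes `sorted` is sorted by key with no duplicates and `block_len > 0`.
    fn build(sorted: &[(&str, u64)], block_len: usize) -> Self {
        let blocks = sorted
            .chunks(block_len)
            .map(|chunk| Block {
                entries: chunk
                    .iter()
                    .map(|(k, v)| (k.as_bytes().to_vec(), *v))
                    .collect(),
            })
            .collect();
        PartitionedDict { blocks }
    }

    /// Index of the only block that could contain `key`, if any.
    /// Only the blocks' last keys are consulted here.
    fn locate(&self, key: &[u8]) -> Option<usize> {
        // First block whose last key is >= `key`.
        let idx = self.blocks.partition_point(|b| b.last_key() < key);
        (idx < self.blocks.len()).then_some(idx)
    }

    /// Point lookup: pick one block, then search inside it.
    fn get(&self, key: &[u8]) -> Option<u64> {
        let idx = self.locate(key)?;
        self.blocks[idx].get(key)
    }
}
```

With remote storage, `locate` is the step that decides which single block to download; the in-block search is then independent of the overall dictionary size.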

Yes, you are correct. Having a bunch of fst blocks would solve the IO problem too.

Another reason we picked sstable is that iterating through it is much faster, and we originally wanted to build analytics on top of this.

fulmicoton, Dec 24 '22 01:12

I see. So fsts would have better get performance, while sstable blocks would have better iteration performance, with get used for the usual search queries and iteration used for analytics? In that case, may I ask how much worse get is with sstable blocks compared to fst? Would it really affect queries that much?

oronsh, Dec 24 '22 09:12
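
As a rough illustration of the get-versus-iteration trade-off discussed above, here is a sketch of a prefix-compressed block of sorted keys. It assumes an SSTable-style layout in which each record stores how many bytes it shares with the previous key; the byte layout, `encode_block`, `BlockIter`, `get_in_block`, and `decode_all` are all hypothetical and are not tantivy's actual format. Under that assumption, iteration is one cheap sequential pass, while a point lookup inside a block has to decode records from the start of the block.

```rust
use std::cmp::Ordering;

/// Encodes sorted keys with prefix compression.
/// Record layout (an assumption for this sketch):
/// [keep: u8][add: u8][suffix bytes][value: u64 little-endian]
/// Assumes every key is shorter than 256 bytes.
fn encode_block(sorted: &[(&str, u64)]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut prev: &[u8] = &[];
    for (key, value) in sorted {
        let key = key.as_bytes();
        // Number of leading bytes shared with the previous key.
        let keep = prev.iter().zip(key).take_while(|(a, b)| a == b).count();
        let suffix = &key[keep..];
        out.push(keep as u8);
        out.push(suffix.len() as u8);
        out.extend_from_slice(suffix);
        out.extend_from_slice(&value.to_le_bytes());
        prev = key;
    }
    out
}

/// Sequential decoder over an encoded block. Assumes well-formed input.
struct BlockIter<'a> {
    data: &'a [u8],
    key: Vec<u8>,
}

impl<'a> BlockIter<'a> {
    fn new(data: &'a [u8]) -> Self {
        BlockIter { data, key: Vec::new() }
    }

    /// Moves to the next record. Returns its value; the key is in `self.key`.
    fn advance(&mut self) -> Option<u64> {
        if self.data.is_empty() {
            return None;
        }
        let keep = self.data[0] as usize;
        let add = self.data[1] as usize;
        // Rebuild the key from the previous one: keep a prefix, append the suffix.
        self.key.truncate(keep);
        self.key.extend_from_slice(&self.data[2..2 + add]);
        let mut value = [0u8; 8];
        value.copy_from_slice(&self.data[2 + add..2 + add + 8]);
        self.data = &self.data[2 + add + 8..];
        Some(u64::from_le_bytes(value))
    }
}

/// Point lookup inside one block: a linear scan, because each key
/// can only be reconstructed from the key before it.
fn get_in_block(data: &[u8], target: &[u8]) -> Option<u64> {
    let mut it = BlockIter::new(data);
    while let Some(value) = it.advance() {
        match it.key.as_slice().cmp(target) {
            Ordering::Less => continue,
            Ordering::Equal => return Some(value),
            Ordering::Greater => return None, // keys are sorted, so stop early
        }
    }
    None
}

/// Full iteration: a single pass over the block.
fn decode_all(data: &[u8]) -> Vec<(String, u64)> {
    let mut it = BlockIter::new(data);
    let mut out = Vec::new();
    while let Some(value) = it.advance() {
        out.push((String::from_utf8_lossy(&it.key).into_owned(), value));
    }
    out
}
```

In this sketch the cost of a get is bounded by the block size, not the dictionary size, so how much it matters for queries depends mainly on how large the blocks are.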