PSeitz
It would be nice to remove the `BoxTokenStream` allocation per text and use the `Tokenizer` directly, e.g. call `set_text` on the `Tokenizer` and then pull the tokens from it directly. A sketch of what I mean is below.
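For illustration, a minimal sketch of how such a reuse-oriented interface could look (the trait and method names here are assumptions, not tantivy's actual API):

```rust
/// Simplified stand-in for tantivy's `Token` (the real struct has
/// more fields, e.g. byte offsets and position).
#[derive(Default)]
pub struct Token {
    pub text: String,
}

/// Hypothetical allocation-free tokenizer interface: instead of
/// returning a fresh `BoxTokenStream` per text, the caller resets
/// the tokenizer in place and pulls tokens from it directly.
pub trait ReusableTokenizer {
    /// Point the tokenizer at a new text, reusing internal buffers.
    fn set_text(&mut self, text: &str);
    /// Advance to the next token; `None` once the text is exhausted.
    fn next_token(&mut self) -> Option<&Token>;
}

fn tokenize_all<T: ReusableTokenizer>(tokenizer: &mut T, texts: &[&str]) {
    for text in texts {
        tokenizer.set_text(text); // no per-text heap allocation
        while let Some(token) = tokenizer.next_token() {
            println!("{}", token.text);
        }
    }
}
```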
The bullet point `Natural query language` in the feature list is probably misleading. You can't phrase questions in natural language like that in tantivy (at least not without customizing or additional...
Related issue https://github.com/quickwit-oss/tantivy/issues/1041
lz4 only uses duplicates for compressing data (no Huffman or ANS entropy coding like zstd):

```bash
➜ blub git:(main) ✗ lz4 datasets/split/346cb77c09e04022aee6c49077dbc821.idx
Compressed filename will be: datasets/split/346cb77c09e04022aee6c49077dbc821.idx.lz4
Compressed 183824037 bytes into 147904079...
```
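To reproduce the comparison from Rust rather than the lz4 CLI, a small sketch (the file path is a placeholder, and the `lz4_flex`/`zstd` crate choices and compression level are my assumptions):

```rust
// Cargo.toml: lz4_flex = "0.11", zstd = "0.13"
use std::fs;

fn main() -> std::io::Result<()> {
    // Placeholder path; point this at an actual split file.
    let data = fs::read("datasets/split/example.idx")?;

    // lz4: duplicate elimination (LZ77-style matches) only.
    let lz4 = lz4_flex::compress_prepend_size(&data);
    // zstd: duplicate elimination plus entropy coding
    // (Huffman and FSE, a tabled ANS variant).
    let zst = zstd::encode_all(&data[..], 3)?;

    println!("original: {} bytes", data.len());
    println!("lz4:      {} bytes", lz4.len());
    println!("zstd:     {} bytes", zst.len());
    Ok(())
}
```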
Some more data: the percentage of 4-byte pairs, scanned in 1-byte steps. Interestingly, the same pattern (more than 10%) can be observed on `.idx`, but not on `.pos`, between github...
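For reference, a sketch of the kind of measurement I mean here (whether this exact counting produced the numbers above is an assumption):

```rust
use std::collections::HashMap;

/// Percentage of 4-byte windows (advanced one byte at a time) that
/// occur more than once in `data`.
fn duplicate_window_pct(data: &[u8]) -> f64 {
    let mut counts: HashMap<[u8; 4], u64> = HashMap::new();
    for w in data.windows(4) {
        *counts.entry([w[0], w[1], w[2], w[3]]).or_insert(0) += 1;
    }
    let total: u64 = counts.values().sum(); // = data.len() - 3 windows
    if total == 0 {
        return 0.0;
    }
    let dup: u64 = counts.values().filter(|&&c| c > 1).sum();
    100.0 * dup as f64 / total as f64
}
```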
Yes, having more datasets would be nice. Geo data is probably a little special, since it ideally belongs in its own index that allows geo queries.
You can just add the field multiple times in the `Document`:

```rust
index_writer.add_document(doc!(
    date_field => DateTime::from_timestamp_secs(1000),
    date_field => DateTime::from_timestamp_secs(1001),
))?;
```
I think we should change that to:

```rust
pub struct TokenizerManager {
    tokenizers: ArcSwap,
}
```

while `TextAnalyzer` would actually be a `TokenizerBuilder`.
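A rough sketch of what that split could look like (the `ArcSwap` payload type and the method bodies are assumptions for illustration):

```rust
use std::collections::HashMap;
use std::sync::Arc;

use arc_swap::ArcSwap; // arc_swap crate

/// Placeholder for the existing type; per the comment above it
/// would effectively act as a builder for the tokenizer pipeline.
#[derive(Clone)]
pub struct TextAnalyzer {}

pub struct TokenizerManager {
    // Assumed payload: name -> ready-built analyzer, swapped
    // copy-on-write so lookups never take a lock.
    tokenizers: ArcSwap<HashMap<String, Arc<TextAnalyzer>>>,
}

impl TokenizerManager {
    pub fn get(&self, name: &str) -> Option<Arc<TextAnalyzer>> {
        self.tokenizers.load().get(name).cloned()
    }

    pub fn register(&self, name: &str, analyzer: TextAnalyzer) {
        let current = self.tokenizers.load();
        let mut map: HashMap<String, Arc<TextAnalyzer>> = (**current).clone();
        map.insert(name.to_string(), Arc::new(analyzer));
        self.tokenizers.store(Arc::new(map));
    }
}
```

Note that the load-clone-store in `register` can lose updates under concurrent registration; `ArcSwap::rcu` would be the safer way to do the copy-on-write update there.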
The parser doesn't handle this currently, but this should work: `cart.product_id:
Indeed, it's disabled. I don't think there's an inherent reason, apart from some missing code to handle it. @fulmicoton?