PSeitz
You need to add the [StopWordFilter](https://docs.rs/tantivy/0.21.1/tantivy/tokenizer/struct.StopWordFilter.html) to your tokenizer.
I don't think this is properly documented.
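As a minimal sketch of what such a filter does (plain Rust with hypothetical names, not tantivy's actual implementation): tokens that appear in the stop-word list are dropped from the token stream before indexing. tantivy's `StopWordFilter` plugs into the analyzer chain to do exactly this.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for a stop-word filter stage: drop every
// token that appears in the stop list, keep the rest in order.
fn stop_word_filter<'a>(tokens: Vec<&'a str>, stop_words: &HashSet<&str>) -> Vec<&'a str> {
    tokens.into_iter().filter(|t| !stop_words.contains(t)).collect()
}

fn main() {
    let stop_words: HashSet<&str> = ["the", "a", "of"].into_iter().collect();
    let tokens = vec!["the", "quick", "brown", "fox"];
    let filtered = stop_word_filter(tokens, &stop_words);
    // "the" is removed, everything else passes through.
    assert_eq!(filtered, vec!["quick", "brown", "fox"]);
    println!("{:?}", filtered);
}
```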
> This is just a ping. I'm working on it; I created a dataset and a simple application that uses several GB of memory for computing some statistics. I'm using...
> Hello, I'd like to know how to proceed with this pull request; it has been pending for a while.

Sorry, I forgot about this PR. It's fine to ping...
> Merge is done. Do you have an example benchmark that also reports memory consumption? I ran `cargo bench` but saw no indication of memory consumption. Thanks!

Memory reporting...
https://github.com/quickwit-oss/tantivy/pull/2378 is now merged
The memory consumption increased by 10% for those two benchmarks. I don't think you should pay for extended stats if you are not using them.

```
terms_many_with_avg_sub_agg Memory: 29.0 MB...
```
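The "don't pay for what you don't use" idea can be sketched as follows (illustrative Rust with hypothetical names, not tantivy's actual code): the extended statistics live behind an `Option`, so a collector that was not asked for them allocates nothing extra and skips the extra arithmetic per document.

```rust
// Extra statistics that only some callers want.
#[derive(Default, Debug)]
struct ExtendedStats {
    sum_of_squares: f64,
}

#[derive(Default, Debug)]
struct StatsCollector {
    count: u64,
    sum: f64,
    extended: Option<ExtendedStats>, // None => no extra memory or work
}

impl StatsCollector {
    fn new(with_extended: bool) -> Self {
        StatsCollector {
            extended: with_extended.then(ExtendedStats::default),
            ..Default::default()
        }
    }

    fn collect(&mut self, val: f64) {
        self.count += 1;
        self.sum += val;
        // The extended work only happens when it was requested.
        if let Some(ext) = self.extended.as_mut() {
            ext.sum_of_squares += val * val;
        }
    }
}

fn main() {
    let mut basic = StatsCollector::new(false);
    basic.collect(2.0);
    assert!(basic.extended.is_none());

    let mut full = StatsCollector::new(true);
    full.collect(2.0);
    assert_eq!(full.extended.unwrap().sum_of_squares, 4.0);
}
```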
Here's a diff on the profile. There's some noise, but we can see the alloc overhead for the array (the array itself plus multiple strings), slower deserialization, and slower `value_from_json`. `9.61% +4.74%...`

- We could avoid the `Document` alloc/free overhead by just referencing the unserialized JSON text blocks (similar to [serde_json_borrow](https://github.com/PSeitz/serde_json_borrow)). It would require lifetimes on tantivy's `Value`, though.
- ...
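The borrowing idea above can be sketched like this (hypothetical types, not tantivy's or serde_json_borrow's actual API): instead of owning a `String` per field, the value type holds `&str` slices pointing into the original JSON text, so no per-value allocation happens, at the cost of the lifetime parameter mentioned above.

```rust
// A value that borrows from the unparsed JSON buffer instead of
// owning its own String -- this is what forces the lifetime 'a.
#[derive(Debug, PartialEq)]
enum Value<'a> {
    Str(&'a str),
    U64(u64),
}

// Borrow a string value out of the source text without copying it.
fn value_from_slice(src: &str, start: usize, end: usize) -> Value<'_> {
    Value::Str(&src[start..end])
}

fn main() {
    let json = r#"{"name":"hdfs"}"#;
    // Byte range of the string contents of "name".
    let v = value_from_slice(json, 9, 13);
    assert_eq!(v, Value::Str("hdfs"));
}
```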
https://github.com/quickwit-oss/tantivy/pull/2062 removes the `Token::default` creation for each text encountered, by having a shared one on the tokenizer.

```
index-hdfs/index-hdfs-no-commit
        time:   [644.80 ms 648.41 ms 652.51 ms]
        change: [-4.1369% -2.7259% -1.3721%]...
```
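The spirit of that change can be sketched as follows (simplified, hypothetical types rather than tantivy's actual ones): the tokenizer keeps a single `Token` and clears and refills it for each term, so the token's `String` buffer is reused instead of being freshly allocated per term.

```rust
#[derive(Default, Debug)]
struct Token {
    text: String,
    position: usize,
}

#[derive(Default)]
struct Tokenizer {
    shared_token: Token, // reused across all terms
}

impl Tokenizer {
    // Refill the shared token in place; once the String's capacity is
    // large enough, no new allocation happens per term.
    fn fill(&mut self, text: &str, position: usize) -> &Token {
        self.shared_token.text.clear();
        self.shared_token.text.push_str(text);
        self.shared_token.position = position;
        &self.shared_token
    }
}

fn main() {
    let mut tokenizer = Tokenizer::default();
    {
        let tok = tokenizer.fill("hello", 0);
        assert_eq!(tok.text, "hello");
    }
    // Same Token instance, refilled with the next term.
    let tok = tokenizer.fill("world", 1);
    assert_eq!(tok.text, "world");
    assert_eq!(tok.position, 1);
}
```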