PSeitz

Results 317 comments of PSeitz

The allocator will reserve some memory and not release it back to the OS. That's not a memory leak, and the amount it retains is bounded. I wrote about that here: https://quickwit.io/blog/performance-investigation

This is probably related to directory locking. What `Directory` implementation do you use?

@darashi I think removing the 0 in position seems good, but I'm not sure what would be best in terms of spec. `position` is expressed in number of tokens, but...

Thanks for checking. Unfortunately they emit `positionIncrement`, so it's still unclear what they use for position as an absolute value.

> If I'm not mistaken, it looks like `startOffset` indicates the...

I think the core issue here is that, e.g., a query for `127.0.0.1` with the default tokenizer would produce the tokens `[127,0,0,1]`. To handle this use case, the query should be...
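
To illustrate why `127.0.0.1` ends up as four tokens, here is a minimal standard-library sketch. It assumes the default tokenizer behaves roughly like "split on non-alphanumeric characters, then lowercase"; `simple_tokenize` is a hypothetical helper written for this example, not a tantivy API.

```rust
/// Hypothetical stand-in for a split-on-non-alphanumeric tokenizer.
/// Assumption: this only approximates the default tokenizer's behavior.
fn simple_tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric()) // every '.' is a separator
        .filter(|tok| !tok.is_empty())         // drop empty pieces between separators
        .map(|tok| tok.to_lowercase())         // normalize case
        .collect()
}
```

Under that assumption, the dots are treated as separators, so the address is never indexed or queried as a single term.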

https://github.com/BurntSushi/jiff should be useful for timezone handling

This would make some optimizations easier, e.g. for `(Field1:Term1 OR Field1:Term2) AND (Field2:Term1 OR Field2:Term2)`, it would be better to use a simple union algorithm that supports fast skips instead...
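
As a rough sketch of what "a union that supports fast skips" can look like, the example below intersects two unions of sorted doc-id lists. Everything here (`SortedDocs`, `Union`, `intersect_unions`) is a hypothetical illustration over plain vectors, not tantivy's implementation; it assumes each list is sorted and free of duplicates.

```rust
/// One sorted posting list with a cursor (hypothetical type).
struct SortedDocs {
    docs: Vec<u32>,
    pos: usize,
}

impl SortedDocs {
    fn doc(&self) -> Option<u32> {
        self.docs.get(self.pos).copied()
    }

    /// Skip forward to the first doc id >= target using binary search.
    fn seek(&mut self, target: u32) {
        let rest = &self.docs[self.pos..];
        self.pos += rest.partition_point(|&d| d < target);
    }
}

/// Union of several posting lists, e.g. `Field1:Term1 OR Field1:Term2`.
struct Union {
    children: Vec<SortedDocs>,
}

impl Union {
    fn new(lists: Vec<Vec<u32>>) -> Self {
        let children = lists
            .into_iter()
            .map(|docs| SortedDocs { docs, pos: 0 })
            .collect();
        Union { children }
    }

    /// Current doc of the union: the smallest current doc among the children.
    fn doc(&self) -> Option<u32> {
        self.children.iter().filter_map(|c| c.doc()).min()
    }

    /// Skipping the union means skipping every child.
    fn seek(&mut self, target: u32) {
        for child in &mut self.children {
            child.seek(target);
        }
    }

    /// Move past the current doc; exhaust the union if the id cannot grow.
    fn advance(&mut self) {
        if let Some(cur) = self.doc() {
            match cur.checked_add(1) {
                Some(next) => self.seek(next),
                None => {
                    for child in &mut self.children {
                        child.pos = child.docs.len();
                    }
                }
            }
        }
    }
}

/// `(a) AND (b)`: leapfrog the two unions, skipping instead of stepping.
fn intersect_unions(a: &mut Union, b: &mut Union) -> Vec<u32> {
    let mut out = Vec::new();
    while let (Some(x), Some(y)) = (a.doc(), b.doc()) {
        if x == y {
            out.push(x);
            a.advance();
            b.advance();
        } else if x < y {
            a.seek(y);
        } else {
            b.seek(x);
        }
    }
    out
}
```

The point of the sketch is that the `AND` side can call `seek` on a union and jump over whole ranges of doc ids, rather than stepping through every document matched by either term.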

> I think part of this PR is obsolete. Probably a failed rebase?

Yes, a failed rebase. I restarted from scratch instead.

Tantivy comes bundled with some tokenizers, but it should be quite easy to implement your own `Tokenizer` via the tokenizer-api (https://crates.io/crates/tantivy-tokenizer-api), or use a library like `tantivy-stemmers`. Not sure how...

It seems there are two issues:

1. Judging from the logs, the aggregation request is sent 50 times.
2. The search thread pool takes all CPUs. This may not leave...