PSeitz

Results 319 comments of PSeitz

Unregister on drop should be done. Having an interface to unregister could be done when there's a use case for it.
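To illustrate what "unregister on drop" usually means in Rust, here is a minimal sketch using only the standard library. The comment does not say what is being registered, so `Registry`, `RegistrationGuard`, `register`, and `is_registered` are all hypothetical names, not part of tantivy's API.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

/// Hypothetical registry of named entries (an assumption for illustration).
#[derive(Default, Clone)]
struct Registry {
    entries: Arc<Mutex<HashSet<String>>>,
}

/// Hypothetical guard: the entry stays registered for as long as it lives.
struct RegistrationGuard {
    name: String,
    entries: Arc<Mutex<HashSet<String>>>,
}

impl Registry {
    /// Registers `name` and returns a guard that undoes the registration.
    fn register(&self, name: &str) -> RegistrationGuard {
        self.entries.lock().unwrap().insert(name.to_string());
        RegistrationGuard {
            name: name.to_string(),
            entries: Arc::clone(&self.entries),
        }
    }

    fn is_registered(&self, name: &str) -> bool {
        self.entries.lock().unwrap().contains(name)
    }
}

impl Drop for RegistrationGuard {
    fn drop(&mut self) {
        // Unregister automatically; skip silently if the lock is poisoned.
        if let Ok(mut entries) = self.entries.lock() {
            entries.remove(&self.name);
        }
    }
}
```

With this shape no explicit unregister method is needed: the entry disappears when the guard goes out of scope. A public unregister call could be added later if a use case comes up.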

Do you have a reference for an ngram tokenizer that ends the ngram on whitespace? `RemoveLongFilter::limit(40)` doesn't make sense; an ngram token will never have that length. `.filter(Stemmer::new(Language::English))` will give...
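The point about token length can be seen in a small standard-library sketch of character n-grams. `char_ngrams` is a hypothetical helper, not tantivy's ngram tokenizer, and the `min_gram`/`max_gram` parameters are assumptions chosen for illustration.

```rust
/// Hypothetical helper: all character n-grams of `text` whose length is
/// between `min_gram` and `max_gram` (inclusive), in order of start position.
fn char_ngrams(text: &str, min_gram: usize, max_gram: usize) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let mut grams = Vec::new();
    if min_gram == 0 || min_gram > max_gram {
        return grams;
    }
    for start in 0..chars.len() {
        for len in min_gram..=max_gram {
            let end = start + len;
            if end > chars.len() {
                break; // longer grams from this start would run past the text
            }
            grams.push(chars[start..end].iter().collect());
        }
    }
    grams
}
```

Every emitted token has at most `max_gram` characters, so a length filter with a much larger limit (such as 40) would never remove anything as long as `max_gram` is small.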

I meant a reference that does the tokenization in `September`, `October` you suggested.

> I am just not sure how to get there.

I'm not sure `TextAnalyzer` can do that...

`SimpleTokenizer` is not an ngram tokenizer.

Thanks for the report. Can you provide something to reproduce that behavior in Rust? I'm not very familiar with Python. Do you reload your `IndexReader` after committing? It may have...

Shouldn't the reload be triggered sync?

@GentBinaku Thanks, as a starting point the callback in the RamDirectory is https://github.com/quickwit-oss/tantivy/blob/main/src/directory/ram_directory.rs#L234, which updates the `IndexReader`. Ideally the update would be done before returning from `commit()`. If that's not...

Currently you can either:

1. Create multiple indices
2. Create a schema that contains all the fields (not great if there is a type conflict)
3. Move your data into...

What do you mean by "flatten Json"? We have the JSON type which reflects dynamic typing on a field, and there's the idea to have a global flag on the...

> Could you elaborate a little bit more on how this would work then?

The current design proposal is here: https://github.com/quickwit-oss/tantivy/issues/2215