Paul Masurel
Hello @inboxsphere The problem does not come from the handling of the NOT operator, but from a bad interaction with the ngram tokenizer. The ngram tokenizer is used...
I won't change the license to zlib (because I don't have time to research the hidden implications) but I added MIT as an alternative.
MAIN_BRUNCH :)
@Barre is there a rationale for having such gigantic segments? We recommend around 10 million docs per segment.
> Does that mean my current index is "toasted" and I should basically reindex? (While taking care of not having these larger segments)

I'm afraid yes.
> @PSeitz Would a reproduction be useful? I've been thinking about generating a 1B docs segment from a minimal repo to see how things goes.

Thanks to the stack trace...
> In my case, 10M would probably mean too many segments, and the compression ratio wouldn't be as good.

I don't think this is true.

> I just indexed with...
No... This is not it. Can you share your entire `main`?
Still nothing special in there.
To get segments that large, you must have overridden the default merge policy, or merged the index's segments on your own. Don't you have code doing this?