rucene
Index too large
The search benchmark consists of indexing all documents in the English Wikipedia. To level the playing field, we merge all segments down to a single segment.
I was happy to see that rucene also implements force_merge with the blocking option.
Unfortunately, after the merge finishes, I end up with an index of 24 GB. (Tantivy and Lucene both end up with an index of 3 GB.)
Apologies: I found one of the problems: I was indexing with term vector information! I'll reindex and report here whether it solves the problem.
Correction: 6.6 GB.
This is a bit more than twice the size I would have expected. I think the files that existed before the merge are simply not deleted.
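One way to check this hypothesis is to group the files in the index directory by segment and sum their sizes. The sketch below assumes that rucene follows Lucene's file naming convention, where every per-segment file starts with `_<segment name>` followed by `.` or `_` (for example `_0.cfs` or `_1a_Lucene50_0.doc`). The helpers `segment_name` and `size_by_segment` are hypothetical names made up for this illustration and are not part of rucene.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::Path;

/// Extracts the segment prefix from a Lucene-style file name.
/// Assumption: per-segment files look like `_<name>.<ext>` or
/// `_<name>_<suffix>.<ext>`. Anything else (e.g. `segments_5`,
/// `write.lock`) yields `None`.
fn segment_name(file_name: &str) -> Option<&str> {
    let rest = file_name.strip_prefix('_')?;
    let end = rest
        .find(|c: char| c == '_' || c == '.')
        .unwrap_or(rest.len());
    if end == 0 {
        return None;
    }
    // +1 to include the leading underscore again.
    Some(&file_name[..end + 1])
}

/// Sums the on-disk size in bytes of the regular files in `index_dir`,
/// grouped by segment prefix. Files that do not belong to a segment are
/// collected under "(other)".
fn size_by_segment(index_dir: &Path) -> io::Result<BTreeMap<String, u64>> {
    let mut sizes = BTreeMap::new();
    for entry in fs::read_dir(index_dir)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        if !meta.is_file() {
            continue;
        }
        let name = entry.file_name();
        let name = name.to_string_lossy();
        let key = segment_name(&name).unwrap_or("(other)").to_string();
        *sizes.entry(key).or_insert(0u64) += meta.len();
    }
    Ok(sizes)
}
```

If calling `size_by_segment` on the index directory after the blocking force_merge returns more than one segment prefix, the pre-merge files are still on disk, which would account for the extra size.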
Hi Paul, thanks for reporting these issues. We will try to fix them and let you know when we are done.
I am mostly blocked on issue #3
@tongjianlin Can you double-check whether we return from the blocking force_merge before the old segments get reclaimed?