
Why does indexing from the same source index a different number of documents each time?

Open mikemccand opened this issue 4 years ago • 0 comments

I ran six re-indexing benchmarks using wikimediumall, and each time, got a different total number of documents indexed:

[mike@beast3 trunk]$ grep "indexing done" /l/logs/trunk?.txt
/l/logs/trunk1.txt:Indexer: indexing done (89114 msec); total 27624170 docs
/l/logs/trunk2.txt:Indexer: indexing done (89974 msec); total 27624192 docs
/l/logs/trunk3.txt:Indexer: indexing done (90614 msec); total 27624409 docs
[mike@beast3 trunk]$ grep "indexing done" /l/logs/base?.log
/l/logs/base1.log:Indexer: indexing done (89271 msec); total 27623915 docs
/l/logs/base2.log:Indexer: indexing done (91676 msec); total 27624107 docs
/l/logs/base3.log:Indexer: indexing done (93120 msec); total 27624268 docs

Why is that? :) It should be exactly the same document count every time!
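
To put a number on the discrepancy, the six totals above can be compared with a small script. This is only a sketch, not part of luceneutil: the helper name `parse_doc_counts` and the regular expression are assumptions based solely on the log line format shown in the grep output above.

```python
import re

# Log lines copied verbatim from the grep output above.
LOG_LINES = """\
/l/logs/trunk1.txt:Indexer: indexing done (89114 msec); total 27624170 docs
/l/logs/trunk2.txt:Indexer: indexing done (89974 msec); total 27624192 docs
/l/logs/trunk3.txt:Indexer: indexing done (90614 msec); total 27624409 docs
/l/logs/base1.log:Indexer: indexing done (89271 msec); total 27623915 docs
/l/logs/base2.log:Indexer: indexing done (91676 msec); total 27624107 docs
/l/logs/base3.log:Indexer: indexing done (93120 msec); total 27624268 docs
""".splitlines()

# Assumed format: "<path>:Indexer: indexing done (<N> msec); total <M> docs"
PATTERN = re.compile(
    r"^(?P<path>[^:]+):Indexer: indexing done "
    r"\((?P<msec>\d+) msec\); total (?P<docs>\d+) docs$"
)


def parse_doc_counts(lines):
    """Return {log path: total docs} for every 'indexing done' line (hypothetical helper)."""
    counts = {}
    for line in lines:
        match = PATTERN.match(line.strip())
        if match:
            counts[match.group("path")] = int(match.group("docs"))
    return counts


counts = parse_doc_counts(LOG_LINES)
lowest = min(counts.values())
highest = max(counts.values())
spread = highest - lowest
distinct = len(set(counts.values()))

print(f"{distinct} distinct totals across {len(counts)} runs")
print(f"min={lowest} max={highest} spread={spread}")
```

For the six runs above, no two totals match, and the largest and smallest differ by 494 documents out of roughly 27.6 million.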

mikemccand, Feb 09 '21 16:02