
The lack of hermetics in DocumentBatch

Open SOLR4189 opened this issue 6 years ago • 1 comment

Hi, I have another problem: when I pass my docs through the Monitor in batches (3000 docs per batch), I don't get all matching pairs. When I pass them with one doc per batch, I get all results. What could cause this? Does Luwak have a batch size limit? I didn't find...

I'm using ParallelMatcher with SimpleMatcher inside (scores don't matter to me), and the monitor has only one query loaded.

SOLR4189 · Jun 21 '18 04:06

OK, I found the problem. DocumentBatch takes its analyzers from the first document in the batch only (line 187 in DocumentBatch.java), so matching fails when another doc in the batch has fields that the first doc doesn't have.

Temporary solution: when I build a batch, I collect the analyzers from all docs in the batch, so each doc gets every possible analyzer for every possible field (even fields it doesn't have).

Optimal solution: DocumentBatch should build the union of the analyzers of all its documents itself. What do you think?
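
To illustrate the difference between the two behaviours, here is a minimal sketch. It does not use Luwak's real classes: each document is modelled as a plain map from field name to analyzer name, and the class and method names (`AnalyzerUnionSketch`, `analyzersFromFirstDoc`, `unionAnalyzers`) are hypothetical, chosen only for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AnalyzerUnionSketch {

    // Hypothetical model: a document is a map of field name -> analyzer name.
    // This mimics the behaviour described above, where only the first
    // document in the batch contributes analyzers.
    public static Map<String, String> analyzersFromFirstDoc(List<Map<String, String>> batch) {
        Map<String, String> result = new LinkedHashMap<>();
        if (!batch.isEmpty()) {
            result.putAll(batch.get(0));
        }
        return result;
    }

    // Proposed behaviour: union the per-field analyzers of every document.
    // If two documents name the same field, the first analyzer seen wins
    // (an assumption made for this sketch).
    public static Map<String, String> unionAnalyzers(List<Map<String, String>> batch) {
        Map<String, String> union = new LinkedHashMap<>();
        for (Map<String, String> doc : batch) {
            for (Map.Entry<String, String> entry : doc.entrySet()) {
                union.putIfAbsent(entry.getKey(), entry.getValue());
            }
        }
        return union;
    }

    public static void main(String[] args) {
        Map<String, String> first = Collections.singletonMap("title", "standard");
        Map<String, String> second = Collections.singletonMap("body", "whitespace");
        List<Map<String, String>> batch = Arrays.asList(first, second);

        // The field "body" has no analyzer when only the first doc is consulted.
        System.out.println(analyzersFromFirstDoc(batch).keySet());
        // The union covers every field that appears anywhere in the batch.
        System.out.println(unionAnalyzers(batch).keySet());
    }
}
```

With the first-document approach, any field that appears only in later documents is left without an analyzer, which would explain the missing matches in large batches.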

SOLR4189 · Jun 24 '18 05:06