TopFieldCollector mistakenly assumes that all leaves share the same index sort
TopFieldCollector caches whether the search sort is a prefix of the index sort across leaves. While IndexWriter enforces that the whole index has the same index sort, it is possible to create a MultiReader across several indexes which have different index sorts, in which case the cached value is wrong for some leaves.
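To make the failure mode concrete, here is a minimal sketch that does not use Lucene at all. It assumes a simplified model in which a sort is just a list of field names and a leaf is represented only by its index sort; `SortPrefixDemo`, `isPrefixOf`, `cachedDecisions` and `perLeafDecisions` are hypothetical names for illustration, not Lucene APIs. Real sort fields also carry a type, a reverse flag and a missing value, which this ignores.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model (not Lucene classes): a sort is a list of
// field names, and each leaf is described only by its index sort.
final class SortPrefixDemo {

    /** True if searchSort is a prefix of indexSort. An empty indexSort means "no index sort". */
    static boolean isPrefixOf(List<String> searchSort, List<String> indexSort) {
        if (searchSort.size() > indexSort.size()) {
            return false;
        }
        return indexSort.subList(0, searchSort.size()).equals(searchSort);
    }

    /** Flawed variant: computes the answer on the first leaf and reuses it for every other leaf. */
    static boolean[] cachedDecisions(List<String> searchSort, List<List<String>> leafIndexSorts) {
        boolean[] decisions = new boolean[leafIndexSorts.size()];
        Boolean cached = null;
        for (int i = 0; i < decisions.length; i++) {
            if (cached == null) {
                cached = isPrefixOf(searchSort, leafIndexSorts.get(i));
            }
            decisions[i] = cached;
        }
        return decisions;
    }

    /** Per-leaf variant: recomputes the answer for each leaf. */
    static boolean[] perLeafDecisions(List<String> searchSort, List<List<String>> leafIndexSorts) {
        boolean[] decisions = new boolean[leafIndexSorts.size()];
        for (int i = 0; i < decisions.length; i++) {
            decisions[i] = isPrefixOf(searchSort, leafIndexSorts.get(i));
        }
        return decisions;
    }

    public static void main(String[] args) {
        List<String> searchSort = List.of("timestamp");
        // Two leaves that could come from two different indexes behind one MultiReader.
        List<List<String>> leaves = List.of(
                List.of("timestamp", "id"),  // index sorted by timestamp, then id
                List.of("price"));           // index sorted by price
        System.out.println(Arrays.toString(cachedDecisions(searchSort, leaves)));  // [true, true]
        System.out.println(Arrays.toString(perLeafDecisions(searchSort, leaves))); // [true, false]
    }
}
```

In this model the cached variant reports that the second leaf is sorted compatibly with the search sort when it is not, which is the kind of wrong answer described above.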
This is an interesting issue, and I don't see any good solution other than removing the cache itself. I am wondering whether it would be a good idea for Collector to know the List<LeafReaderContext>, similar to how setWeight passes the Weight to the Collector. TopFieldCollector would then be able to compute searchSortPartOfIndexSort correctly and use that information within TopFieldLeafCollector.
@jpountz - Any thoughts?
I don't see an obvious solution either. My preference would be to remove the cache and make this decision on a per-segment basis, but this would require moving some methods around, e.g. Comparator#disableSkipping -> LeafComparator#disableSkipping.
Would it make sense to have different collectors for the two cases, one with and one without a cache?
What are the two cases that you have in mind? I don't think that having a collector with a cache makes sense, since it assumes that leaves are uniform, which may not be correct. However, we could have different LeafCollectors for the case when the search sort is a prefix of the index sort on the one hand, and the case when the search sort is not a prefix of the index sort on the other hand.
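A rough sketch of what choosing the behaviour per leaf could look like, again without Lucene. Everything here is an assumption for illustration: `PerLeafCollectorSketch`, `Leaf`, `Result` and `topK` are hypothetical, each leaf is reduced to an array of sort values, the sort is ascending on a single field, and `k` is at least 1. A leaf whose index sort matches the search sort may stop at its first non-competitive document, while any other leaf must be scanned in full.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical model, not Lucene code: each leaf is an array of sort values in doc order.
final class PerLeafCollectorSketch {

    static final class Leaf {
        final long[] values;
        // True only if this leaf's index sort has the search sort as a prefix,
        // i.e. values are stored in ascending order. Decided per leaf, never cached.
        final boolean sortedBySearchSort;

        Leaf(long[] values, boolean sortedBySearchSort) {
            this.values = values;
            this.sortedBySearchSort = sortedBySearchSort;
        }
    }

    static final class Result {
        final List<Long> top;
        final int visited;

        Result(List<Long> top, int visited) {
            this.top = top;
            this.visited = visited;
        }
    }

    /** Collects the k smallest values across all leaves, assuming k >= 1. */
    static Result topK(List<Leaf> leaves, int k) {
        // Max-heap holding the current k smallest values; peek() is the worst competitive hit.
        PriorityQueue<Long> heap = new PriorityQueue<>(Collections.reverseOrder());
        int visited = 0;
        for (Leaf leaf : leaves) {
            for (long v : leaf.values) {
                visited++;
                if (heap.size() < k) {
                    heap.add(v);
                } else if (v < heap.peek()) {
                    heap.poll();
                    heap.add(v);
                } else if (leaf.sortedBySearchSort) {
                    // Remaining docs in this leaf are >= v, so none can be competitive.
                    break;
                }
            }
        }
        List<Long> top = new ArrayList<>(heap);
        Collections.sort(top);
        return new Result(top, visited);
    }

    public static void main(String[] args) {
        long[] a = {1, 3, 5, 7, 9}; // leaf sorted by the search sort
        long[] b = {8, 2, 6, 4};    // leaf with a different index sort
        Result correct = topK(Arrays.asList(new Leaf(a, true), new Leaf(b, false)), 2);
        System.out.println(correct.top); // [1, 2]
        // Treating the second leaf as sorted, as a stale cache would, drops a competitive hit.
        Result wrong = topK(Arrays.asList(new Leaf(a, true), new Leaf(b, true)), 2);
        System.out.println(wrong.top);   // [1, 3]
    }
}
```

The second call in `main` mimics the cached decision being applied to a leaf it does not hold for: the scan terminates early on that leaf and the result is wrong.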
Would it make sense to have different collectors for the two cases, one with and one without a cache?
I'm probably confused. I was thinking in a general way that perhaps we could have a decision that would allow results to be cached (and fetched from the cache) only when searching in a context where all readers' leaves share the same index sort, but I confess I don't have any clear idea of how this would be implemented.