lucene
lucene copied to clipboard
A little optimization about BKDReader
Description
1.Reduce the match times for BKD node whose relation=CELL_CROSSES_QUERY.
When performing range query, if the Relation is CELL_CROSSES_QUERY, we need to traverse each value in this leaf node and matches. A segment query typically involves two leaf nodes with Relation set to CELL_CROSSES_QUERY, leading to 512×2 matches.
In reality, the 512 values on each BKD leaf node are stored in a compressed format, such as "30 consecutive 1, 30 consecutive 2, 30 consecutive 3.....". Notably, matching for 30 consecutive 1 only requires 1 times, instead of 30. With this optimization, the BKD tree only needs a few matching rather than 512 per leaf.
For a shard with 25 segments, the original approach would involve 25×512×2 = 25,600 matching operations. Through the above optimization, however, a lot of reduction in matching can be achieved.
https://github.com/apache/lucene/blob/3c118d74d0bc1bc194272774605c1c1e57232925/lucene/core/src/java/org/apache/lucene/util/bkd/BKDReader.java#L911
......
scratchIterator.reset(i, length);
if (visitor.compare(scratchPackedValue, scratchPackedValue) == PointValues.Relation.CELL_INSIDE_QUERY) {
visitor.visit(new IntsRef(scratchIterator.docIDs, i, length));
}
......
2.When calling PointValues.intersect(), if there is no matching node left, we should exit promptly.
For example, in the following case, only leaf node 7 matches. In reality, we would visit nodes from 1 to 13.
- When visiting leaf node 7: relation = CELL_CROSSES_QUERY
- When visiting leaf node 8: relation = CELL_OUTSIDE_QUERY
Once the relation changes to CELL_OUTSIDE_QUERY in node 8, we should exit immediately instead of continuing to visit nodes 9 to 13.