Ankit Jain

Results: 109 comments by Ankit Jain

Thanks, all, for reviewing this PR. I plan to merge it by tomorrow if there is no new feedback. Again, thanks for helping improve this change with your input!

Thanks @dungba88 for trying this out.

> I don't see its Thread object is being used anywhere, can it be removed?

The `Thread` object is used for maintaining the association...

> My mental model is that this collector works on doc values in the default case, and can opportunistically take advantage of index statistics or points indexes when it makes...

Reran the benchmark to make sure nothing has regressed and the collector works as expected:

```
Benchmark                                             (bucketWidth)  (docCount)  (pointEnabled)   Mode  Cnt  Score  Error  Units
HistogramCollectorBenchmark.pointRangeQueryHistogram           5000      500000            true  thrpt    3...
```

@jpountz - As suggested, I have changed the PR to instead disable `HistogramCollection` if doc values are not indexed. Please let me know if there is any other feedback.

Thanks @dzane17 & @Naarcha-AWS for putting this together. Looks great, just some minor comments!

Addressed the review comments; benchmark results are below:

```
Task                   QPS baseline  StdDev   QPS my_modified_version  StdDev   Pct diff               p-value
IntSet                 437.40        (12.5%)  420.59                   (7.3%)   -3.8% ( -21% -  18%)   0.235
BrowseMonthTaxoFacets  2.55          (7.1%)...
```

> On your benchmark run, I see no query with a speedup and a low p-value?

I primarily wanted to ensure there is no regression. Do we have any benchmark...

While I can see the concerns with high-dimensional data, I am wondering if this can be a good improvement for [PointTreeBulkCollector](https://github.com/apache/lucene/blob/main/lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/plain/histograms/PointTreeBulkCollector.java#L55), as it is limited to single-dimensional fields....