SOLR-14764: FacetFieldProcessorByArray performance parity between `sort:"index asc"` and `sort:"index desc"`

Open magibney opened this issue 3 years ago • 4 comments

See: SOLR-14764

Particularly in high-recall, high-cardinality cases, `sort:"index desc"` currently performs far worse than `sort:"index asc"`: every term is passed through a priority queue that is effectively being used as a FIFO queue, at the cost of many unnecessary comparisons.

This PR addresses that issue. Essentially: we can't afford to abstract away the direct connection between native ord sort order and the order of iteration when building the priority queue for the `sort:"index [asc|desc]"` case.
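To illustrate the asymmetry described above, here is a minimal, self-contained sketch. It is not the code in `FacetFieldProcessorByArray`; the class and method names (`IndexSortSketch`, `topViaQueue`, `topViaIteration`) are hypothetical, and it assumes per-term counts are held in an array indexed by ord, with ord order equal to index order. It counts comparator calls when ords are visited in ascending order and pushed through a bounded heap, and contrasts that with simply walking the ords in the requested direction.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class IndexSortSketch {

    /** Comparator wrapper that counts how often it is invoked. */
    static final class CountingComparator implements Comparator<Integer> {
        private final Comparator<Integer> delegate;
        long calls = 0;

        CountingComparator(Comparator<Integer> delegate) {
            this.delegate = delegate;
        }

        @Override
        public int compare(Integer a, Integer b) {
            calls++;
            return delegate.compare(a, b);
        }
    }

    /**
     * Generic top-N (hypothetical stand-in for the queue-based path): visit ords
     * in ascending order and keep the best `limit` of them in a bounded heap
     * whose head is the worst retained ord. Writes the number of heap-related
     * comparisons into comparisons[0].
     */
    static List<Integer> topViaQueue(int[] counts, int limit, boolean desc, long[] comparisons) {
        Comparator<Integer> best = desc
            ? Comparator.<Integer>reverseOrder()
            : Comparator.<Integer>naturalOrder();
        CountingComparator worstFirst = new CountingComparator(best.reversed());
        PriorityQueue<Integer> heap = new PriorityQueue<>(limit, worstFirst);
        for (int ord = 0; ord < counts.length; ord++) {
            if (counts[ord] == 0) continue;            // skip terms with no hits
            if (heap.size() < limit) {
                heap.add(ord);
            } else if (worstFirst.compare(ord, heap.peek()) > 0) {
                heap.poll();                           // evict the current worst
                heap.add(ord);                         // and re-heapify with the new ord
            }
        }
        List<Integer> out = new ArrayList<>(heap);
        out.sort(best);                                // final ordering, not counted
        comparisons[0] = worstFirst.calls;
        return out;
    }

    /** Direct alternative: walk ords in the requested direction and stop at `limit`. */
    static List<Integer> topViaIteration(int[] counts, int limit, boolean desc) {
        List<Integer> out = new ArrayList<>();
        int step = desc ? -1 : 1;
        int ord = desc ? counts.length - 1 : 0;
        while (ord >= 0 && ord < counts.length && out.size() < limit) {
            if (counts[ord] > 0) out.add(ord);
            ord += step;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] counts = new int[100_000];
        java.util.Arrays.fill(counts, 1);              // every term matches (high recall)
        long[] asc = new long[1];
        long[] desc = new long[1];
        topViaQueue(counts, 1_000, false, asc);
        topViaQueue(counts, 1_000, true, desc);
        System.out.println("queue comparisons, index asc:  " + asc[0]);
        System.out.println("queue comparisons, index desc: " + desc[0]);
    }
}
```

In this sketch, once the heap is full the ascending case rejects each later ord after a single comparison, whereas in the descending case every later ord beats the current worst entry and triggers an evict-and-reinsert. Walking the ords in reverse avoids the heap entirely, which is the general idea behind making iteration order follow the requested sort direction.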

magibney avatar Feb 02 '22 03:02 magibney

Is there any way to have a performance (benchmark?) test that demonstrates the performance hit?

epugh avatar Feb 02 '22 11:02 epugh

On a naive/toy example (1,000,000 docs, terms facet on unique id field, so field cardinality of 1,000,000; core in tmpfs), I compared `q=*:*&rows=0&json.facet={blah:{type:terms,field:id,sort:"index desc"}}` against the same request with `index asc`. On the current main branch, the lowest latency for asc is 20ms; for desc it is 48ms.

With this optimization, both are down to 20ms (actually as low as 17ms, but that could just be noise).

It's super-simple to replicate the existing behavior on any current index: just find a field that's relatively high cardinality (1m+?) and facet on that, comparing sort:"index asc" with sort:"index desc". I don't readily have access to "real" cores/collections that demonstrate current stock behavior (all the instances I manage have been running with a variant of this patch for well over a year).
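For anyone who wants to script that comparison, a rough sketch follows. It assumes Java 11+ and a core reachable at `http://localhost:8983/solr/mycore` (a placeholder URL, as are the class and method names); it measures wall-clock round-trip time from the client, which includes HTTP overhead and so is only a proxy for server-side latency.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class FacetSortTiming {

    /** Builds the select URL for a terms facet on `field`, sorted "index asc" or "index desc". */
    static URI facetUri(String coreUrl, String field, String direction) {
        String jsonFacet =
            "{blah:{type:terms,field:" + field + ",sort:\"index " + direction + "\"}}";
        String query = "q=" + URLEncoder.encode("*:*", StandardCharsets.UTF_8)
            + "&rows=0"
            + "&json.facet=" + URLEncoder.encode(jsonFacet, StandardCharsets.UTF_8);
        return URI.create(coreUrl + "/select?" + query);
    }

    /** Sends the request `runs` times and returns the lowest round-trip time in milliseconds. */
    static long bestMillis(HttpClient client, URI uri, int runs) throws Exception {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
            long start = System.nanoTime();
            client.send(request, HttpResponse.BodyHandlers.discarding());
            best = Math.min(best, System.nanoTime() - start);
        }
        return best / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder defaults; pass the core URL and a high-cardinality field as arguments.
        String coreUrl = args.length > 0 ? args[0] : "http://localhost:8983/solr/mycore";
        String field = args.length > 1 ? args[1] : "id";
        HttpClient client = HttpClient.newHttpClient();
        for (String direction : new String[] {"asc", "desc"}) {
            long ms = bestMillis(client, facetUri(coreUrl, field, direction), 20);
            System.out.println("index " + direction + ": best of 20 = " + ms + " ms");
        }
    }
}
```

Taking the minimum over several runs mirrors the "lowest latency" figures quoted above and reduces the influence of warm-up and noise.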

magibney avatar Feb 02 '22 16:02 magibney

Interesting! Honestly, I think my question really reflects the desire we have as a community for benchmarks in general and, in retrospect, isn't specific to this JIRA. LGTM!

epugh avatar Feb 02 '22 16:02 epugh

This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the [email protected] mailing list. Thank you for your contribution!

github-actions[bot] avatar Feb 22 '24 00:02 github-actions[bot]