solr
SOLR-15760: Improve the default distributed facet overrequest function/heuristic
See: SOLR-15760
For logical consistency, distributed facet overrequest should make no distinction between offset and limit; instead, distributed overrequest should be calculated as a function of the sum of offset+limit, boosting relatively heavily when few values are requested, and decaying asymptotically to f(x)=x for larger numbers of requested values.
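The shape described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the function in this PR: the formula `gross + ceil(4 * sqrt(gross))`, the floor of 12 and the class/method names are all hypothetical. It shows offset and limit being folded into a single gross size, a heavy relative boost for small requests, and a ratio to the requested count that approaches 1 (i.e. f(x)=x) as the gross size grows.

```java
// Hypothetical sketch: NOT the PR's formula, only the shape described in the text.
class DecayingOverrequest {

    // Assumed minimum number of buckets requested from each shard.
    static final long FLOOR = 12;

    // Per-shard request size; assumes non-negative, finite offset and limit.
    // Only offset + limit matters, so offset and limit are treated identically.
    static long shardRequestSize(long offset, long limit) {
        long gross = offset + limit;
        // The additive boost grows like sqrt(gross), so boost/gross -> 0 and
        // shardRequestSize/gross -> 1 for large requests.
        long boosted = gross + (long) Math.ceil(4 * Math.sqrt(gross));
        return Math.max(FLOOR, boosted);
    }

    public static void main(String[] args) {
        long[][] requests = { {0, 1}, {0, 10}, {90, 10}, {0, 100}, {0, 10000} };
        for (long[] r : requests) {
            long size = shardRequestSize(r[0], r[1]);
            System.out.printf("offset=%d limit=%d -> shard request %d (%.2fx)%n",
                    r[0], r[1], size, (double) size / (r[0] + r[1]));
        }
    }
}
```

With this illustrative formula, offset=90/limit=10 and offset=0/limit=100 both yield 140 per shard, while a limit of 1 is raised to the floor of 12 and a gross size of 10000 gets only about 4% extra.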
The condition on small offset was added in 1f7777693769bad1cd8fc40b339d00c43f16f9d1, and I think in a pinch this could stand to simply be removed (to restore the initial unconditional linear overrequest boost according to the overrequest function f(x)=1.1*x + 4, initially introduced* in 7b5df8a10391f5b824e8ea1793917ff60b64b8a8).
*EDIT: I misidentified where f(x)=1.1*x + 4 was first introduced; it has been refactored a few times, but was already present in the commit that introduced JSON Facets: 3dc5ed33c5f13309c22716c7d18b726d8a093622
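For comparison, here is a minimal sketch of the original linear function f(x)=1.1*x + 4 applied unconditionally to the gross size x = offset + limit. The class and method names are hypothetical, and truncating the result to a `long` is an assumption about rounding. The point it illustrates follows from the formula itself: the ratio f(x)/x tends to 1.1 rather than 1, so the relative overrequest never drops below 10% however large the request is.

```java
// Hypothetical sketch of the unconditional linear overrequest f(x) = 1.1*x + 4,
// applied to x = offset + limit. Truncation to long is an assumed rounding rule.
class LinearOverrequest {

    static long shardRequestSize(long offset, long limit) {
        long gross = offset + limit;
        return (long) (gross * 1.1 + 4);
    }

    public static void main(String[] args) {
        for (long gross : new long[] {1, 10, 100, 10000}) {
            long size = shardRequestSize(0, gross);
            // The ratio starts at 5x for gross=1 and approaches 1.1x, never 1x.
            System.out.printf("gross=%d -> shard request %d (%.2fx)%n",
                    gross, size, (double) size / gross);
        }
    }
}
```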
@dsmiley, could you take a quick look at this? I think there's some sense to the assertion that larger numbers of requested values actually need less overrequest (though I still don't think the distinction between offset and limit is relevant). Intuitively I'd also think that perhaps we'd want to boost extremely low gross-limit requests by more than 4 (the proposal currently in this PR has a floor of 12 for shard requests).
Ultimately I think there are three options that might make sense:
- revert the overrequest function change from 1f7777693769bad1cd8fc40b339d00c43f16f9d1, going back to the original, unconditional linear boost
- change to a function something like what's proposed in this PR, which decays for larger gross limits
- switch to a static additive boost (+12 or something) that would help small requests, and "decay" (relative to the overall requested size) for higher gross limits.
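The three options above can be compared side by side with a small sketch. Each operator maps the gross request size (offset + limit) to a per-shard request size. The "decaying" formula is a hypothetical stand-in for the PR's function (which is not reproduced here), +12 is just the example constant from the list, and the class name is illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongUnaryOperator;

// Hypothetical comparison of the three candidate overrequest strategies.
class OverrequestOptions {

    static Map<String, LongUnaryOperator> options() {
        Map<String, LongUnaryOperator> m = new LinkedHashMap<>();
        // Option 1: original unconditional linear boost, f(x) = 1.1*x + 4.
        m.put("linear", x -> (long) (x * 1.1 + 4));
        // Option 2: illustrative decaying boost with an assumed floor of 12.
        m.put("decaying", x -> Math.max(12, x + (long) Math.ceil(4 * Math.sqrt(x))));
        // Option 3: static additive boost, f(x) = x + 12.
        m.put("additive", x -> x + 12);
        return m;
    }

    public static void main(String[] args) {
        for (long gross : new long[] {1, 10, 100, 10000}) {
            StringBuilder line = new StringBuilder("gross=" + gross + ":");
            for (Map.Entry<String, LongUnaryOperator> e : options().entrySet()) {
                line.append(' ').append(e.getKey()).append('=')
                    .append(e.getValue().applyAsLong(gross));
            }
            System.out.println(line);
        }
    }
}
```

Both the additive and the decaying variants give small requests a much larger relative boost than the linear function (13 and 12 buckets versus 5 for a gross size of 1), while for a gross size of 10000 they request 10012 and 10400 buckets respectively, compared with 11004 for the linear function.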
I like it -- especially treating offset+limit the same as limit, and monotonically increasing the overrequest up to a point. But I'm not the ideal reviewer. @yonik, @joel-bernstein and @hossman come to mind.
This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the [email protected] mailing list. Thank you for your contribution!