
[Abandoned] Optimize DistinctLimit for small limit values

Open kaikalur opened this issue 2 years ago • 3 comments

We now introduce a notion of "fast limits" for small limit values (default threshold: 10k). As a first optimization, we disable the hash generation optimization for simple distinct limit operations so that the exchange does not block. The user then starts seeing results as soon as the first value is seen, including when the number of distinct values is less than the limit. This helps interactive querying use cases.
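The streaming behavior described above can be illustrated with a small, hypothetical Java sketch. It is not Presto's actual distinct-limit operator and does not model exchanges or hash generation. The class name `StreamingDistinctLimit`, the `isFastLimit` check, and treating the 10k default as an inclusive bound are all assumptions made for illustration. The point it shows is that each new distinct row is forwarded downstream immediately, and input reading stops once `limit` distinct rows have been emitted.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical sketch of a streaming DISTINCT ... LIMIT n (not Presto's real operator). */
public final class StreamingDistinctLimit {
    // Assumed "fast limit" threshold; 10k is the default mentioned in the PR description.
    static final long FAST_LIMIT_THRESHOLD = 10_000;

    private StreamingDistinctLimit() {}

    /** Hypothetical planner check: is this limit small enough for the fast path? (inclusive bound assumed) */
    static boolean isFastLimit(long limit) {
        return limit <= FAST_LIMIT_THRESHOLD;
    }

    /**
     * Reads rows one at a time and forwards each previously unseen row downstream
     * right away instead of waiting for the whole input. Stops reading once `limit`
     * distinct rows have been emitted. Rows only need consistent equals/hashCode.
     * Returns the number of input rows read.
     */
    static <T> long run(Iterator<T> input, long limit, Consumer<T> downstream) {
        Set<T> seen = new HashSet<>(); // holds at most `limit` rows, so memory is O(limit)
        long read = 0;
        while (seen.size() < limit && input.hasNext()) {
            T row = input.next();
            read++;
            if (seen.add(row)) {        // add() returns true only for a new distinct row
                downstream.accept(row); // emit immediately: results stream before input ends
            }
        }
        return read;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("a", "b", "a", "c", "b", "d", "e");
        List<String> out = new ArrayList<>();
        long read = run(rows.iterator(), 3, out::add);
        System.out.println(out + " after reading " + read + " rows"); // [a, b, c] after reading 4 rows
        System.out.println(isFastLimit(1000));                         // true
    }
}
```

If the input contains fewer distinct values than the limit, the loop simply runs to the end of the input, but every distinct row has already been handed downstream as it was found, which is the interactive behavior the PR is after.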

#17328

Test plan - N/A


== RELEASE NOTES ==

General Changes
* Simple queries like `SELECT DISTINCT c1, c2, ... FROM T WHERE ... LIMIT 1000` will now start streaming results as soon as the first value is available.

kaikalur, Feb 28 '22 20:02

We will revert/clean up the previous hash-based distinct limit PR once this is merged, as that one is not helping as much as I expected.

kaikalur, Feb 28 '22 20:02

Abandoning this. Trying to find other ways of improving distinct limit :)

kaikalur, Mar 13 '22 16:03

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the task, make sure you've addressed reviewer comments, and rebase on the latest master. Thank you for your contributions!

stale[bot], Sep 21 '22 10:09