cockroach icon indicating copy to clipboard operation
cockroach copied to clipboard

[DNM] xform: sort-avoiding unconstrained non-covering index scan

Open msirek opened this issue 2 years ago • 8 comments

Fixes https://github.com/cockroachlabs/support/issues/1453 and #77243

Previously, the optimizer did not explore plans which scan a non-covering index with no constraints, even if the index can provide a required ordering. This can prevent an optimal plan from being found which avoids a sort, or in the case of ORDER BY ... LIMIT, avoids a full scan.

Now, when optimizing an enforcer of required physical properties, we save an ordering hint in the memo. When generating index scans for
exploration, we no longer skip non-covering unconstrained index scans if that index can provide the required ordering.

Release note (performance improvement): This patch adds support for unconstrained scans of non-covering indexes when the index can be used to avoid a sort.

msirek avatar Sep 13 '22 04:09 msirek

This change is Reviewable

cockroach-teamcity avatar Sep 13 '22 04:09 cockroach-teamcity

Related to / maybe fixes #77243

DrewKimball avatar Sep 13 '22 07:09 DrewKimball

Since this is a performance improvement, let's wait to backport until 22.2.1. I haven't review this yet, but I'll take a look in the next day or so.

mgartner avatar Sep 13 '22 18:09 mgartner

It looks like groupStateKey should be updated with an OrderingChoice hint

Would it be necessary to update it at all? My reading is that the state should already have the required properties including the ordering, so if we had access to it in GenerateIndexJoins we could just use that - maybe just cache the current state on the explorer. Good find, by the way.

DrewKimball avatar Sep 14 '22 01:09 DrewKimball

The current state would hold an empty required ordering.

Ok, makes sense. I think your idea to have it on the memo is best then, since we'd have to do some work to have access to the group state in exploration rules anyway, and the memo is already there. I think it's still necessary to change where we're setting the EnforcerProps/ParentProps field because of the recursion in optimizeGroupMember: https://github.com/cockroachdb/cockroach/blob/95677eb5f8d006629b16024fb7d87d55344c1470/pkg/sql/opt/xform/optimizer.go#L570 There might be other places that work, but optimizeGroup seems like a good place to do this. It's the place where the exploration rules are called, so we can be certain there won't be a case where we recursively optimize a child but don't set its EnforcerProps and it sees those of its parent. optimizeGroup is called with the non-empty required props, so we shouldn't have any trouble there.

DrewKimball avatar Sep 14 '22 03:09 DrewKimball

Note to self: possible updates in branch orderingHint_backup

msirek avatar Sep 16 '22 19:09 msirek

If there was no Limit, then that's a full table scan, so it's not a case that's ever going to be very fast. But even if there's some real-world case for that type of query, a full scan over the primary index plus a sort is likely going to be faster than a full scan over a secondary index plus an index join that performs a lookup for each row in the table.

mgartner avatar Sep 21 '22 21:09 mgartner

If there was no Limit, then that's a full table scan, so it's not a case that's ever going to be very fast. But even if there's some real-world case for that type of query, a full scan over the primary index plus a sort is likely going to be faster than a full scan over a secondary index plus an index join that performs a lookup for each row in the table.

Hmm, I'm not so sure, if its a lot of data I could see sort that had to spill to disk be worse than all those lookup joins. I'm also not sure how these plans play out in the distsql realm, like do we distribute the sort and intelligently buffer things so that we get high throughput and don't oom? And if we went with the index are the lookup joins distributed and if so do we have routers intelligent enough to synchronize the outputs without adding sorting? Might have to play around with this on Friday...

The other variable is if there's a filter that can be pushed down to the index scan how selective is it? Because a selective constrained index scan will surely take the cake regardless of whether theres a limit, and if its not very selective the full table scan could be better. Maybe if there's a LIMIT and an ordering OR an ordering and a filter that should be our trigger to explore non-covering indexes?

cucaroach avatar Sep 21 '22 23:09 cucaroach

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Mark Sirek seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

cockroach-teamcity avatar Dec 22 '23 08:12 cockroach-teamcity