qlever
qlever copied to clipboard
Distinct in Subqueries leads to unnecessary sort operations
If we have a subquery with a DISTINCT
clause and the result of this subquery is reordered afterwards (because of joins or an order-By clause,
the query planner will often generate the following operations: (TODO: example)
SORT-A (by all columns to enable distict) -> DISTINCT -> SORT-B (according to the way the result is used afterwards).
If the sort oder of the columns of SORT-A would be chosen correctly, the SORT-B operation could be dropped.
There are several ways to achieve this:
-
Let the query planner create all possible permutations of sort columns for SORT-A, then the query planner would favor the one(s), that don't require the SORT-B operation. Unfortunately, this would require n! different entries for SORT-A where n is the number of columns. On the other hand it would be relatively straightforward to implement.
-
Make the query planner aware of the Sort order required by the SORT-B operation when creating the SORT-A + distinct. This would be the cleanest approach, but would require quite some changes to the current bottom up DP approach.
-
Let the query planner create the plans as it does at the moments, and afterwards check for DISTINCT operations with sorts before and after them, and rewire SORT-A. This should be relatively straightforward and solve the issue at Hand. However this might lead to nonoptimal join orders up the Execution tree.
-
Change the optimizer, s.T. it can assume that after a DISTINCT the result is sorted in any order it wants. (this is generally true, the special case that the SORT-A operation is not needed because the input to the DISTINCT operation is already ordered is rare and can easily be detected). The we get optimal query plans and afterwards insert the correct sort operation before the DISTINCT.