Optimize DISTINCT, ORDER BY and DISTINCT ON for aggregation without GROUP BY.
For a query that has aggregation but no GROUP BY clause, the DISTINCT/DISTINCT ON/ORDER BY clause can be removed, because at most one row is returned and there is no need for a Unique or Sort step. This simplifies the plan and lets the planner process fewer expressions, such as Aggref nodes.
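The reasoning above can be checked against any SQL engine. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than Cloudberry, the sample rows are invented, and it covers only DISTINCT and ORDER BY because SQLite has no DISTINCT ON. It shows that an aggregate query without GROUP BY yields a single row, so adding DISTINCT or ORDER BY leaves the result unchanged.

```python
import sqlite3

# Assumption: SQLite stands in for Cloudberry purely to illustrate SQL semantics.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_distinct_sort (a INT, b INT, c INT)")
# Invented sample data: ten rows, not taken from the original test suite.
conn.executemany(
    "INSERT INTO t_distinct_sort VALUES (?, ?, ?)",
    [(i, i % 3, i % 5) for i in range(1, 11)],
)

def run(sql):
    """Execute a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Baseline: aggregation without GROUP BY.
plain = run("SELECT count(a), sum(b) FROM t_distinct_sort")

# The same aggregates with the clauses this change removes from the plan.
with_distinct = run("SELECT DISTINCT count(a), sum(b) FROM t_distinct_sort")
with_order = run(
    "SELECT count(a), sum(b) FROM t_distinct_sort ORDER BY sum(a), count(c)"
)
with_both = run(
    "SELECT DISTINCT count(a), sum(b) FROM t_distinct_sort "
    "ORDER BY sum(b), count(a)"
)

# Even an empty table produces a single row for an ungrouped aggregate.
conn.execute("DELETE FROM t_distinct_sort")
on_empty = run("SELECT DISTINCT count(a), sum(b) FROM t_distinct_sort")

print(plain, with_distinct, with_order, with_both, on_empty)
```

Every variant returns the same single row as the baseline, which is why a Unique or Sort node above the Finalize Aggregate has nothing left to do.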
DISTINCT
explain(verbose, costs off)
select distinct count(a), sum(b) from t_distinct_sort ;
QUERY PLAN
------------------------------------------------------------------------
Unique
  Output: (count(a)), (sum(b))
  Group Key: (count(a)), (sum(b))
  ->  Sort
        Output: (count(a)), (sum(b))
        Sort Key: (count(t_distinct_sort.a)), (sum(t_distinct_sort.b))
        ->  Finalize Aggregate
              Output: count(a), sum(b)
              ->  Gather Motion 3:1 (slice1; segments: 3)
                    Output: (PARTIAL count(a)), (PARTIAL sum(b))
                    ->  Partial Aggregate
                          Output: PARTIAL count(a), PARTIAL sum(b)
                          ->  Seq Scan on public.t_distinct_sort
                                Output: a, b, c
Settings: optimizer = 'off'
Optimizer: Postgres query optimizer
(16 rows)
After this commit:
explain(verbose, costs off)
select distinct count(a), sum(b) from t_distinct_sort ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate
  Output: count(a), sum(b)
  ->  Gather Motion 3:1 (slice1; segments: 3)
        Output: (PARTIAL count(a)), (PARTIAL sum(b))
        ->  Partial Aggregate
              Output: PARTIAL count(a), PARTIAL sum(b)
              ->  Seq Scan on public.t_distinct_sort
                    Output: a, b, c
Optimizer: Postgres query optimizer
(10 rows)
DISTINCT ON and ORDER BY
select distinct on(count(b), count(c)) count(a), sum(b) from t_distinct_sort order by count(c);
QUERY PLAN
--------------------------------------------------------------------
Unique
  Output: (count(a)), (sum(b)), (count(c)), (count(b))
  Group Key: (count(c)), (count(b))
  ->  Sort
        Output: (count(a)), (sum(b)), (count(c)), (count(b))
        Sort Key: (count(t_distinct_sort.c)), (count(t_distinct_sort.b))
        ->  Finalize Aggregate
              Output: count(a), sum(b), count(c), count(b)
              ->  Gather Motion 3:1 (slice1; segments: 3)
                    Output: (PARTIAL count(a)), (PARTIAL sum(b)), (PARTIAL count(c)), (PARTIAL count(b))
                    ->  Partial Aggregate
                          Output: PARTIAL count(a), PARTIAL sum(b), PARTIAL count(c), PARTIAL count(b)
                          ->  Seq Scan on public.t_distinct_sort
                                Output: a, b, c
After this commit:
select distinct on(count(b), count(c)) count(a), sum(b) from t_distinct_sort order by count(c);
QUERY PLAN
--------------------------------------------------------
Finalize Aggregate
  Output: count(a), sum(b)
  ->  Gather Motion 3:1 (slice1; segments: 3)
        Output: (PARTIAL count(a)), (PARTIAL sum(b))
        ->  Partial Aggregate
              Output: PARTIAL count(a), PARTIAL sum(b)
              ->  Seq Scan on public.t_distinct_sort
                    Output: a, b, c
Optimizer: Postgres query optimizer
ORDER BY
explain(verbose, costs off)
select count(a), sum(b) from t_distinct_sort order by sum(a), count(c);
QUERY PLAN
--------------------------------------------------------------------------------------------------
Sort
  Output: (count(a)), (sum(b)), (sum(a)), (count(c))
  Sort Key: (sum(t_distinct_sort.a)), (count(t_distinct_sort.c))
  ->  Finalize Aggregate
        Output: count(a), sum(b), sum(a), count(c)
        ->  Gather Motion 3:1 (slice1; segments: 3)
              Output: (PARTIAL count(a)), (PARTIAL sum(b)), (PARTIAL sum(a)), (PARTIAL count(c))
              ->  Partial Aggregate
                    Output: PARTIAL count(a), PARTIAL sum(b), PARTIAL sum(a), PARTIAL count(c)
                    ->  Seq Scan on public.t_distinct_sort
                          Output: a, b, c
Settings: optimizer = 'off'
Optimizer: Postgres query optimizer
(13 rows)
After this commit:
explain(verbose, costs off)
select count(a), sum(b) from t_distinct_sort order by sum(a), count(c);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate
  Output: count(a), sum(b)
  ->  Gather Motion 3:1 (slice1; segments: 3)
        Output: (PARTIAL count(a)), (PARTIAL sum(b))
        ->  Partial Aggregate
              Output: PARTIAL count(a), PARTIAL sum(b)
              ->  Seq Scan on public.t_distinct_sort
                    Output: a, b, c
Optimizer: Postgres query optimizer
(10 rows)
DISTINCT and ORDER BY
select distinct count(a), sum(b) from t_distinct_sort order by sum(b), count(a);
QUERY PLAN
------------------------------------------------------------------------
Unique
  Output: (count(a)), (sum(b))
  Group Key: (sum(b)), (count(a))
  ->  Sort
        Output: (count(a)), (sum(b))
        Sort Key: (sum(t_distinct_sort.b)), (count(t_distinct_sort.a))
        ->  Finalize Aggregate
              Output: count(a), sum(b)
              ->  Gather Motion 3:1 (slice1; segments: 3)
                    Output: (PARTIAL count(a)), (PARTIAL sum(b))
                    ->  Partial Aggregate
                          Output: PARTIAL count(a), PARTIAL sum(b)
                          ->  Seq Scan on public.t_distinct_sort
                                Output: a, b, c
Settings: optimizer = 'off'
Optimizer: Postgres query optimizer
(16 rows)
After this commit:
select distinct count(a), sum(b) from t_distinct_sort order by sum(b), count(a);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate
  Output: count(a), sum(b)
  ->  Gather Motion 3:1 (slice1; segments: 3)
        Output: (PARTIAL count(a)), (PARTIAL sum(b))
        ->  Partial Aggregate
              Output: PARTIAL count(a), PARTIAL sum(b)
              ->  Seq Scan on public.t_distinct_sort
                    Output: a, b, c
Optimizer: Postgres query optimizer
(10 rows)
Authored-by: Zhang Mingli [email protected]
fix #ISSUE_Number
Change logs
Describe your change clearly, including what problem is being solved or what feature is being added.
If it breaks backward or forward compatibility, please clarify.
Why are the changes needed?
Describe why the changes are necessary.
Does this PR introduce any user-facing change?
If yes, please clarify the previous behavior and the change this PR proposes.
How was this patch tested?
Please detail how the changes were tested, including manual tests and any relevant unit or integration tests.
Contributor's Checklist
Here are some reminders and checklists to review before or when submitting your pull request:
- [ ] Make sure your Pull Request has a clear title and commit message. You can take the git-commit template as a reference.
- [ ] Sign the Contributor License Agreement as prompted for your first-time contribution (one-time setup).
- [ ] Learn the coding contribution guide, including our code conventions, workflow and more.
- [ ] List your communication in the GitHub Issues or Discussions (if any or if needed).
- [ ] Document changes.
- [ ] Add tests for the change
- [ ] Pass `make installcheck`
- [ ] Pass `make -C src/test installcheck-cbdb-parallel`
- [ ] Feel free to request the `cloudberrydb/dev` team for review and approval when your PR is ready🥳