oracle_fdw
oracle_fdw copied to clipboard
Support Aggregation pushdown
Hi Laurenz,
Sorry for creating an issue ticket during vacation.
I would like to share the source code to support Aggregation pushdown, which is below:
- Support pushdown GROUP BY (var/expression)
- Support pushdown GROUP BY + HAVING
- Support pushdown ORDER BY (var/expression)
- Support pushdown LIMIT + OFFSET
- Support pushdown the basic aggregate functions (sum, count, min, max, avg, stddev, variance)
- Aggregates in subquery are pushed down
- Grouping Sets (not pushdown)
The whole source code has ~3300 lines of code (add) and ~560 lines of code (delete). Are you interested?
I am confused. oracle_fdw already supports pushing down LIMIT
and OFFSET
and ORDER BY
.
But I am potentially interested in aggregate pushdown.
However, i am not interested in a single commit that changes 4000 lines. That would be nigh impossible to review. Please keep a pull request to a single functionality and don't modify unrelated things. It can be a good idea to split it into multiple commits for readability.
I am confused. oracle_fdw already supports pushing down LIMIT and OFFSET and ORDER BY.
I meant we pushdown Aggregation with the above clauses. As my investigation on postgres_fdw's code, we need to follow other code flow. You can refer to postgresGetForeignUpperPaths function.
For example, we execute the query SELECT d, count(*) FROM typetest1 GROUP BY d ORDER BY 1;
on postgresql
Pushed down query to oracle should be:
Oracle query: SELECT /*84c0105d8f73afbea6de52ca10fae7ab*/ "D", count(*) FROM "TYPETEST1" GROUP BY "D" ORDER BY "D" ASC NULLS LAST
But I am potentially interested in aggregate pushdown.
Thank you for your interest.
However, i am not interested in a single commit that changes 4000 lines. That would be nigh impossible to review. Please keep a pull request to a single functionality and don't modify unrelated things. It can be a good idea to split it into multiple commits for readability.
Yes, I agree with you. I will try to separate to single commit for each functionality. I'd like to port test cases of postgres_fdw to oracle_fdw, so we don't need to make new test for Aggregation pushdown. Could you please confirm this point?. I will support the tests below. Please let me know if you have any comment.
1, Support aggregation pushdown with the basic aggregate functions. Currently, I can support the following functions:
sum, avg, max, min, stddev, count, variance, corr, covar_pop, covar_samp, cume_dist, dense_rank, percent_rank, stddev_pop, stddev_samp, var_pop, var_samp, percentile_cont, percentile_disc,
2, Support aggregation pushdown with grouping (section Aggregate and grouping queries in postgresql_fdw.sql) 2.1, Simple aggregates
explain (costs off)
select count(c6), sum(c1), avg(c1), min(c2), max(c1), stddev(c2), sum(c1) * (random() <= 1)::int as sum2 from ft1 where c2 < 5 group by c2 order by 1, 2;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan
Oracle query: SELECT /*99914bc7fbec710b637d4aa744a8f83e*/ count("C6"), sum("C 1"), avg("C 1"), min("C2"), max("C 1"), stddev("C2"), "C2" FROM "T 1" WHERE ("C2" < 5) GROUP BY "C2" ORDER BY count("C6") ASC NULLS LAST, sum("C 1") ASC NULLS LAST
(2 rows)
2.2. Aggregates in subquery are pushed down.
explain (costs off)
select count(x.a), sum(x.a) from (select c2 a, sum(c1) b from ft1 group by c2, sqrt(c1) order by 1, 2) x;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate
-> Foreign Scan
Oracle query: SELECT /*dcc391c9580f4de1e9bb9fc28229222d*/ "C2", sum("C 1"), sqrt("C 1") FROM "T 1" GROUP BY "C2", (sqrt("C 1")) ORDER BY "C2" ASC NULLS LAST, sum("C 1") ASC NULLS LAST
(3 rows)
2.3. GROUP BY clause in various forms, cardinal, alias and constant expression
explain (costs off)
select count(c2) w, c2 x, 5 y, 7.0 z from ft1 group by 2, y, 9.0::int order by 2;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan
Oracle query: SELECT /*2eae2d902b6851d128a93d5818be1851*/ count("C2"), "C2", 5, 7.0, 9 FROM "T 1" GROUP BY "C2", 5, 9 ORDER BY "C2" ASC NULLS LAST
(2 rows)
2.4. GROUP BY clause referring to same column multiple times
explain (costs off)
select c2, c2 from ft1 where c2 > 6 group by 1, 2 order by sum(c1);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan
Oracle query: SELECT /*39d3eb479f963b88b63e0e00bdb29d3a*/ "C2", "C2", sum("C 1") FROM "T 1" WHERE ("C2" > 6) GROUP BY "C2", "C2" ORDER BY sum("C 1") ASC NULLS LAST
(2 rows)
2.5. Testing HAVING clause shippability
explain (costs off)
select c2, sum(c1) from ft2 group by c2 having avg(c1) < 500 and sum(c1) < 49800 order by c2;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan
Oracle query: SELECT /*007f188f5832febc5bc87ccaeae8785f*/ "C2", sum("C 1") FROM "T 1" GROUP BY "C2" HAVING (avg("C 1") < 500) AND (sum("C 1") < 49800) ORDER BY "C2" ASC NULLS LAST
(2 rows)
2.6. FULL join with IS NULL check in HAVING
explain (costs off)
select avg(t1.c1), sum(t2.c1) from ft4 t1 full join ft5 t2 on (t1.c1 = t2.c1) group by t2.c1 having (avg(t1.c1) is null and sum(t2.c1) < 10) or sum(t2.c1) is null order by 1 nulls last, 2;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan
Oracle query: SELECT /*adfb9afbf9c47b7f2fc662c311ae69a4*/ avg(r1."C1"), sum(r2."C1"), r2."C1" FROM ( "T 3" r1 FULL JOIN "T 4" r2 ON (r1."C1" = r2."C1")) GROUP BY r2."C1" HAVING (((avg(r1."C1") IS NULL) AND (sum(r2."C1") < 10)) OR (sum(r2."C1") IS NULL)) ORDER BY avg(r1."C1") ASC NULLS LAST, sum(r2."C1") ASC NULLS LAST
(2 rows)
I understand. Getting inspiration from postgres_fdw's test cases is a good idea.
As my investigation on postgres_fdw's code, we need to follow other code flow.
If you mean that part of the existing code has to be refactored, that refactoring would for example be a good candidate for a separate patch.
is there any plan to support this feature. or any one can submit a patch?
@hslightdb I have no plans to implement that, but patches are welcome, as long as they are written in a way that I can review.
@hslightdb @laurenz It's under development now, please wait for my patch.
@laurenz As we discussed, I need to separate to small patches so you can review it. I'd like to share patches in the following order. You can review them one by one. After you finished reviewing one patch, I will send you the next patch.
Patch 1: Refactor code following PostgreSQL FDW code style.
- Refactor ORDER BY pushdown (following PostgreSQL FDW code style)
- Refactor some deparse functions (following PostgreSQL FDW code style)
- Refactor building oratable for scanning query and fix some small issues regarding new oratable.
- Other parts are not changed. The advantages of this refactoring is to help us to add more feature easily (e.g Aggregation pushdown). It is easy for maintainability in the future.
Patch 2: Support Aggregate function pushdown
- Some more aggregate functions like AVG, SUM, MIN, MAX will be pushed down. More functions can be added easily if we want (for improvement/enhancement)
- Add more test cases
Patch 3: Support GROUP BY pushdown
- Add code and more test cases
Patch 4: Support GROUP BY HAVING pushdown
- Add code and more test cases
Patch 5: Support Aggregation and ORDER BY pushdown
- Add code and more test cases
For LIMIT+OFFSET, it is not pushed down with Aggregation as mentioned in the code. Do you agree with my proposal above?
That sounds OK. If the patches are completely independent, you can propose them one at a time. If some change makes sense in the light of a later patch, it is helpful to see them all at once.
@laurenz Thanks for your feedback. I'd like to share all patches in the attachment. aggregation_pushdown.zip
If you have any queries, please let me know.
I created patches based on the commit 7c65f589edeae1923e804b4315ddc1fa8894b1cf I developed on the following environment.
- Centos 7
- OracleXE 21c
- PostgreSQL 15.0
I'd like to explain more about the patches.
- Patch1
- Refactor ORDER BY pushdown
- Refactor deparse functions
- Introduce a new way to create oratable for scanning table
Fix bug that oracle api does not handle binary type in expression
Fix bug that oracle api does not handle comparison with CLOB
Fix bug that oracle api does not handle NULL column without casting to a data type
- Patch2
- Support some basic aggregate functions AVG, SUM, COUNT, MIN, MAX,...
- Add more test cases
- Patch3
- GROUP BY clause pushdown
- Add more test cases
- Patch4
- HAVING clause pushdown
- Add more test cases
- Patch5
- ORDER BY pushdown with Aggregation
- Add more test cases
Thanks. I'll have a look as soon as I get time. Since you say you wrote the patch for PostgreSQL v15: you are aware that the patch has to support at least the supported versions, that is v11 to v16, right?
Sorry, I have not built and tested on other versions yet. I worked on PostgreSQL 15.0 because it is the latest stable version.
I have had a brief look at patch 1. It does not apply to current master. Can yo fix that?
I must say that this is so much code that I find it hard to read. You have several things lumped together; according to your description:
-
refactor ORDER BY pushdown
that confuses me, because a later patch introduces ORDER BY pushdown.
-
refactor deparse functions
-
fix three bugs
Perhaps it would be better to have separate patches for that, in particular for the bug fixes (which bugs?)
Then I have a change to understand what is going on.
It would also be great if you didn't modify unrelated things like the Makefile
.
@laurenz
It does not apply to current master. Can yo fix that?
Yes, I'll do that.
that confuses me, because a later patch introduces ORDER BY pushdown.
The later patch introduces ORDER BY pushdown with Aggregation. According to postgres_fdw's code, it extends a simple foreign path for the underlying grouping relation to perform the final sort remotely.
Perhaps it would be better to have separate patches for that, in particular for the bug fixes (which bugs?)
Can I separate Patch1 to 5 smaller patches?
- Patch1.1: Refactor ORDER BY pushdown (following postgres_fdw's code style for maintainability)
- Patch1.2: Refactor Deparse functions + New oratable for scanning query including simple, join (and aggregation query later patches)
- Patch1.3: Fix bug 1
- Patch1.4: Fix bug 2
- Patch1.5: Fix bug 3
It would also be great if you didn't modify unrelated things like the Makefile.
I will revert it.
@laurenz PostgreSQL 16beta1 was released. Do you want these patches working on 16beta1?
There is already beta 2. Yes, the patches should work on all versions between v11 and v17 (current HEAD). I know that that is annoying.
@laurenz Thanks for your information. I will use PostgreSQL 16beta2
Could you please confirm the separated patches at https://github.com/laurenz/oracle_fdw/issues/569#issuecomment-1628562120? If you do not agree, how small do you need patches?
Sorry for the late reply. Yes, that is fine. Thanks.
I cannot promise fast work on this. While I can work on oracle_fdw during my working hours, it is non-profit community work, and other things have to take precedence.
I'll do my best to understand and absorb your code, but I won't make any promises. I won't merge anything that I don't thoroughly understand. After all, I'll be the one who has to deal with bugs arising from the code. If I can't cope with it, you may have to fork oracle_fdw.
@laurenz , @jopoly any updates on this? I'm interested on basic aggregates pushdown: count, sum, min, max, etc..
today was trying to compare row counts between tables and figured out that basic select count(*)
looks to transfer 1
for every source row and summing them up on the postgres side.
which makes such basic but important queries almost unusable on even mid size tables.
I'll close the issue as "inactive".