oracle_fdw icon indicating copy to clipboard operation
oracle_fdw copied to clipboard

Support Aggregation pushdown

Open jopoly opened this issue 2 years ago • 19 comments

Hi Laurenz,

Sorry for creating an issue ticket during vacation.

I would like to share the source code to support Aggregation pushdown, which is below:

  • Support pushdown GROUP BY (var/expression)
  • Support pushdown GROUP BY + HAVING
  • Support pushdown ORDER BY (var/expression)
  • Support pushdown LIMIT + OFFSET
  • Support pushdown the basic aggregate functions (sum, count, min, max, avg, stddev, variance)
  • Aggregates in subquery are pushed down
  • Grouping Sets (not pushdown)

The whole source code has ~3300 lines of code (add) and ~560 lines of code (delete). Are you interested?

jopoly avatar Dec 27 '22 02:12 jopoly

I am confused. oracle_fdw already supports pushing down LIMIT and OFFSET and ORDER BY.

But I am potentially interested in aggregate pushdown.

However, i am not interested in a single commit that changes 4000 lines. That would be nigh impossible to review. Please keep a pull request to a single functionality and don't modify unrelated things. It can be a good idea to split it into multiple commits for readability.

laurenz avatar Jan 09 '23 21:01 laurenz

I am confused. oracle_fdw already supports pushing down LIMIT and OFFSET and ORDER BY.

I meant we pushdown Aggregation with the above clauses. As my investigation on postgres_fdw's code, we need to follow other code flow. You can refer to postgresGetForeignUpperPaths function. For example, we execute the query SELECT d, count(*) FROM typetest1 GROUP BY d ORDER BY 1; on postgresql Pushed down query to oracle should be:

  Oracle query: SELECT /*84c0105d8f73afbea6de52ca10fae7ab*/ "D", count(*) FROM  "TYPETEST1" GROUP BY "D" ORDER BY "D" ASC NULLS LAST

But I am potentially interested in aggregate pushdown.

Thank you for your interest.

However, i am not interested in a single commit that changes 4000 lines. That would be nigh impossible to review. Please keep a pull request to a single functionality and don't modify unrelated things. It can be a good idea to split it into multiple commits for readability.

Yes, I agree with you. I will try to separate to single commit for each functionality. I'd like to port test cases of postgres_fdw to oracle_fdw, so we don't need to make new test for Aggregation pushdown. Could you please confirm this point?. I will support the tests below. Please let me know if you have any comment.

1, Support aggregation pushdown with the basic aggregate functions. Currently, I can support the following functions: sum, avg, max, min, stddev, count, variance, corr, covar_pop, covar_samp, cume_dist, dense_rank, percent_rank, stddev_pop, stddev_samp, var_pop, var_samp, percentile_cont, percentile_disc,

2, Support aggregation pushdown with grouping (section Aggregate and grouping queries in postgresql_fdw.sql) 2.1, Simple aggregates

explain (costs off)
select count(c6), sum(c1), avg(c1), min(c2), max(c1), stddev(c2), sum(c1) * (random() <= 1)::int as sum2 from ft1 where c2 < 5 group by c2 order by 1, 2;
                                                                                                                      QUERY PLAN                                                                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan
   Oracle query: SELECT /*99914bc7fbec710b637d4aa744a8f83e*/ count("C6"), sum("C 1"), avg("C 1"), min("C2"), max("C 1"), stddev("C2"), "C2" FROM  "T 1" WHERE ("C2" < 5) GROUP BY "C2" ORDER BY count("C6") ASC NULLS LAST, sum("C 1") ASC NULLS LAST
(2 rows)

2.2. Aggregates in subquery are pushed down.

explain (costs off)
select count(x.a), sum(x.a) from (select c2 a, sum(c1) b from ft1 group by c2, sqrt(c1) order by 1, 2) x;
                                                                                            QUERY PLAN                                                                                            
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate
   ->  Foreign Scan
         Oracle query: SELECT /*dcc391c9580f4de1e9bb9fc28229222d*/ "C2", sum("C 1"), sqrt("C 1") FROM  "T 1" GROUP BY "C2", (sqrt("C 1")) ORDER BY "C2" ASC NULLS LAST, sum("C 1") ASC NULLS LAST
(3 rows)

2.3. GROUP BY clause in various forms, cardinal, alias and constant expression

explain (costs off)
select count(c2) w, c2 x, 5 y, 7.0 z from ft1 group by 2, y, 9.0::int order by 2;
                                                                      QUERY PLAN                                                                       
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan
   Oracle query: SELECT /*2eae2d902b6851d128a93d5818be1851*/ count("C2"), "C2", 5, 7.0, 9 FROM  "T 1" GROUP BY "C2", 5, 9 ORDER BY "C2" ASC NULLS LAST
(2 rows)

2.4. GROUP BY clause referring to same column multiple times

explain (costs off)
select c2, c2 from ft1 where c2 > 6 group by 1, 2 order by sum(c1);
                                                                               QUERY PLAN                                                                               
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan
   Oracle query: SELECT /*39d3eb479f963b88b63e0e00bdb29d3a*/ "C2", "C2", sum("C 1") FROM  "T 1" WHERE ("C2" > 6) GROUP BY "C2", "C2" ORDER BY sum("C 1") ASC NULLS LAST
(2 rows)

2.5. Testing HAVING clause shippability

explain (costs off)
select c2, sum(c1) from ft2 group by c2 having avg(c1) < 500 and sum(c1) < 49800 order by c2;
                                                                                       QUERY PLAN                                                                                       
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan
   Oracle query: SELECT /*007f188f5832febc5bc87ccaeae8785f*/ "C2", sum("C 1") FROM  "T 1" GROUP BY "C2" HAVING (avg("C 1") < 500) AND (sum("C 1") < 49800) ORDER BY "C2" ASC NULLS LAST
(2 rows)

2.6. FULL join with IS NULL check in HAVING

explain (costs off)
select avg(t1.c1), sum(t2.c1) from ft4 t1 full join ft5 t2 on (t1.c1 = t2.c1) group by t2.c1 having (avg(t1.c1) is null and sum(t2.c1) < 10) or sum(t2.c1) is null order by 1 nulls last, 2;
                                                                                                                                                             QUERY PLAN                                                                                                                                                             
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan
   Oracle query: SELECT /*adfb9afbf9c47b7f2fc662c311ae69a4*/ avg(r1."C1"), sum(r2."C1"), r2."C1" FROM ( "T 3" r1 FULL JOIN  "T 4" r2 ON (r1."C1" = r2."C1")) GROUP BY r2."C1" HAVING (((avg(r1."C1") IS NULL) AND (sum(r2."C1") < 10)) OR (sum(r2."C1") IS NULL)) ORDER BY avg(r1."C1") ASC NULLS LAST, sum(r2."C1") ASC NULLS LAST
(2 rows)

jopoly avatar Jan 12 '23 10:01 jopoly

I understand. Getting inspiration from postgres_fdw's test cases is a good idea.

As my investigation on postgres_fdw's code, we need to follow other code flow.

If you mean that part of the existing code has to be refactored, that refactoring would for example be a good candidate for a separate patch.

laurenz avatar Jan 12 '23 11:01 laurenz

is there any plan to support this feature. or any one can submit a patch?

hslightdb avatar Mar 22 '23 02:03 hslightdb

@hslightdb I have no plans to implement that, but patches are welcome, as long as they are written in a way that I can review.

laurenz avatar Mar 22 '23 09:03 laurenz

@hslightdb @laurenz It's under development now, please wait for my patch.

jopoly avatar Apr 20 '23 04:04 jopoly

@laurenz As we discussed, I need to separate to small patches so you can review it. I'd like to share patches in the following order. You can review them one by one. After you finished reviewing one patch, I will send you the next patch.

Patch 1: Refactor code following PostgreSQL FDW code style.

  • Refactor ORDER BY pushdown (following PostgreSQL FDW code style)
  • Refactor some deparse functions (following PostgreSQL FDW code style)
  • Refactor building oratable for scanning query and fix some small issues regarding new oratable.
  • Other parts are not changed. The advantages of this refactoring is to help us to add more feature easily (e.g Aggregation pushdown). It is easy for maintainability in the future.

Patch 2: Support Aggregate function pushdown

  • Some more aggregate functions like AVG, SUM, MIN, MAX will be pushed down. More functions can be added easily if we want (for improvement/enhancement)
  • Add more test cases

Patch 3: Support GROUP BY pushdown

  • Add code and more test cases

Patch 4: Support GROUP BY HAVING pushdown

  • Add code and more test cases

Patch 5: Support Aggregation and ORDER BY pushdown

  • Add code and more test cases

For LIMIT+OFFSET, it is not pushed down with Aggregation as mentioned in the code. Do you agree with my proposal above?

jopoly avatar May 18 '23 03:05 jopoly

That sounds OK. If the patches are completely independent, you can propose them one at a time. If some change makes sense in the light of a later patch, it is helpful to see them all at once.

laurenz avatar May 22 '23 09:05 laurenz

@laurenz Thanks for your feedback. I'd like to share all patches in the attachment. aggregation_pushdown.zip

If you have any queries, please let me know.

jopoly avatar Jun 08 '23 09:06 jopoly

I created patches based on the commit 7c65f589edeae1923e804b4315ddc1fa8894b1cf I developed on the following environment.

  • Centos 7
  • OracleXE 21c
  • PostgreSQL 15.0

jopoly avatar Jun 08 '23 09:06 jopoly

I'd like to explain more about the patches.

  1. Patch1
  • Refactor ORDER BY pushdown
  • Refactor deparse functions
  • Introduce a new way to create oratable for scanning table
    Fix bug that oracle api does not handle binary type in expression
    Fix bug that oracle api does not handle comparison with CLOB
    Fix bug that oracle api does not handle NULL column without casting to a data type
  1. Patch2
  • Support some basic aggregate functions AVG, SUM, COUNT, MIN, MAX,...
  • Add more test cases
  1. Patch3
  • GROUP BY clause pushdown
  • Add more test cases
  1. Patch4
  • HAVING clause pushdown
  • Add more test cases
  1. Patch5
  • ORDER BY pushdown with Aggregation
  • Add more test cases

jopoly avatar Jun 08 '23 09:06 jopoly

Thanks. I'll have a look as soon as I get time. Since you say you wrote the patch for PostgreSQL v15: you are aware that the patch has to support at least the supported versions, that is v11 to v16, right?

laurenz avatar Jun 12 '23 12:06 laurenz

Sorry, I have not built and tested on other versions yet. I worked on PostgreSQL 15.0 because it is the latest stable version.

jopoly avatar Jun 13 '23 02:06 jopoly

I have had a brief look at patch 1. It does not apply to current master. Can yo fix that?

I must say that this is so much code that I find it hard to read. You have several things lumped together; according to your description:

  • refactor ORDER BY pushdown

    that confuses me, because a later patch introduces ORDER BY pushdown.

  • refactor deparse functions

  • fix three bugs

Perhaps it would be better to have separate patches for that, in particular for the bug fixes (which bugs?)

Then I have a change to understand what is going on.

It would also be great if you didn't modify unrelated things like the Makefile.

laurenz avatar Jul 05 '23 11:07 laurenz

@laurenz

It does not apply to current master. Can yo fix that?

Yes, I'll do that.

that confuses me, because a later patch introduces ORDER BY pushdown.

The later patch introduces ORDER BY pushdown with Aggregation. According to postgres_fdw's code, it extends a simple foreign path for the underlying grouping relation to perform the final sort remotely.

Perhaps it would be better to have separate patches for that, in particular for the bug fixes (which bugs?)

Can I separate Patch1 to 5 smaller patches?

  • Patch1.1: Refactor ORDER BY pushdown (following postgres_fdw's code style for maintainability)
  • Patch1.2: Refactor Deparse functions + New oratable for scanning query including simple, join (and aggregation query later patches)
  • Patch1.3: Fix bug 1
  • Patch1.4: Fix bug 2
  • Patch1.5: Fix bug 3

It would also be great if you didn't modify unrelated things like the Makefile.

I will revert it.

jopoly avatar Jul 10 '23 09:07 jopoly

@laurenz PostgreSQL 16beta1 was released. Do you want these patches working on 16beta1?

jopoly avatar Jul 10 '23 10:07 jopoly

There is already beta 2. Yes, the patches should work on all versions between v11 and v17 (current HEAD). I know that that is annoying.

laurenz avatar Jul 10 '23 13:07 laurenz

@laurenz Thanks for your information. I will use PostgreSQL 16beta2

Could you please confirm the separated patches at https://github.com/laurenz/oracle_fdw/issues/569#issuecomment-1628562120? If you do not agree, how small do you need patches?

jopoly avatar Jul 13 '23 02:07 jopoly

Sorry for the late reply. Yes, that is fine. Thanks.

I cannot promise fast work on this. While I can work on oracle_fdw during my working hours, it is non-profit community work, and other things have to take precedence.

I'll do my best to understand and absorb your code, but I won't make any promises. I won't merge anything that I don't thoroughly understand. After all, I'll be the one who has to deal with bugs arising from the code. If I can't cope with it, you may have to fork oracle_fdw.

laurenz avatar Jul 13 '23 15:07 laurenz

@laurenz , @jopoly any updates on this? I'm interested on basic aggregates pushdown: count, sum, min, max, etc..

today was trying to compare row counts between tables and figured out that basic select count(*) looks to transfer 1 for every source row and summing them up on the postgres side.

which makes such basic but important queries almost unusable on even mid size tables.

AmebaBrain avatar May 15 '24 21:05 AmebaBrain

I'll close the issue as "inactive".

laurenz avatar May 16 '24 08:05 laurenz