gpdb icon indicating copy to clipboard operation
gpdb copied to clipboard

Free underlying Dynamic Seq Scan structures after processing of each partition

Open InnerLife0 opened this issue 3 years ago • 6 comments

The tables with high count of partitions can drain high amount of executor memory for processing. This can be especially critical for column-oriented tables, because we allocating a buffer (which is triple of block size) for each column. Following the standard flow we free such buffers at executor end. But it seems, we can safely free some structures, including this buffers, after processing of each partition. To do this in the uniform way, new *_afterscan() access methods introduced by this patch.

Related to #13725

InnerLife0 avatar Aug 25 '22 08:08 InnerLife0

Thanks a lot for your contribution. @InnerLife0

Have not looked into it (will do it soon, recently I am just too busy). I see we change some external function in 6X, will this break ABI? For example, what if some extension uses these functions.

kainwen avatar Aug 30 '22 14:08 kainwen

Thanks a lot for your contribution. @InnerLife0

Have not looked into it (will do it soon, recently I am just too busy). I see we change some external function in 6X, will this break ABI? For example, what if some extension uses these functions.

Can you point me to this function? As I can see, all existed functions has the same definition an logic.

InnerLife0 avatar Sep 01 '22 06:09 InnerLife0

Thanks a lot for your contribution. @InnerLife0 Have not looked into it (will do it soon, recently I am just too busy). I see we change some external function in 6X, will this break ABI? For example, what if some extension uses these functions.

Can you point me to this function? As I can see, all existed functions has the same definition an logic.

My bad. I misread.

Looking into this, thanks a lot for your contribution!

kainwen avatar Sep 07 '22 05:09 kainwen

Btw: does master also have this issue? and whether need to port it to master?

interma avatar Sep 08 '22 09:09 interma

Btw: does master also have this issue? and whether need to port it to master?

@interma Hi

I have manually tested on the latest master branch.

  1. master branch does not have the mem leak issue now (means I cannot reproduce the issue https://github.com/greenplum-db/gpdb/issues/12533 ), so master branch dose not have the code introduced in 6X (https://github.com/greenplum-db/gpdb/pull/12562)
  2. I have manually run the test case of this PR on master branch, see output below (it shows good result):
gpadmin=# drop table if exists aocs_part1;
NOTICE:  table "aocs_part1" does not exist, skipping
DROP TABLE
gpadmin=# drop table if exists aocs_part2;
NOTICE:  table "aocs_part2" does not exist, skipping
DROP TABLE
gpadmin=# create table aocs_part1 (i0 int, i1 int, i2 int, i3 int, i4 int, i5 int, i6 int, i7 int, i8 int, i9 int)
gpadmin-# with (appendonly=true, orientation=column) distributed by (i0)
gpadmin-# partition by range (i0) (start (0) end(1) every (1));
CREATE TABLE
gpadmin=# create table aocs_part2 (i0 int, i1 int, i2 int, i3 int, i4 int, i5 int, i6 int, i7 int, i8 int, i9 int)
gpadmin-# with (appendonly=true, orientation=column) distributed by (i0)
gpadmin-# partition by range (i0) (start (0) end(50) every (1));
CREATE TABLE
gpadmin=# create language plpythonu;
ERROR:  could not access file "$libdir/plpython2": No such file or directory
gpadmin=# create language plpython3u;
CREATE LANGUAGE
gpadmin=# \e
CREATE FUNCTION
gpadmin=# select
gpadmin-#   get_executor_mem('select i0, i1, i2, i3, i4, i5, i6, i7, i8, i9
gpadmin'#   from (
gpadmin'#     select *,
gpadmin'#     row_number() over (partition by i0 order by i1) as seq
gpadmin'#     from aocs_part1
gpadmin'#   ) as p
gpadmin'#   where seq=1');
 get_executor_mem
------------------
             1068
(1 row)

gpadmin=# select get_executor_mem('select i0, i1, i2, i3, i4, i5, i6, i7, i8, i9
gpadmin'#   from (
gpadmin'#     select *,
gpadmin'#     row_number() over (partition by i0 order by i1) as seq
gpadmin'#     from aocs_part2
gpadmin'#   ) as p
gpadmin'#   where seq=1');
 get_executor_mem
------------------
             1366
(1 row)

kainwen avatar Sep 08 '22 11:09 kainwen

Paste an optimization result (in my test env) for refer:

-- test_aocsp is an empty AOCS table (with 10 columns and 101 Partition)

demo=# explain analyze select * from test_aocsp;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..431.00 rows=1 width=44) (actual time=61.481..61.481 rows=0 loops=1)
   ->  Sequence  (cost=0.00..431.00 rows=1 width=44) (never executed)
         ->  Partition Selector for test_aocsp (dynamic scan id: 1)  (cost=10.00..100.00 rows=34 width=4) (never executed)
               Partitions selected: 101 (out of 101)
         ->  Dynamic Seq Scan on test_aocsp (dynamic scan id: 1)  (cost=0.00..431.00 rows=1 width=44) (never executed)
               Partitions scanned:  Avg 101.0 (out of 101) x 3 workers.  Max 101 parts (seg0).
 Planning time: 37.086 ms
   (slice0)    Executor memory: 69K bytes.
=> without this PR:  (slice1)    Executor memory: 116610K bytes avg x 3 workers, 116610K bytes max (seg0).
=> with this PR:     (slice1)    Executor memory: 4806K bytes avg x 3 workers, 4806K bytes max (seg0).
 Optimizer: Pivotal Optimizer (GPORCA)
 Execution time: 65.639 ms
(11 rows)

Memory usage decrease from 116MB to 4.8MB, amazing results!

interma avatar Sep 09 '22 09:09 interma

BTW: tried to apply a similar optimization to planner's nodeAppend: "lazy init subplan" + "call afterscan() after each subplan". The rough demo commit: https://github.com/interma/gpdb/commit/bfa078e3a0e550fb96d17bf93c3e3430c057c222 It can also reduce the memory usage (checked by ps command), but this commit will break explain results. Looks look also need to modify code in planner, so I stop here.

If a user complains about the scan on big partition table consume too many memories, they can try DynamicScan in ORCA.

interma avatar Sep 22 '22 03:09 interma

The PR addresses only for seqscan. Seems the problem is not unique to seqscan and can happen for index scans as well. So, shouldn't a similar be done for CleanupOnePartition(DynamicBitmapHeapScanState *scanState) also?

@InnerLife0 just a kindle reminder. Please help to take a look, I think it can also apply the xxx-afterscan() routines.

Thanks.

interma avatar Sep 22 '22 03:09 interma

Hi, guys. Sorry for late reply. It was difficult to force myself and to put my thoughts together.

The patch doesn't contain any fix for Dynamic Bitmap Heap Scan as I was unable to find a proper example which drains memory as Dynamic Seq Scan.

InnerLife0 avatar Oct 03 '22 09:10 InnerLife0

Hi, guys. Sorry for late reply. It was difficult to force myself and to put my thoughts together.

The patch doesn't contain any fix for Dynamic Bitmap Heap Scan as I was unable to find a proper example which drains memory as Dynamic Seq Scan.

Thanks @InnerLife0, so I think we can let this PR to be merged first.

interma avatar Oct 10 '22 08:10 interma

Merged, @InnerLife0 thanks for your contribution!

interma avatar Oct 18 '22 07:10 interma