High memory consumption when processing highly-partitioned column-oriented tables
Hi, guys. Processing each column needs a buffer that is three times the block size. Together with other allocations (assuming the default block size), we need at least 98360 bytes to process one column. Additionally, we need at least 16384 bytes to process each partition. So, for example, to process a DynamicSeqScan node over a table with 1000 partitions and 30 columns, we have to allocate and keep at once
1000 * 16384 + 1000 * 30 * 98360 = 2967184000 bytes, or roughly 2967 MB,
of memory.
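
For concreteness, here is a minimal sketch of the estimate above. The constants come from this issue (per-column buffer of ~98360 bytes, i.e. three 32768-byte blocks plus smaller allocations, and ~16384 bytes of per-partition overhead); they are assumptions for illustration, not values pulled from the gpdb source:

```c
#include <stdio.h>

int main(void)
{
    /* Assumed constants from the issue text, not from gpdb code. */
    const long long per_partition = 16384;  /* per-partition overhead, bytes */
    const long long per_column    = 98360;  /* per-column buffers, bytes (~3 * 32768) */
    const long long partitions    = 1000;
    const long long columns       = 30;

    /* Memory held at once while scanning all partitions. */
    long long total = partitions * per_partition
                    + partitions * columns * per_column;

    printf("%lld bytes (~%lld MB)\n", total, total / (1000 * 1000));
    return 0;
}
```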
Simply destroying the datumstreams after processing each partition is not a solution, because of ReScans. And this line points us to this commit, which explains the new behavior.
My question is: was this high memory consumption taken into account? The mentioned patch has a comment about init/clean for each SeqScan as an alternative solution. If we can't do that, how can we decrease memory consumption, and should we?