feat(columnar): implement dynamic chunk group allocations

Open imranzaheer612 opened this issue 3 months ago • 2 comments

Add support for dynamically allocating a new chunk group once the current one reaches a configurable size limit. This prevents memory allocation failures when inserting large rows and improves scalability for large columnar data sets.

  • Add new GUC parameter columnar.chunk_group_size_limit to control chunk group size threshold
  • Add regression tests covering chunk group expansion scenarios
  • Add a chunk_group_size_limit column to columnar_internal.options, applied by the upgrade script citus_columnar--12.2-1--13.2-1.sql
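
To illustrate the idea behind the size limit, here is a minimal Python sketch of a writer that starts a new chunk group when either a row-count limit or a byte-size limit would be exceeded. It is a toy model, not the Citus C implementation: the class `ChunkGroupWriter`, its fields, and the rule of closing a group before the row that would overflow it are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkGroupWriter:
    """Hypothetical toy model: packs row sizes into bounded chunk groups."""
    row_limit: int          # plays the role of chunk_group_row_limit
    size_limit_bytes: int   # plays the role of chunk_group_size_limit
    groups: list = field(default_factory=lambda: [[]])
    current_bytes: int = 0

    def append(self, row_size: int) -> None:
        current = self.groups[-1]
        over_rows = len(current) >= self.row_limit
        # Only roll over on size if the group already holds at least one row,
        # so a single oversized row still gets a group of its own.
        over_size = bool(current) and (
            self.current_bytes + row_size > self.size_limit_bytes
        )
        if over_rows or over_size:
            self.groups.append([])
            self.current_bytes = 0
        self.groups[-1].append(row_size)
        self.current_bytes += row_size


writer = ChunkGroupWriter(row_limit=4, size_limit_bytes=100)
for size in [40, 40, 40, 10, 10, 10, 10, 10, 250, 5]:
    writer.append(size)
print(writer.groups)
```

With these made-up sizes the writer produces five groups: two are closed by the size limit, one by the row limit, and the 250-byte row ends up alone because it is larger than the limit itself.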

Fixes #6420, #7199

BEFORE:

postgres=# INSERT INTO test_oversized_row
SELECT gs, repeat('Y', 2*1024*1024)  -- 2 MB text
FROM generate_series(1, 600) AS gs;
2025-09-17 20:18:23.143 PKT [82542] ERROR:  out of memory
2025-09-17 20:18:23.143 PKT [82542] DETAIL:  Cannot enlarge string buffer containing 1071646716 bytes by 2097156 more bytes.
2025-09-17 20:18:23.143 PKT [82542] STATEMENT:  INSERT INTO test_oversized_row
	SELECT gs, repeat('Y', 2*1024*1024)  -- 2 MB text
	FROM generate_series(1, 600) AS gs;
ERROR:  out of memory
DETAIL:  Cannot enlarge string buffer containing 1071646716 bytes by 2097156 more bytes.
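
The numbers in the DETAIL line can be checked with a little arithmetic. PostgreSQL refuses to grow a string buffer past MaxAllocSize, which is 1 GB minus one byte (0x3fffffff). The sketch below uses only the two byte counts from the log. Reading the 2097156-byte increment as one 2 MB text value plus a 4-byte header is an assumption, as is the conclusion that the buffer held whole values only.

```python
MAX_ALLOC_SIZE = 0x3FFFFFFF  # PostgreSQL MaxAllocSize: 1 GB - 1 byte

buffered = 1071646716   # bytes already in the buffer (from the DETAIL line)
increment = 2097156     # bytes requested for the next value (from the DETAIL line)

# The buffered byte count is an exact multiple of the increment, which is
# consistent with a whole number of values already sitting in one buffer.
values_buffered, remainder = divmod(buffered, increment)
would_overflow = buffered + increment > MAX_ALLOC_SIZE

print(values_buffered, remainder)  # 511 0
print(would_overflow)              # True
```

Under that reading, 511 values had been buffered in a single chunk group and the 512th pushed the buffer past the 1 GB allocation ceiling, well before the 600-row insert completed.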

AFTER:

postgres=# CREATE TABLE test_oversized_row (id INTEGER, huge_text TEXT)
USING columnar WITH
(columnar.chunk_group_row_limit = 1000, columnar.stripe_row_limit = 5000, columnar.chunk_group_size_limit = 256);
CREATE TABLE
postgres=# INSERT INTO test_oversized_row
SELECT gs, repeat('Y', 2*1024*1024)  -- 2 MB text
FROM generate_series(1, 600) AS gs;
2025-09-17 17:32:03.004 PKT [34749] DEBUG:  Row size (2097160 bytes) exceeds chunk group size limit (268435456 bytes), storing in a separate chunk group
2025-09-17 17:32:04.822 PKT [34749] DEBUG:  Row size (2097160 bytes) exceeds chunk group size limit (268435456 bytes), storing in a separate chunk group
2025-09-17 17:32:06.592 PKT [34749] DEBUG:  Row size (2097160 bytes) exceeds chunk group size limit (268435456 bytes), storing in a separate chunk group
2025-09-17 17:32:08.419 PKT [34749] DEBUG:  Row size (2097160 bytes) exceeds chunk group size limit (268435456 bytes), storing in a separate chunk group
2025-09-17 17:32:10.238 PKT [34749] DEBUG:  Flushing Stripe of size 600
INSERT 0 600
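
As a rough cross-check of the log above, the following sketch estimates how many chunk groups the 600 rows need. It uses the limit and row size printed in the DEBUG lines, and assumes that a group is closed before the row that would push it past the limit. That rule is a guess at the behavior, not something the log confirms.

```python
import math

size_limit = 268435456   # bytes, from the DEBUG lines (256 * 1024 * 1024)
row_size = 2097160       # bytes per row, from the DEBUG lines
total_rows = 600         # rows inserted

rows_per_group = size_limit // row_size          # rows that fit under the limit
groups = math.ceil(total_rows / rows_per_group)  # chunk groups needed
rollovers = groups - 1                           # times a new group is started

print(rows_per_group, groups, rollovers)  # 127 5 4
```

Four rollovers is consistent with the four DEBUG lines emitted before the stripe is flushed.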

imranzaheer612 · Sep 17 '25 15:09

@microsoft-github-policy-service agree

imranzaheer612 · Sep 17 '25 15:09

Looks like this is another related issue: https://github.com/citusdata/citus/issues/7199

imranzaheer612 · Sep 25 '25 05:09