citus icon indicating copy to clipboard operation
citus copied to clipboard

Potential Memory Leak with `jsonb_build_object()` on Columnar Tables

Open vldt-gpt opened this issue 9 months ago • 0 comments

PosgreSQL's jsonb_build_object() over columnar tables consumes excessive memory and leads to OOM crash.

Hi team, I'm running into what appears to be a memory leak when executing a GROUP BY query using jsonb_build_object() over a few columnar tables on:

  • PostgreSQL: 16
  • Citus: 13
  • Hosted on a single EC2 instance with ~250 GB RAM.

Symptoms:

The query runs for ~3 minutes, consuming:

  • 155GB RES
  • 290GB VIRT
  • 63%+ of total memory

Once memory usage crosses 206GB+, the server eventually crashes with an OOM error if not manually killed. The query itself is a fairly typical OLAP-style aggregation.

Image

Query Summary:

The pattern (simplified) looks like:

WITH filtered_data AS (
  SELECT ...
  FROM columnar_table_1 AS a
  JOIN columnar_table_2 AS b ON ...
  LEFT JOIN columnar_table_3 AS c ON ...
  WHERE c.some_json_field ->> 'some_key' = 'some_value'
)
SELECT
  jsonb_build_object(...),
  some_keys,
  SUM(...),
  VARIANCE(...)
FROM filtered_data
GROUP BY ...;
  • All input tables are columnar.
  • Two of the inputs are joined on multiple fields, and the main aggregation is done after a jsonb_build_object() call in the SELECT.

Observations:

  • The problem only occurs when jsonb_build_object() is used.
  • No UDFs or temporary tables are involved.
  • Setting max_parallel_workers_per_gather = 0 and disabling columnar.enable_custom_scan doesn't resolve the issue.

Question:

  1. Is this a known limitation or a bug in how Citus handles jsonb_build_object() over columnar data with parallel execution or row-materialization?
  2. Any recommendations on safely constructing JSON from large columnar datasets?

vldt-gpt avatar Mar 25 '25 16:03 vldt-gpt