citus icon indicating copy to clipboard operation
citus copied to clipboard

Further support for JSON_ARRAYAGG() and JSON_OBJECTAGG()

Open naisila opened this issue 2 years ago • 0 comments

Setup:

CREATE TABLE agg_test(country text, city text);
SELECT create_distributed_table('agg_test', 'country', shard_count := 2);
INSERT INTO agg_test VALUES ('Albania', 'Tirana'), ('Albania', 'Shkodra'), ('Albania', 'Elbasan');
INSERT INTO agg_test VALUES ('Turkey', 'Ankara'), ('Turkey', 'Istanbul');
SET client_min_messages TO debug4;
SET citus.log_remote_commands TO on;
  1. Currently, for PG16's new SQL/JSON standard conforming constructors for JSON types, namely json_arrayagg() and json_objectagg(), we push down the execution to the workers when the aggregate is grouped by the table's distribution column:
SELECT JSON_ARRAYAGG(city) AS cities_aggregated FROM agg_test GROUP BY country;
DEBUG:  combine query: SELECT cities_aggregated FROM pg_catalog.citus_extradata_container(10, NULL::cstring(0), NULL::cstring(0), '(i 1)'::cstring(0)) remote_scan(cities_aggregated json, worker_column_2 text)
NOTICE:  issuing SELECT JSON_ARRAYAGG(city RETURNING json) AS cities_aggregated, country AS worker_column_2 FROM public.agg_test_102008 agg_test WHERE true GROUP BY country
DETAIL: on server naisila@localhost:9701 connectionId: 3
NOTICE:  issuing SELECT JSON_ARRAYAGG(city RETURNING json) AS cities_aggregated, country AS worker_column_2 FROM public.agg_test_102009 agg_test WHERE true GROUP BY country
DETAIL:  on server naisila@localhost:9702 connectionId: 4
  1. If not grouped on the distribution column, it is currently pulling all rows from the workers and performing the aggregation on the coordinator node:
SELECT JSON_ARRAYAGG(city) AS cities_aggregated FROM agg_test;
DEBUG:  combine query: SELECT JSON_ARRAYAGG(cities_aggregated RETURNING json) AS cities_aggregated FROM pg_catalog.citus_extradata_container(10, NULL::cstring(0), NULL::cstring(0), '(i 1)'::cstring(0)) remote_scan(cities_aggregated text)
NOTICE:  issuing SELECT city AS cities_aggregated FROM public.agg_test_102008 agg_test WHERE true
DETAIL:  on server naisila@localhost:9701 connectionId: 5
NOTICE:  issuing SELECT city AS cities_aggregated FROM public.agg_test_102009 agg_test WHERE true
DETAIL:  on server naisila@localhost:9702 connectionId: 6

We can improve the behavior in (2) by adding json_arrayagg() and json_objectagg() to our list of special-case aggregates. See also AggregateType struct in our codebase.

It could be a mimic of how the classic json_agg and json_object_agg work, utilizing the Citus-defined function json_cat_agg:

SELECT json_agg(city) AS cities_aggregated FROM agg_test;
DEBUG:  combine query: SELECT pg_catalog.json_cat_agg(cities_aggregated) AS cities_aggregated FROM pg_catalog.citus_extradata_container(10, NULL::cstring(0), NULL::cstring(0), '(i 1)'::cstring(0)) remote_scan(cities_aggregated json)
NOTICE:  issuing SELECT json_agg(city) AS cities_aggregated FROM public.agg_test_102008 agg_test WHERE true
DETAIL:  on server naisila@localhost:9701 connectionId: 5
NOTICE:  issuing SELECT json_agg(city) AS cities_aggregated FROM public.agg_test_102009 agg_test WHERE true
DETAIL:  on server naisila@localhost:9702 connectionId: 6

naisila avatar Oct 23 '23 12:10 naisila