Running refresh_continuous_aggregate inside a for loop
Relevant system information:
- OS: Arch Linux
- PostgreSQL version: 13.2
- TimescaleDB version: 2.2.0
Describe the bug
Because max_interval_per_job is no longer supported, the materialization jobs might run over very large data ranges (and concurrently), using too much memory, especially on freshly imported backups.
I tried to write a procedure that calls refresh_continuous_aggregate repeatedly over smaller ranges, but this throws the following error:
ERROR: cannot commit while a portal is pinned
CONTEXT: SQL statement "CALL refresh_continuous_aggregate(agg, "interval"."begin", "interval"."end")"
PL/pgSQL function dev.update_aggregate(regclass) line 20 at CALL
To Reproduce
CREATE OR REPLACE PROCEDURE dev.update_aggregate(agg regclass)
LANGUAGE plpgsql
AS $BODY$
DECLARE
    "interval" record;
BEGIN
    FOR "interval" IN
        SELECT
            s AS "begin",
            s + '30 days'::interval AS "end"
        FROM timescaledb_information.chunks c,
             LATERAL generate_series(
                 time_bucket('30 days'::interval, c."range_start") AT TIME ZONE 'UTC',
                 time_bucket('30 days'::interval, c."range_end") AT TIME ZONE 'UTC',
                 '30 days'::interval) s
        WHERE c."hypertable_schema" = 'dev'
          AND c."hypertable_name" = 'data'
          AND s < c."range_end"
        GROUP BY s
        ORDER BY s DESC
    LOOP
        CALL refresh_continuous_aggregate(agg, "interval"."begin", "interval"."end");
    END LOOP;
END;
$BODY$;
The query in the for loop seems to hold a cursor open (see the mention of "portal" in the PostgreSQL documentation), which prevents the materialization transactions from committing. Can this restriction be lifted? It would allow writing custom materialization functions that loop over query results.
This could be related to #2876.
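A workaround sketch, in case it helps others (untested; it mirrors the procedure above, and the new procedure name and the NULL guard are my additions): collecting the window starts into an array before looping means no portal stays pinned across the CALL, so the refresh should be free to commit between windows:

CREATE OR REPLACE PROCEDURE dev.update_aggregate_batched(agg regclass)
LANGUAGE plpgsql
AS $BODY$
DECLARE
    starts timestamp[];
    s timestamp;
BEGIN
    -- Materialize the 30-day window starts up front; array_agg runs the
    -- query to completion, so no cursor is held open during the loop.
    SELECT array_agg(DISTINCT b ORDER BY b DESC) INTO starts
    FROM timescaledb_information.chunks c,
         LATERAL generate_series(
             time_bucket('30 days'::interval, c."range_start") AT TIME ZONE 'UTC',
             time_bucket('30 days'::interval, c."range_end") AT TIME ZONE 'UTC',
             '30 days'::interval) b
    WHERE c."hypertable_schema" = 'dev'
      AND c."hypertable_name" = 'data'
      AND b < c."range_end";

    IF starts IS NULL THEN
        RETURN;  -- no chunks found
    END IF;

    -- FOREACH iterates over a plain array value, not a portal, so the
    -- transaction control inside refresh_continuous_aggregate can proceed.
    FOREACH s IN ARRAY starts LOOP
        CALL refresh_continuous_aggregate(agg, s, s + '30 days'::interval);
    END LOOP;
END;
$BODY$;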
@NomAnor Thank you for the feature request. If the problem is that refresh_continuous_aggregate consumes too much memory, then a better approach would be to release memory on a regular basis inside refresh_continuous_aggregate. Would that solve your problem, or is there another reason you want to be able to run the function inside a procedure or transaction?
I think that would work. Would this split the work into multiple transactions internally? Maybe have an optional batch_size parameter that specifies the maximum number of chunks per transaction? That should also be available with add_continuous_aggregate_policy.
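For illustration only (this syntax is hypothetical; no batch_size argument exists in the current API, and the aggregate name is invented):

-- Hypothetical sketch of the proposed batch_size parameter:
-- commit after materializing at most 10 chunks per transaction.
CALL refresh_continuous_aggregate('dev.data_agg', '2020-01-01', '2021-01-01', batch_size => 10);

SELECT add_continuous_aggregate_policy('dev.data_agg',
    start_offset      => '90 days'::interval,
    end_offset        => '1 hour'::interval,
    schedule_interval => '1 hour'::interval,
    batch_size        => 10);  -- hypothetical parameter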
I didn't try to run this inside a transaction (as far as I understand it, a procedure does not start one automatically, and I CALLed it as the only command from psql). It should be possible to use it inside a procedure. It's just not possible to use it inside a for loop that iterates over a query result, because of the open cursor (which might internally start a transaction?). Maybe add an additional error note in the documentation, like:
- The function is called inside a for loop over query results.
I'm going to jump on this one. My use case is simply to iterate backwards backfilling caggs, and this is what I tried:
do $$
declare
    rec record;
begin
    for rec in
        select generate_series('2021-06-01', '2015-01-01', '-12 hour'::interval) as foo
    loop
        raise notice '%', rec.foo;
        CALL refresh_continuous_aggregate('blah', rec.foo, rec.foo + '12 hour'::interval);
    end loop;
end $$;
which also results in ERROR: cannot commit while a portal is pinned
In this context, the memory usage is too high to backfill more than a day, and the whole operation is expected to take considerable time (days for this table, weeks for the next table I'm tackling) so I want to work backwards and get the more useful (recent) data to my users quicker.
My workaround is to abuse shell, screen and a passwordless login to do it:
for x in `seq 1 16000`; do
    psql dbname -qt \
        -c "select current_date - ($x * '3 hours'::interval)" \
        -c "CALL refresh_continuous_aggregate('blah', current_date - ($x * '3 hours'::interval), (current_date - ($x * '3 hours'::interval)) + '3 hours'::interval)"
done
Not the prettiest, but it works, and unlike a single refresh_continuous_aggregate() call, I can be pretty sure I'm not going to lose weeks' worth of crunching if something goes wrong towards the end.
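An in-database alternative sketch (untested; the procedure name and parameters are my own invention, and it assumes the procedure is CALLed directly from psql so it runs in a non-atomic context): a WHILE loop over a plain variable holds no portal open, so it should not hit the pinned-portal error:

-- Untested sketch: iterate backwards with a plain variable instead of a
-- FOR-over-query loop, so no portal is pinned while each refresh commits.
CREATE OR REPLACE PROCEDURE backfill_cagg_backwards(
    agg     regclass,
    from_ts timestamptz,
    to_ts   timestamptz,
    step    interval)
LANGUAGE plpgsql
AS $BODY$
DECLARE
    cur timestamptz := from_ts;
BEGIN
    WHILE cur > to_ts LOOP
        CALL refresh_continuous_aggregate(agg, cur - step, cur);
        RAISE NOTICE 'refreshed % .. %', cur - step, cur;
        cur := cur - step;
    END LOOP;
END;
$BODY$;

-- Usage, matching the DO block above:
-- CALL backfill_cagg_backwards('blah', '2021-06-01', '2015-01-01', '12 hours');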
@mkindahl we are running into ERROR: cannot commit while a portal is pinned when running refresh_continuous_aggregate from a procedure too.
I've run into this using CREATE INDEX ... WITH (timescaledb.transaction_per_chunk). It doesn't seem to be a memory issue (it happened on a basically empty table).