
Running refresh_continuous_aggregate inside a for loop

Open NomAnor opened this issue 3 years ago • 5 comments

Relevant system information:

  • OS: Arch Linux
  • PostgreSQL version: 13.2
  • TimescaleDB version: 2.2.0

Describe the bug Because max_interval_per_job is no longer supported, the materialization jobs might run over very large data ranges (and concurrently), using too much memory, especially on freshly imported backups. I tried to write a procedure that calls refresh_continuous_aggregate repeatedly over smaller ranges, but this throws the following error:

ERROR:  cannot commit while a portal is pinned
CONTEXT:  SQL statement "CALL refresh_continuous_aggregate(agg, "interval"."begin", "interval"."end")"
PL/pgSQL function dev.update_aggregate(regclass) line 20 at CALL

To Reproduce

CREATE OR REPLACE PROCEDURE dev.update_aggregate(
	agg regclass)
LANGUAGE 'plpgsql'
AS $BODY$
DECLARE
	"interval" record;
BEGIN
	FOR "interval" IN
		SELECT
			s AS "begin",
			s + '30 days'::interval AS "end"
		FROM timescaledb_information.chunks c,
		LATERAL generate_series(time_bucket('30 days'::interval, c."range_start") AT TIME ZONE 'UTC', time_bucket('30 days'::interval, c."range_end") AT TIME ZONE 'UTC', '30 days'::interval) s
		WHERE
			c."hypertable_schema" = 'dev' AND c."hypertable_name" = 'data'
			AND s < c."range_end"
		GROUP BY s
		ORDER BY s DESC
	LOOP
		CALL refresh_continuous_aggregate(agg, "interval"."begin", "interval"."end");
	END LOOP;
END;
$BODY$;

The query in the for loop seems to hold a cursor open (the PostgreSQL documentation calls this a portal), which prevents the materialization transactions from committing. Can this restriction be lifted? That would make it possible to write custom materialization procedures that loop over query results.
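One possible workaround (my sketch, not an official recommendation) is to collect the window starts into an array first: array_agg closes the query before iteration begins, so FOREACH loops over a plain array value and no portal stays pinned while refresh_continuous_aggregate commits. This assumes the same dev.data hypertable as above; the procedure name update_aggregate_batched is hypothetical.

CREATE OR REPLACE PROCEDURE dev.update_aggregate_batched(
	agg regclass)
LANGUAGE 'plpgsql'
AS $BODY$
DECLARE
	starts timestamptz[];
	s timestamptz;
BEGIN
	-- Materialize all 30-day window starts up front. The query is
	-- fully evaluated here, so no cursor remains open in the loop.
	SELECT array_agg(DISTINCT bucket ORDER BY bucket DESC) INTO starts
	FROM timescaledb_information.chunks c,
	LATERAL generate_series(
		time_bucket('30 days'::interval, c."range_start"),
		time_bucket('30 days'::interval, c."range_end"),
		'30 days'::interval) bucket
	WHERE
		c."hypertable_schema" = 'dev' AND c."hypertable_name" = 'data'
		AND bucket < c."range_end";

	-- FOREACH iterates over an array, not a query, so the internal
	-- commits of refresh_continuous_aggregate are not blocked.
	FOREACH s IN ARRAY starts LOOP
		CALL refresh_continuous_aggregate(agg, s, s + '30 days'::interval);
	END LOOP;
END;
$BODY$;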

This could be related to #2876.

NomAnor avatar Apr 30 '21 11:04 NomAnor

@NomAnor Thank you for the feature request. If the problem is that the refresh_continuous_aggregate consumes too much memory, then a better approach would be to release memory on a regular basis inside the refresh_continuous_aggregate. Would that solve your problem or is there another reason you want to be able to run the function inside a procedure or transaction?

mkindahl avatar May 03 '21 06:05 mkindahl

I think that would work. Would this split the work into multiple transactions internally? Maybe have an optional batch_size parameter that specifies the maximum number of chunks per transaction? That should also be available with add_continuous_aggregate_policy.

I didn't try to run this inside a transaction (as far as I understand it, a procedure does not start one automatically, and I CALLed it as the only command from psql). It should be possible to use it inside a procedure. It's just not possible to use it inside a for loop that loops over a query result, because of the open cursor (which might internally start a transaction?). Maybe add an additional error note in the documentation like:

  • The function is called inside a for loop over query results.

NomAnor avatar May 03 '21 11:05 NomAnor

I'm going to jump on this one. My use case is simply to iterate backwards backfilling caggs, and this is what I tried:

do $$
declare rec record;
begin
for rec in
    select generate_series('2021-06-01', '2015-01-01', '-12 hour'::interval) as foo
loop
    raise notice '%', rec.foo;
    CALL refresh_continuous_aggregate('blah', rec.foo, rec.foo+'12 hour'::interval);
end loop;
end $$;

which also results in ERROR: cannot commit while a portal is pinned

In this context, the memory usage is too high to backfill more than a day, and the whole operation is expected to take considerable time (days for this table, weeks for the next table I'm tackling) so I want to work backwards and get the more useful (recent) data to my users quicker.

My workaround is to abuse shell, screen and a passwordless login to do it:

for x in `seq 1 16000`;
  do psql dbname -qt -c "select current_date-($x * '3 hours'::interval)" \
  -c "CALL refresh_continuous_aggregate('blah', current_date-($x * '3 hours'::interval), (current_date-($x * '3 hours'::interval))+'3 hours'::interval)";
done
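An alternative to the shell loop (my suggestion, not from the thread) is psql's \gexec: the first query generates one CALL statement per window, and \gexec then executes each generated statement as a separate top-level command, so every refresh runs in its own transaction. This reuses the 'blah' aggregate name and 3-hour window from the example above.

-- Generate one CALL per 3-hour window over the last 30 days,
-- newest first; \gexec runs each row as its own statement.
SELECT format(
    'CALL refresh_continuous_aggregate(%L, %L::timestamptz, %L::timestamptz)',
    'blah', s, s + '3 hours'::interval)
FROM generate_series(now(), now() - '30 days'::interval, '-3 hours'::interval) s
\gexec

Like the shell version, an interrupted run only loses the window currently being refreshed, not the whole backfill.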

Not the prettiest, but it works, and unlike a single refresh_continuous_aggregate() call, I can be pretty sure I'm not going to end up losing weeks' worth of crunching if something goes wrong towards the end.

mjb512 avatar Jun 07 '21 21:06 mjb512

@mkindahl we are running into ERROR: cannot commit while a portal is pinned when running refresh_continuous_aggregate from a procedure too.

cevian avatar Aug 03 '21 14:08 cevian

I've run into this using CREATE INDEX ... WITH (timescaledb.transaction_per_chunk);. It doesn't seem to be a memory issue (it happened on a basically empty table).

adriangb avatar Feb 22 '24 17:02 adriangb