
Update Rollups example script

Open M-Sayed opened this issue 4 years ago • 2 comments

Assume we run the cron job on an hourly basis. The following will then happen:

| timeline | runs_at | last_rollup_time | curr_rollup_time | what happens |
| --- | --- | --- | --- | --- |
| the Nth cron job runs | 2021-01-05 20:00:00 | 2021-01-05 19:00:00 | 2021-01-05 20:00:00 | hours included [20] |
| new http_requests are created at 2021-01-05 20:30:00, let's say http_requests (A and B) | | | | |
| the (N+1)th cron job runs | 2021-01-05 21:00:00 | 2021-01-05 20:00:00 | 2021-01-05 21:00:00 | hours included [21] |

Events A and B will not be added to the rollup table. To confirm:

> select date_trunc('hour', '2021-01-05 20:30:00'::timestamp) <@ tsrange('2021-01-05 19:00:00'::timestamp, '2021-01-05 20:00:00'::timestamp, '(]');
> true

Events A and B won't be added by this run, even though their truncated ingest_time (20:00:00) falls within the range, because they are not in the Citus DB yet: they are created at 20:30:00 and the job runs at 20:00:00, 30 minutes earlier.

> select date_trunc('hour', '2021-01-05 20:30:00'::timestamp) <@ tsrange('2021-01-05 20:00:00'::timestamp, '2021-01-05 21:00:00'::timestamp, '(]');
> false

Events A and B won't be added by this run either, because their truncated ingest_time (20:00:00) is not within the range: the lower bound 20:00:00 is exclusive.

This means that http_requests A and B will be lost: neither run adds them to the rollup table.
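The timeline above can be replayed outside the database. The following is a minimal Python sketch, not the docs' rollup script: `trunc_hour`, `in_window`, and `rollup` are hypothetical helpers that mimic `date_trunc('hour', ...)`, the `<@ tsrange(..., '(]')` test, and a single cron run that can only see rows already ingested.

```python
from datetime import datetime

def trunc_hour(ts):
    """Mimic date_trunc('hour', ts)."""
    return ts.replace(minute=0, second=0, microsecond=0)

def in_window(ts, lower, upper):
    """Mimic ts <@ tsrange(lower, upper, '(]'): exclusive lower, inclusive upper."""
    return lower < ts <= upper

def rollup(table, runs_at, last_rollup_time, curr_rollup_time):
    """Names of events one cron run would pick up with the truncated predicate."""
    return [
        name
        for name, ingest_time in table
        # A row is only visible to the job if it was ingested before the job ran.
        if ingest_time <= runs_at
        and in_window(trunc_hour(ingest_time), last_rollup_time, curr_rollup_time)
    ]

events = [
    ("A", datetime(2021, 1, 5, 20, 30)),
    ("B", datetime(2021, 1, 5, 20, 30)),
]

# Nth run at 20:00 covers (19:00, 20:00]; A and B do not exist yet.
run_n = rollup(events, datetime(2021, 1, 5, 20), datetime(2021, 1, 5, 19), datetime(2021, 1, 5, 20))
# (N+1)th run at 21:00 covers (20:00, 21:00]; A and B truncate to 20:00, which is excluded.
run_n1 = rollup(events, datetime(2021, 1, 5, 21), datetime(2021, 1, 5, 20), datetime(2021, 1, 5, 21))

print(run_n, run_n1)  # [] []
```

Both runs return an empty list, which matches the two queries above.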

I think the issue can be fixed if we use the timestamp itself, without any truncation, in the WHERE clause: instead of date_trunc('minute', ingest_time), we just use ingest_time.
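As a rough check of this proposal, here is a small Python sketch comparing the two predicates for an event ingested at 20:30:00. It assumes the same hourly windows as in the example; `in_window` is a hypothetical stand-in for `<@ tsrange(lower, upper, '(]')`, and nothing here touches a real database.

```python
from datetime import datetime

def in_window(ts, lower, upper):
    """Mimic ts <@ tsrange(lower, upper, '(]'): exclusive lower, inclusive upper."""
    return lower < ts <= upper

ingest_time = datetime(2021, 1, 5, 20, 30)

# Window of the (N+1)th run: (20:00, 21:00]
last_rollup_time = datetime(2021, 1, 5, 20, 0)
curr_rollup_time = datetime(2021, 1, 5, 21, 0)

# Current predicate: truncate first, then test the window.
truncated = ingest_time.replace(minute=0, second=0, microsecond=0)
with_trunc = in_window(truncated, last_rollup_time, curr_rollup_time)

# Proposed predicate: test the raw timestamp.
without_trunc = in_window(ingest_time, last_rollup_time, curr_rollup_time)

# The raw timestamp is not in the previous window (19:00, 20:00],
# so the event would be counted by exactly one run.
in_previous = in_window(ingest_time, datetime(2021, 1, 5, 19, 0), last_rollup_time)

print(with_trunc, without_trunc, in_previous)  # False True False
```

With the raw timestamp, the event falls into the (N+1)th window only, so it is rolled up once and not dropped.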

M-Sayed, Oct 27 '21 13:10

Hi @M-Sayed, is the method you're proposing preferable to that in #943? I just merged that older PR, but we can switch to use your suggestion instead if you think that's best.

jonels-msft, Oct 07 '22 19:10