Update Rollups example script
Assume we run the cron jobs on an hourly basis. The following will happen:
| timeline | runs_at | last_rollup_time | curr_rollup_time | what happens |
|---|---|---|---|---|
| the Nth cron job runs | 2021-01-05 20:00:00 | 2021-01-05 19:00:00 | 2021-01-05 20:00:00 | hours included [20] |
| new http_requests (A and B) are created at 2021-01-05 20:30:00 | | | | |
| the (N+1)th cron job | 2021-01-05 21:00:00 | 2021-01-05 20:00:00 | 2021-01-05 21:00:00 | hours included [21] |
Events A and B will not be added to the rollup table. To confirm:
> select date_trunc('hour', '2021-01-05 20:30:00'::timestamp) <@ tsrange('2021-01-05 19:00:00'::timestamp, '2021-01-05 20:00:00'::timestamp, '(]');
> true
Events A and B won't be added by the Nth job even though their truncated ingest_times fall within its range, because they are not in the Citus DB yet: they are created at 20:30:00, 30 minutes after the job runs at 20:00:00.
> select date_trunc('hour', '2021-01-05 20:30:00'::timestamp) <@ tsrange('2021-01-05 20:00:00'::timestamp, '2021-01-05 21:00:00'::timestamp, '(]');
> false
Events A and B won't be added by the (N+1)th job either, because their truncated ingest_times (20:00:00) fall on the excluded lower bound of its range.
This means that http_requests A and B will be lost.
I think the issue can be fixed if we use the timestamp itself, without any truncation, in the WHERE clause: instead of date_trunc('minute', ingest_time), we just use ingest_time.
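
To make the scenario easier to check, here is a minimal Python sketch that simulates the hourly job with both predicates. It is not the rollup script itself: the helper names (`trunc_hour`, `in_window`, `simulate`) are hypothetical, it assumes an event only becomes visible to a job once its ingest_time has passed, and it truncates to the hour to match the example above rather than to the minute.

```python
from datetime import datetime, timedelta


def trunc_hour(ts):
    """Rough equivalent of date_trunc('hour', ts)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def in_window(ts, lower, upper):
    """Mimic `ts <@ tsrange(lower, upper, '(]')`: lower excluded, upper included."""
    return lower < ts <= upper


def simulate(events, first_run, runs, truncate):
    """Run `runs` hourly jobs and return the names of events that got rolled up."""
    rolled_up = set()
    for i in range(runs):
        curr_rollup_time = first_run + timedelta(hours=i)
        last_rollup_time = curr_rollup_time - timedelta(hours=1)
        for name, ingest_time in events.items():
            # Assumption: an event is not in the table before its ingest_time.
            if ingest_time > curr_rollup_time:
                continue
            key = trunc_hour(ingest_time) if truncate else ingest_time
            if in_window(key, last_rollup_time, curr_rollup_time):
                rolled_up.add(name)
    return rolled_up


# Events A and B from the timeline above, both created at 20:30:00.
events = {
    "A": datetime(2021, 1, 5, 20, 30),
    "B": datetime(2021, 1, 5, 20, 30),
}
first_run = datetime(2021, 1, 5, 20, 0)

with_trunc = simulate(events, first_run, runs=3, truncate=True)
without_trunc = simulate(events, first_run, runs=3, truncate=False)

print(sorted(with_trunc))     # []
print(sorted(without_trunc))  # ['A', 'B']
```

With truncation, neither the 20:00 job (the events do not exist yet) nor any later job (20:00:00 sits on an excluded lower bound) picks up A and B. Comparing the raw ingest_time lets the 21:00 job include them.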
Hi @M-Sayed, is the method you're proposing preferable to that in #943? I just merged that older PR, but we can switch to use your suggestion instead if you think that's best.