timescaledb-toolkit icon indicating copy to clipboard operation
timescaledb-toolkit copied to clipboard

Workaround for continuous aggregates not supporting time_bucket_gapfill

Open tlarrue opened this issue 2 years ago • 3 comments
trafficstars

Relevant system information:

  • PostgreSQL 14.6 (Ubuntu 14.6-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit
  • TimescaleDB Toolkit version: 2.8.1
  • Installation method: Timescale Cloud

Describe the bug I receive the error message "Invalid continuous aggregate view" when trying to create a continuous aggregate that uses time_bucket_gapfill.

To Reproduce

Table Definition:

CREATE TABLE
  public.usage (
    session_id bigint NOT NULL,
    sim_profile_id bigint NOT NULL,
    sim_card_id character varying(43) NULL,
    org_id integer NULL,
    event_time timestamp without time zone NOT NULL,
    is_connected smallint NOT NULL,
    n_seconds double precision NOT NULL,
    zero_byte_session boolean NOT NULL,
    bytes bigint NOT NULL,
    bytes_per_second double precision NOT NULL,
    tadig character varying(5) NULL,
    country_iso3 character varying(3) NULL,
    rat character varying(10) NULL,
    lac bigint NULL,
    cid bigint NULL
  );

Query I would like to store as a continuous aggregate view:

WITH  

t1 as(
SELECT
   sim_card_id,
   time_bucket_gapfill('10 minute', event_time) AS time,
   locf(last(bytes_per_second, event_time)) as locf_value,
   last(bytes_per_second, event_time) as indicator_empty_bucket, 
   time_weight('locf', event_time, bytes_per_second) AS bps
FROM public.usage
Where event_time BETWEEN '2021-12-01 00:00' AND '2022-01-01 00:00'
GROUP BY 1,2), 

t2 as(
SELECT sim_card_id, time, time_weight('locf', time, locf_value) as bps
FROM t1 
WHERE indicator_empty_bucket IS NULL
GROUP BY 1, 2),

t3 as(
SELECT sim_card_id, time, bps FROM t1 WHERE indicator_empty_bucket IS NOT NULL
UNION ALL
SELECT sim_card_id, time, bps FROM t2)

SELECT
    sim_card_id,
    time,
    toolkit_experimental.interpolated_integral(
        bps,
        time,
        '10 minute',
        LAG(bps) OVER (Partition by sim_card_id ORDER BY time),
        LEAD(bps) OVER (Partition by sim_card_id ORDER BY time),
        'seconds'
    ) as bytes
FROM t3

Additional context Related issue with this query: https://github.com/timescale/timescaledb-toolkit/issues/548

tlarrue avatar Jan 12 '23 21:01 tlarrue

Thanks @tlarrue , this is expected behavior, but it'd be great to have gapfill support in continuous aggregates. I'm going to add a workaround here soon...it doesn't fix everything, but it will allow you to use continuous aggregates and we'll work on something better in the future.

davidkohn88 avatar Jan 13 '23 16:01 davidkohn88

Hey @davidkohn88, any updates on your side? :smile:

Timsgmlr avatar Jun 13 '23 15:06 Timsgmlr

The workaround unfortunately ended up being more complicated than we'd hoped and introduced some really hairy corner cases. At this point I'm afraid we don't have a good alternative to waiting for the fix to the underlying issue: https://github.com/timescale/timescaledb/issues/1324

WireBaron avatar Jun 15 '23 19:06 WireBaron