timescaledb-toolkit
timescaledb-toolkit copied to clipboard
ERROR: summary_ is null for toolkit_experimental.interpolated_delta with time_bucket_gapfill
Relevant system information:
- OS: Ubuntu 20.04.5 LTS
- PostgreSQL version (output of
SELECT version();): 14.6 - TimescaleDB Toolkit version (output of
\dx timescaledb_toolkitinpsql): 1.12.0 - Installation method: docker
Describe the bug
ERROR: summary_ is null for toolkit_experimental.interpolated_delta with time_bucket_gapfill
To Reproduce
CREATE TABLE data(ts timestamptz NOT NULL, val numeric);
SELECT create_hypertable('data', 'ts');
INSERT INTO data VALUES
('2022-01-01T00:00:00', 0),
('2022-01-01T00:01:00', 1),
('2022-01-01T00:02:00', 2),
('2022-01-01T00:03:00', 3),
('2022-01-01T00:04:00', 4),
('2022-01-01T00:05:00', 5),
('2022-01-01T00:06:00', 6),
('2022-01-01T00:07:00', 7),
('2022-01-01T00:08:00', 8),
('2022-01-01T00:09:00', 9),
('2022-01-01T00:10:00', 10),
('2022-01-01T00:11:00', 11),
('2022-01-01T00:12:00', 12);
SELECT
ts,
val
FROM data;
WITH buckets AS (
SELECT
id,
time_bucket_gapfill('15 min', ts) AS bucket,
avg(val) AS avg_val,
interpolate(avg(val)) AS interpolated_val,
counter_agg(ts, val) AS counter
FROM alert1
WHERE ts > '2022-01-01' AND ts < '2022-01-02'
GROUP BY id, bucket
ORDER BY id, bucket
)
SELECT
id,
bucket,
toolkit_experimental.interpolated_delta(
counter,
bucket,
'15 min',
lag(counter) OVER (PARTITION BY id ORDER BY bucket),
lead(counter) OVER (PARTITION BY id ORDER BY bucket)
) AS delta
FROM buckets;
Expected behavior Interpolated deltas across empty buckets.
Actual behavior
ERROR: summary_ is null
CONTEXT: extension/src/counter_agg.rs:628:1
Additional context
Iterates on https://github.com/timescale/timescaledb-toolkit/issues/626#issuecomment-1319747959
Hey @davetapley sorry I missed this, got lost in the holiday shuffle I think. Can you take a look at https://github.com/timescale/timescaledb-toolkit/issues/548 and see if that looks like a similar case? I think it does to me...
Hi @davidkohn88,
I also met this issue and didn't successfully resolved it.
Relevant system information:
- OS: MacOS 12.4 (running official pg12-latest)
- PostgreSQL version (output of
SELECT version();): 12.14 - TimescaleDB Toolkit version (output of
\dx timescaledb_toolkitinpsql): 1.15.0 - Installation method: docker
Describe the bug
When using interpolated_delta, I've got an error:
ERROR: summary_ is null
SQL state: XX000
To Reproduce Steps to reproduce the behavior:
- Run
docker run -p 5432:5432 -e POSTGRES_PASSWORD=password timescale/timescaledb-ha:pg12-latest - Connect to the database
- Execute the queries below:
CREATE TABLE test (
date timestamp with time zone NOT NULL,
quantity double precision
);
INSERT INTO test (date, quantity) VALUES
('2022-01-01T00:00:00.00+00:00', 10),
('2022-01-01T00:10:00.00+00:00', 11),
('2022-01-01T00:20:00.00+00:00', 12),
('2022-01-01T00:30:00.00+00:00', 16),
('2022-01-01T00:40:00.00+00:00', 27),
('2022-01-01T00:50:00.00+00:00', 0),
('2022-01-01T01:00:00.00+00:00', 3),
('2022-01-01T01:30:00.00+00:00', 5),
('2022-01-01T01:50:00.00+00:00', 1),
('2022-01-01T02:00:00.00+00:00', 6),
('2022-01-01T02:10:00.00+00:00', 7),
('2022-01-01T02:30:00.00+00:00', 8),
-- missing data between 2:30 and 3:30
-- I expect to interpolate consumption on 3:00
-- so I use time_bucket_gapfill to generate missing rows
('2022-01-01T03:30:00.00+00:00', 10);
WITH
gapfilling AS (
SELECT
time_bucket_gapfill('30 min'::interval, date) AS date,
counter_agg(date, quantity) as summary
FROM test
WHERE date BETWEEN '2022-01-01T00:00:00.00+00:00' AND '2022-01-01T04:00:00.00+00:00'
GROUP BY 1
)
SELECT
date,
interpolated_delta(
summary,
date,
'30 min',
LAG(summary) OVER (ORDER by date),
LEAD(summary) OVER (ORDER by date)
)
FROM gapfilling;
Expected behavior
Not sure actually 😅, ideally I'd want interpolation on missing data (where quantity is NULL).
Actual behavior An error is returned:
ERROR: summary_ is null
SQL state: XX000
I tried/adapted the workaround mentionned in #548, it avoids the error but doesn't allow to compute interpolation correctly:
WITH t1 as(
SELECT
time_bucket_gapfill('30 minute', date) AS date,
locf(last(quantity, date)) as locf_value,
last(quantity, date) as indicator_empty_bucket,
counter_agg(date, quantity) AS counter
FROM test
WHERE date BETWEEN '2022-01-01T00:00:00.00+00:00' AND '2022-01-01T04:00:00.00+00:00'
GROUP BY 1),
t2 as(
SELECT
date,
counter_agg(date, locf_value) AS counter
FROM t1
WHERE indicator_empty_bucket IS NULL
GROUP BY 1),
t3 as(
SELECT
date,
counter
FROM t1
WHERE indicator_empty_bucket IS NOT NULL
UNION ALL
SELECT date, counter FROM t2)
SELECT
date,
interpolated_delta(
counter,
date,
'30 minute',
LAG(counter) OVER (ORDER BY date),
LEAD(counter) OVER (ORDER BY date)
)
FROM t3 ORDER BY 1;
returns (with the test data of the code above):
| date | interpolated_data |
|---|---|
| 2022-01-01 00:00:00+00 | 6 |
| 2022-01-01 00:30:00+00 | 14 |
| 2022-01-01 01:00:00+00 | 2 |
| 2022-01-01 01:30:00+00 | 6 |
| 2022-01-01 02:00:00+00 | 2 |
| 2022-01-01 02:30:00+00 | 0 |
| 2022-01-01 03:00:00+00 | 2 - would expect 1 (consumption between 2:30 and 3:30 / 2) |
| 2022-01-01 03:30:00+00 | 0 - would expect 1 (consumption between 2:30 and 3:30 / 2) |
| 2022-01-01 04:00:00+00 | 0 |
I also tried to use a custom user-defined aggregate (LAST_IGNORE_NULLS), I got 1 on 3:00 but the interpolation is still wrong on rows where the quantity was originally NULL.
Am I doing something wrong? Thanks for helping me with this. 👍
I think I found a workaround by adapting the code suggested in #548 :
WITH t1 AS (
SELECT
time_bucket('30 min', date) AS date,
last(quantity, date) as quantity,
counter_agg(date, quantity) AS counter
FROM test
GROUP BY 1
),
t2 AS (
SELECT
date,
counter,
quantity,
LEAD(counter) OVER (order by date) as next
FROM t1
),
t3 AS (
SELECT
time_bucket_gapfill('30 min'::interval, date) AS date,
locf(last(counter, date)) AS prev,
locf(last(next, date)) AS next,
last(quantity, date) as quantity
FROM t2
WHERE date BETWEEN '2022-01-01T00:00:00.00+00:00' AND '2022-01-01T04:00:00.00+00:00'
GROUP BY 1
),
t4 AS (
SELECT
date,
last_val(prev) + extract(epoch from (date - last_time(prev))) * (first_val(next) - last_val(prev)) / extract(epoch from (first_time(next) - last_time(prev))) as quantity
FROM t3
WHERE quantity IS NULL
),
t5 AS (
SELECT date, counter
FROM t1
WHERE quantity IS NOT NULL
UNION ALL
SELECT date, counter_agg(date, quantity) AS counter FROM t4 GROUP BY date
)
SELECT date,
CASE WHEN COUNTER IS NOT NULL THEN interpolated_delta(
counter,
date,
'30 min',
LAG(counter) OVER (ORDER BY date),
LEAD(counter) OVER (ORDER BY date)
) ELSE NULL END
FROM t5
ORDER BY 1;