timescaledb-toolkit icon indicating copy to clipboard operation
timescaledb-toolkit copied to clipboard

ERROR: summary_ is null for toolkit_experimental.interpolated_delta with time_bucket_gapfill

Open davetapley opened this issue 2 years ago • 3 comments
trafficstars

Relevant system information:

  • OS: Ubuntu 20.04.5 LTS
  • PostgreSQL version (output of SELECT version();): 14.6
  • TimescaleDB Toolkit version (output of \dx timescaledb_toolkit in psql): 1.12.0
  • Installation method: docker

Describe the bug

ERROR: summary_ is null for toolkit_experimental.interpolated_delta with time_bucket_gapfill

To Reproduce

CREATE TABLE data(ts timestamptz NOT NULL, val numeric);

SELECT create_hypertable('data', 'ts');

INSERT INTO data VALUES
('2022-01-01T00:00:00', 0),
('2022-01-01T00:01:00', 1),
('2022-01-01T00:02:00', 2),
('2022-01-01T00:03:00', 3),
('2022-01-01T00:04:00', 4),
('2022-01-01T00:05:00', 5),
('2022-01-01T00:06:00', 6),
('2022-01-01T00:07:00', 7),
('2022-01-01T00:08:00', 8),
('2022-01-01T00:09:00', 9),
('2022-01-01T00:10:00', 10),
('2022-01-01T00:11:00', 11),
('2022-01-01T00:12:00', 12);

SELECT
    ts,
    val
FROM data;


WITH buckets AS (
    SELECT
        id,
        time_bucket_gapfill('15 min', ts) AS bucket,
        avg(val) AS avg_val,
        interpolate(avg(val)) AS interpolated_val,
        counter_agg(ts, val) AS counter

    FROM alert1
    WHERE ts > '2022-01-01' AND ts < '2022-01-02'
    GROUP BY id, bucket
    ORDER BY id, bucket
)

SELECT
    id,
    bucket,
    toolkit_experimental.interpolated_delta(
        counter,
        bucket,
        '15 min',
        lag(counter) OVER (PARTITION BY id ORDER BY bucket),
        lead(counter) OVER (PARTITION BY id ORDER BY bucket)
    ) AS delta
FROM buckets;

Expected behavior Interpolated deltas across empty buckets.

Actual behavior

ERROR:  summary_ is null
CONTEXT:  extension/src/counter_agg.rs:628:1

Additional context

Iterates on https://github.com/timescale/timescaledb-toolkit/issues/626#issuecomment-1319747959

davetapley avatar Nov 22 '22 23:11 davetapley

Hey @davetapley sorry I missed this, got lost in the holiday shuffle I think. Can you take a look at https://github.com/timescale/timescaledb-toolkit/issues/548 and see if that looks like a similar case? I think it does to me...

davidkohn88 avatar Jan 12 '23 17:01 davidkohn88

Hi @davidkohn88,

I also met this issue and didn't successfully resolved it.

Relevant system information:

  • OS: MacOS 12.4 (running official pg12-latest)
  • PostgreSQL version (output of SELECT version();): 12.14
  • TimescaleDB Toolkit version (output of \dx timescaledb_toolkit in psql): 1.15.0
  • Installation method: docker

Describe the bug

When using interpolated_delta, I've got an error:

ERROR:  summary_ is null
SQL state: XX000

To Reproduce Steps to reproduce the behavior:

  1. Run docker run -p 5432:5432 -e POSTGRES_PASSWORD=password timescale/timescaledb-ha:pg12-latest
  2. Connect to the database
  3. Execute the queries below:
CREATE TABLE test (
  date timestamp with time zone NOT NULL,
  quantity double precision 
);

INSERT INTO test (date, quantity) VALUES
	('2022-01-01T00:00:00.00+00:00', 10),
	('2022-01-01T00:10:00.00+00:00', 11),
	('2022-01-01T00:20:00.00+00:00', 12),
	('2022-01-01T00:30:00.00+00:00', 16),
	('2022-01-01T00:40:00.00+00:00', 27),
	('2022-01-01T00:50:00.00+00:00', 0),
	('2022-01-01T01:00:00.00+00:00', 3),
	('2022-01-01T01:30:00.00+00:00', 5),
	('2022-01-01T01:50:00.00+00:00', 1),
	('2022-01-01T02:00:00.00+00:00', 6),
	('2022-01-01T02:10:00.00+00:00', 7),
	('2022-01-01T02:30:00.00+00:00', 8),
	-- missing data between 2:30 and 3:30
	-- I expect to interpolate consumption on 3:00
	-- so I use time_bucket_gapfill to generate missing rows
	('2022-01-01T03:30:00.00+00:00', 10);
	
WITH
gapfilling AS (
    SELECT
        time_bucket_gapfill('30 min'::interval, date) AS date,
	    counter_agg(date, quantity) as summary
    FROM test
	WHERE date BETWEEN '2022-01-01T00:00:00.00+00:00' AND '2022-01-01T04:00:00.00+00:00'
    GROUP BY 1
)
SELECT
    date,
    interpolated_delta(
        summary,
        date,
        '30 min',
        LAG(summary) OVER (ORDER by date),
        LEAD(summary) OVER (ORDER by date)
    )
FROM gapfilling;

Expected behavior Not sure actually 😅, ideally I'd want interpolation on missing data (where quantity is NULL).

Actual behavior An error is returned:

ERROR:  summary_ is null
SQL state: XX000

I tried/adapted the workaround mentionned in #548, it avoids the error but doesn't allow to compute interpolation correctly:

WITH  t1 as(

SELECT
   time_bucket_gapfill('30 minute', date) AS date,
   locf(last(quantity, date)) as locf_value,
   last(quantity, date) as indicator_empty_bucket, 
   counter_agg(date, quantity) AS counter
FROM test
WHERE date BETWEEN '2022-01-01T00:00:00.00+00:00' AND '2022-01-01T04:00:00.00+00:00'
GROUP BY 1), 

t2 as(

SELECT 
    date, 
    counter_agg(date, locf_value) AS counter
FROM t1 
WHERE indicator_empty_bucket IS NULL
GROUP BY 1),

t3 as(
SELECT
   date, 
   counter
FROM t1 
WHERE indicator_empty_bucket IS NOT NULL
UNION ALL
SELECT date, counter FROM t2)
SELECT
    date,
    interpolated_delta(
        counter,
        date,
        '30 minute',
        LAG(counter) OVER (ORDER BY date),
        LEAD(counter) OVER (ORDER BY date)
    )
FROM t3 ORDER BY 1;

returns (with the test data of the code above):

date interpolated_data
2022-01-01 00:00:00+00 6
2022-01-01 00:30:00+00 14
2022-01-01 01:00:00+00 2
2022-01-01 01:30:00+00 6
2022-01-01 02:00:00+00 2
2022-01-01 02:30:00+00 0
2022-01-01 03:00:00+00 2 - would expect 1 (consumption between 2:30 and 3:30 / 2)
2022-01-01 03:30:00+00 0 - would expect 1 (consumption between 2:30 and 3:30 / 2)
2022-01-01 04:00:00+00 0

I also tried to use a custom user-defined aggregate (LAST_IGNORE_NULLS), I got 1 on 3:00 but the interpolation is still wrong on rows where the quantity was originally NULL.

Am I doing something wrong? Thanks for helping me with this. 👍

jledentu avatar Apr 07 '23 12:04 jledentu

I think I found a workaround by adapting the code suggested in #548 :

WITH  t1 AS (

SELECT
   time_bucket('30 min', date) AS date,
   last(quantity, date) as quantity,
   counter_agg(date, quantity) AS counter
FROM test
GROUP BY 1

),

t2 AS (

SELECT
    date,
    counter,
    quantity,
    LEAD(counter) OVER (order by date) as next
FROM t1

),

t3 AS (

SELECT
	time_bucket_gapfill('30 min'::interval, date) AS date,
	locf(last(counter, date)) AS prev,
	locf(last(next, date)) AS next,
	last(quantity, date) as quantity
FROM t2
WHERE date BETWEEN '2022-01-01T00:00:00.00+00:00' AND '2022-01-01T04:00:00.00+00:00'
GROUP BY 1

),

t4 AS (

SELECT
	date,
	last_val(prev) + extract(epoch from (date - last_time(prev))) * (first_val(next) - last_val(prev)) / extract(epoch from (first_time(next) - last_time(prev))) as quantity
FROM t3
WHERE quantity IS NULL

),

t5 AS (

SELECT date, counter
FROM t1 
WHERE quantity IS NOT NULL
UNION ALL
SELECT date, counter_agg(date, quantity) AS counter FROM t4 GROUP BY date
	
)

SELECT date,
    CASE WHEN COUNTER IS NOT NULL THEN interpolated_delta(
        counter,
        date,
        '30 min',
        LAG(counter) OVER (ORDER BY date),
        LEAD(counter) OVER (ORDER BY date)
    ) ELSE NULL END 
FROM t5
ORDER BY 1;

jledentu avatar Apr 07 '23 15:04 jledentu