
Support 'ON CONFLICT DO NOTHING' for compressed chunks

Open wiebeytec opened this issue 3 years ago • 4 comments

What type of enhancement is this?

User experience

What subsystems and features will be improved?

Compression

What does the enhancement do?

When there are compressed chunks and you do INSERT ... ON CONFLICT DO NOTHING, it says: "insert with ON CONFLICT or RETURNING clause is not supported on compressed chunks".
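For context, a minimal sketch of the failing statement (the table and column names here are made up for illustration; only the unique key on (sensor_id, time) matters):

-- Assume 'readings' is a hypertable with UNIQUE(sensor_id, time) and already-compressed chunks.
INSERT INTO readings (sensor_id, time, value)
VALUES (42, '2022-01-01 00:00:00+00', 1.0)
ON CONFLICT DO NOTHING;
-- ERROR:  insert with ON CONFLICT or RETURNING clause is not supported on compressed chunks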

Currently, if you have an ingress system that processes duplicate values while back-dating, you can't use compression, even if you just want to ignore the duplicates.

It's a light-weight version of https://github.com/timescale/timescaledb/issues/3323: supporting only DO NOTHING would be enough when unique keys are present. However, perhaps it's essentially the same problem, I don't know.

Implementation challenges

I tried patching the code as per the attached patch (or the GitHub compare, for as long as the link works). The problem with it is that despite there being a unique index, duplicate values are inserted. I don't know whether fixing this (checking for the value in the compressed index) is essentially the same as https://github.com/timescale/timescaledb/issues/3323, or whether DO NOTHING can be implemented more easily.

timescaledb-compressed-do-nothing-fix.patch.txt

wiebeytec avatar Feb 21 '22 08:02 wiebeytec

@wiebeytec This is essentially the same problem as 3323. We cannot support this enhancement without support for 3323.

gayyappan avatar Feb 21 '22 19:02 gayyappan

Also really keen to see a fix for this.

As with many (most?) systems, I can't guarantee there will be no duplicates. So without this enhancement (more of a bug fix TBH, since it was treated as such in #100 for non-compressed hypertables), I need to check whether rows exist before inserting them, which absolutely decimates performance 😭
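For what it's worth, the workaround I mean looks roughly like this (a sketch only; the staging and readings tables and the key columns are illustrative). Every row pays for an extra lookup against the target table:

-- Hypothetical check-before-insert workaround: anti-join against existing rows.
INSERT INTO readings (sensor_id, time, value)
SELECT s.sensor_id, s.time, s.value
FROM staging s
WHERE NOT EXISTS (
    SELECT 1
    FROM readings r
    WHERE r.sensor_id = s.sensor_id
      AND r.time = s.time
);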

cocowalla avatar Jul 18 '22 13:07 cocowalla

Also, as discussed here on Slack, would love to see this feature.

WalkerWalker avatar Aug 11 '22 08:08 WalkerWalker

Issue:

Consider the example table below:

CREATE TABLE sample_table (
    cpu double precision NULL,
    time TIMESTAMP WITH TIME ZONE NOT NULL,
    sensor_id INTEGER NOT NULL,
    name TEXT DEFAULT 'this is a default string value',
    UNIQUE(sensor_id, time)
);

-- create hypertable
SELECT * FROM create_hypertable('sample_table', 'time', chunk_time_interval => INTERVAL '10 day');

-- populate hypertable
INSERT INTO sample_table
SELECT random() AS cpu,
       time + (INTERVAL '1 minute' * random()) AS time,
       sensor_id,
       'very long string'
FROM generate_series(now() - INTERVAL '1 months', now() - INTERVAL '1 week', INTERVAL '1 minute') AS g1(time),
     generate_series(1, 25, 1) AS g2(sensor_id)
ORDER BY time;

-- show_chunks
_timescaledb_internal._hyper_1_1_chunk
_timescaledb_internal._hyper_1_2_chunk
_timescaledb_internal._hyper_1_3_chunk

The chunks above are child tables of sample_table and inherit the UNIQUE constraint from sample_table.
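One way to confirm the inherited constraint (the chunk name is taken from the show_chunks output above; the generated index name may differ) is to describe a chunk:

\d _timescaledb_internal._hyper_1_1_chunk
-- expected to show a UNIQUE btree index on (sensor_id, "time"), mirroring the parent's constraint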

-- enable compression
ALTER TABLE sample_table SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'sensor_id',
    timescaledb.compress_orderby = 'time'
);

Enabling compression creates a new compressed table _timescaledb_internal._compressed_hypertable_2

\d _timescaledb_internal._compressed_hypertable_2
                    Table "_timescaledb_internal._compressed_hypertable_2"
        Column         |                 Type                  | Collation | Nullable | Default
-----------------------+---------------------------------------+-----------+----------+---------
 cpu                   | _timescaledb_internal.compressed_data |           |          |
 time                  | _timescaledb_internal.compressed_data |           |          |
 sensor_id             | integer                               |           |          |
 name                  | _timescaledb_internal.compressed_data |           |          |
 _ts_meta_count        | integer                               |           |          |
 _ts_meta_sequence_num | integer                               |           |          |
 _ts_meta_min_1        | timestamp with time zone              |           |          |
 _ts_meta_max_1        | timestamp with time zone              |           |          |
Indexes:
    "_compressed_hypertable_2_sensor_id__ts_meta_sequence_num_idx" btree (sensor_id, _ts_meta_sequence_num)
Triggers:
    ts_insert_blocker BEFORE INSERT ON _timescaledb_internal._compressed_hypertable_2 FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()

Note that the UNIQUE constraint is not created on this new compressed table.
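This is easy to double-check from the catalog (a sketch; it just lists whatever indexes exist on the compressed table):

-- Only the (sensor_id, _ts_meta_sequence_num) btree index from the \d output above
-- is expected here; there is no UNIQUE index or constraint.
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = '_timescaledb_internal'
  AND tablename = '_compressed_hypertable_2';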

-- compress the chunks
SELECT COMPRESS_CHUNK(i) FROM SHOW_CHUNKS('sample_table') i;

-- INSERT duplicates:
INSERT INTO sample_table SELECT * FROM sample_table;
ERROR:  insert into a compressed chunk that has primary or unique constraint is not supported

-- INSERT duplicates WITH ON CONFLICT
INSERT INTO sample_table SELECT * FROM sample_table ON CONFLICT DO NOTHING;
ERROR:  insert with ON CONFLICT or RETURNING clause is not supported on compressed chunks

If I comment out the error handling code above, TimescaleDB ends up inserting duplicate rows, because there is no UNIQUE constraint on the compressed table.
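With that check commented out, the duplicates are easy to demonstrate on the example schema above (a sketch):

-- Rows violating the logical UNIQUE(sensor_id, time) now appear, because nothing
-- enforces the constraint inside compressed chunks.
SELECT sensor_id, time, count(*)
FROM sample_table
GROUP BY sensor_id, time
HAVING count(*) > 1;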

Questions:

  1. Should we copy the UNIQUE constraint defined on the base (parent) table to the compressed table and let compressed chunks inherit the same constraint? The challenge with this approach is that the "time" column's data type is _timescaledb_internal.compressed_data, and this data type does not define any operator classes, which provide the comparison operators (see the sketch after these questions).
  2. Is the UNIQUE constraint check performed after the data is decompressed, or should we do a binary comparison of the compressed data?
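To make the challenge in question 1 concrete, here is a sketch of what happens if one tries to create such an index directly on the compressed table (the exact error wording may differ by PostgreSQL version):

-- A unique btree index needs comparison operators, but the compressed 'time' column
-- has type _timescaledb_internal.compressed_data, which provides no btree operator class.
CREATE UNIQUE INDEX ON _timescaledb_internal._compressed_hypertable_2 (sensor_id, time);
-- ERROR:  data type _timescaledb_internal.compressed_data has no default operator class
-- for access method "btree"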

I will clarify the above questions with my team and then proceed on how to approach this problem.

sb230132 avatar Aug 30 '22 04:08 sb230132

Refer to #5252. Will close this issue once #5252 is merged to the main branch.

sb230132 avatar Mar 06 '23 14:03 sb230132

wow. cool. Super happy to be able to use this feature.

WalkerWalker avatar Mar 06 '23 14:03 WalkerWalker

#5252 has been merged.
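For anyone finding this later: with #5252 merged, the statement from the reproduction above should now be accepted on TimescaleDB versions that contain that change (a sketch using the example schema from this thread):

-- Duplicates are skipped instead of raising an error.
INSERT INTO sample_table SELECT * FROM sample_table ON CONFLICT DO NOTHING;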

antekresic avatar Apr 06 '23 11:04 antekresic