Support 'ON CONFLICT DO NOTHING' for compressed chunks
What type of enhancement is this?
User experience
What subsystems and features will be improved?
Compression
What does the enhancement do?
When there are compressed chunks and you do `INSERT ... ON CONFLICT DO NOTHING`, it says: "insert with ON CONFLICT or RETURNING clause is not supported on compressed chunks".
Currently, if you have an ingress system that processes duplicate values while back-dating, you can't use compression, even if you just want to ignore the duplicates.
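For illustration, a minimal sketch of the statement shape that triggers this error; the table and column names here are hypothetical, assuming a hypertable with a unique index and a compressed chunk covering the inserted timestamp:

```sql
-- Hypothetical schema: 'metrics' is a hypertable with UNIQUE (sensor_id, time)
-- and the chunk covering this timestamp has already been compressed.
INSERT INTO metrics (sensor_id, time, value)
VALUES (1, '2023-01-01 00:00:00+00', 42.0)
ON CONFLICT DO NOTHING;
-- ERROR:  insert with ON CONFLICT or RETURNING clause is not supported on compressed chunks
```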
It's a lightweight version of https://github.com/timescale/timescaledb/issues/3323: supporting only DO NOTHING is fine when you have unique keys. However, perhaps it's essentially the same problem; I don't know.
Implementation challenges
I tried patching the code as per the attached patch (or GitHub compare, for as long as the link works). The problem with it is that despite there being a unique index, duplicate values are inserted. I don't know whether fixing this (checking for the value in the compressed index) is essentially the same problem as https://github.com/timescale/timescaledb/issues/3323, or whether DO NOTHING can be implemented more easily.
@wiebeytec This is essentially the same problem as 3323. We cannot support this enhancement without support for 3323.
Also really keen to see a fix for this.
As with many (most?) systems, I can't guarantee there will be no duplicates, so without this enhancement (more of a bug fix TBH, since it was treated as such in #100 for non-compressed hypertables) I need to check if rows exist before inserting them, which absolutely decimates performance 😭
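For context, a sketch of the kind of existence check described above; the table names (`readings`, `incoming`) are illustrative, not from this thread:

```sql
-- Hypothetical tables: 'readings' is the hypertable, 'incoming' holds the new
-- batch. Every candidate row has to probe the unique key first, which is what
-- makes this so much slower than a plain INSERT ... ON CONFLICT DO NOTHING.
INSERT INTO readings (sensor_id, time, value)
SELECT i.sensor_id, i.time, i.value
FROM incoming i
WHERE NOT EXISTS (
    SELECT 1
    FROM readings r
    WHERE r.sensor_id = i.sensor_id
      AND r.time = i.time
);
```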
Also as discussed here on Slack. Would love to see this feature.
Issue:
Consider the example table below:
```sql
CREATE TABLE sample_table (
    cpu double precision NULL,
    time TIMESTAMP WITH TIME ZONE NOT NULL,
    sensor_id INTEGER NOT NULL,
    name TEXT DEFAULT 'this is a default string value',
    UNIQUE(sensor_id, time)
);

-- create hypertable
SELECT * FROM create_hypertable('sample_table', 'time', chunk_time_interval => INTERVAL '10 day');

-- populate hypertable
INSERT INTO sample_table
SELECT random() AS cpu,
       time + (INTERVAL '1 minute' * random()) AS time,
       sensor_id,
       'very long string'
FROM generate_series(now() - INTERVAL '1 months', now() - INTERVAL '1 week', INTERVAL '1 minute') AS g1(time),
     generate_series(1, 25, 1) AS g2(sensor_id)
ORDER BY time;
```

```
-- show_chunks
_timescaledb_internal._hyper_1_1_chunk
_timescaledb_internal._hyper_1_2_chunk
_timescaledb_internal._hyper_1_3_chunk
```
The above chunks are child tables of sample_table. These chunks inherit the UNIQUE constraint from sample_table.
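As a quick check (a sketch added here, not part of the original report), the inherited constraint can be confirmed from the catalog; the chunk name matches the show_chunks output above:

```sql
-- List constraints on one of the chunks; the inherited UNIQUE constraint on
-- (sensor_id, time) shows up with contype = 'u'.
SELECT conname, contype, pg_get_constraintdef(oid)
FROM pg_constraint
WHERE conrelid = '_timescaledb_internal._hyper_1_1_chunk'::regclass;
```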
```sql
-- enable compression
ALTER TABLE sample_table SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'sensor_id',
    timescaledb.compress_orderby = 'time'
);
```
Enabling compression creates a new compressed table, _timescaledb_internal._compressed_hypertable_2:
```
\d _timescaledb_internal._compressed_hypertable_2
            Table "_timescaledb_internal._compressed_hypertable_2"
        Column         |                 Type                  | Collation | Nullable | Default
-----------------------+---------------------------------------+-----------+----------+---------
 cpu                   | _timescaledb_internal.compressed_data |           |          |
 time                  | _timescaledb_internal.compressed_data |           |          |
 sensor_id             | integer                               |           |          |
 name                  | _timescaledb_internal.compressed_data |           |          |
 _ts_meta_count        | integer                               |           |          |
 _ts_meta_sequence_num | integer                               |           |          |
 _ts_meta_min_1        | timestamp with time zone              |           |          |
 _ts_meta_max_1        | timestamp with time zone              |           |          |
Indexes:
    "_compressed_hypertable_2_sensor_id__ts_meta_sequence_num_idx" btree (sensor_id, _ts_meta_sequence_num)
Triggers:
    ts_insert_blocker BEFORE INSERT ON _timescaledb_internal._compressed_hypertable_2 FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()
```
Note that the UNIQUE constraint is not created for this new compressed table.
```sql
-- compress the chunks
SELECT COMPRESS_CHUNK(i) FROM SHOW_CHUNKS('sample_table') i;

-- INSERT duplicates:
INSERT INTO sample_table SELECT * FROM sample_table;
ERROR:  insert into a compressed chunk that has primary or unique constraint is not supported

-- INSERT duplicates WITH ON CONFLICT
INSERT INTO sample_table SELECT * FROM sample_table ON CONFLICT DO NOTHING;
ERROR:  insert with ON CONFLICT or RETURNING clause is not supported on compressed chunks
```
If I comment out the above error-handling code, TimescaleDB ends up inserting duplicate rows. This is because there is no UNIQUE constraint on the compressed table.
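A quick way to confirm this (a sketch added here, not from the original comment) is to look for keys that now appear more than once:

```sql
-- Any row returned here violates the intended UNIQUE (sensor_id, time) key.
SELECT sensor_id, time, count(*)
FROM sample_table
GROUP BY sensor_id, time
HAVING count(*) > 1;
```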
Questions:
- Should we copy the UNIQUE constraint defined on the base parent table to the compressed table, and let compressed chunks inherit the same constraint? The challenge with this approach is that the "time" column's data type is _timescaledb_internal.compressed_data, which does not define any operator classes providing comparison operators (see the sketch after this list).
- Is the UNIQUE constraint check performed after the data is decompressed, or should we do a binary comparison of the compressed data?
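To make the first question concrete, here is a sketch (an assumption for illustration, not something attempted in this thread) of what copying the constraint onto the compressed table would look like; it is expected to fail because _timescaledb_internal.compressed_data provides no btree operator class to back a UNIQUE index:

```sql
-- Sketch only: a UNIQUE constraint is backed by a btree unique index, which
-- needs a default btree operator class for every column; the compressed_data
-- type of "time" does not have one. The constraint name is illustrative.
ALTER TABLE _timescaledb_internal._compressed_hypertable_2
    ADD CONSTRAINT compressed_unique UNIQUE (sensor_id, time);
```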
I will clarify the above questions with my team and then decide how to approach this problem.
Refer to #5252. Will close this issue once #5252 is merged to the main branch.
wow. cool. Super happy to be able to use this feature.
#5252 has been merged.