clickhouse-kafka-connect
Support Delete mode
@cwurm requested support for tombstone messages to delete records from storage. This use case is essential for customers who share the same message pipeline among several databases and expect records to be removed from every destination.
Depends on lightweight deletes https://github.com/ClickHouse/ClickHouse/pull/42126
I've spent some time figuring out how to handle this scenario while this feature is not yet available, and just want to share my findings.
For those who use Debezium to read data from the source database, there is an SMT available that adds a "__deleted" field to deleted records.
See for details: https://debezium.io/documentation/reference/stable/transformations/event-flattening.html
This field can be stored along with the other fields in a ReplacingMergeTree table.
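As a rough sketch of what such a table might look like: the database, table, and column names below (`shop.customers`, `id`, `name`) are hypothetical and only stand in for your own schema. It also assumes that "__deleted" arrives as the string 'true' or 'false', and that a `timestamp` column is available to serve as the version.

```sql
-- Hypothetical target table; only __deleted and timestamp come from the discussion above.
CREATE TABLE shop.customers
(
    id         UInt64,                   -- assumed primary key of the source row
    name       String,                   -- placeholder for the regular payload columns
    __deleted  LowCardinality(String),   -- 'true' for rows deleted at the source
    timestamp  DateTime                  -- used as the row version
)
ENGINE = ReplacingMergeTree(timestamp)   -- keeps the row with the highest timestamp per key
ORDER BY id;
```

With this layout a delete event is just another version of the row, so the delete marker eventually replaces the earlier versions of the same key when parts are merged.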
After that, a TTL rule can be added to the table to clean up all the records that are marked for deletion, like the following:
alter table database.table modify TTL timestamp + interval 1 hour where __deleted = 'true'
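Note that TTL rules are applied during background merges, so marked rows can stay visible for a while after the interval has passed. Until they are physically removed, one option is to filter them out at query time. A minimal sketch, assuming a hypothetical table name `shop.customers` in place of `database.table`:

```sql
-- Hypothetical read query: hide rows that are marked as deleted but not yet removed by TTL.
SELECT *
FROM shop.customers FINAL        -- FINAL collapses row versions at query time
WHERE __deleted != 'true';
```

FINAL makes the query more expensive, so whether this is acceptable depends on table size and query patterns.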