ksql
Support DELETE FROM or INSERT TOMBSTONE for deleting individual records
Please add a straightforward way to insert tombstones (records with a null value) into a table. In ksqlDB 0.14 only a workaround is available.
This is needed when working through the ksqlDB REST API, in order to "delete" records from a table in the following manner (pseudocode):
INSERT TOMBSTONE INTO {table-name} (Id) VALUES (1);
The key can also be compound:
INSERT TOMBSTONE INTO {table-name} (IdPart1, IdPart2) VALUES (1, 2);
It is also not always possible to create a "dummy" stream for deletes in a third-party database. Thanks.
Thanks @tomasfabian for submitting this ticket! We looked into this when we first introduced INSERT VALUES and considered adding DELETE FROM (standard SQL syntax) at the same time. We decided against it because DELETE FROM supports wide deletions (you can delete with an arbitrary constraint), and that would be impossible to implement for a table that isn't materialized.
Since then, many things have changed. It might make sense to revisit this and support DELETE FROM for tables.
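To illustrate the distinction, here is a minimal Python sketch. It is not ksqlDB code, and the function names `delete_by_key` and `delete_where` are hypothetical. A key-based delete only needs the key in order to emit a tombstone, while a delete with an arbitrary constraint has to scan the table's current rows to find the matching keys, which assumes that state is materialized somewhere.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A changelog record: (key, value). A value of None represents a tombstone.
Record = Tuple[int, Optional[dict]]


def delete_by_key(key: int) -> List[Record]:
    """Key-based delete: no table state is needed, only the key."""
    return [(key, None)]


def delete_where(state: Dict[int, dict],
                 predicate: Callable[[dict], bool]) -> List[Record]:
    """Constraint-based delete: requires the materialized rows (state)
    so that every row matching the predicate can be found."""
    return [(key, None) for key, row in state.items() if predicate(row)]


if __name__ == "__main__":
    rows = {1: {"age": 10}, 2: {"age": 50}}
    print(delete_by_key(1))
    print(delete_where(rows, lambda row: row["age"] > 18))
```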
Should this be a streaming-engine ticket instead of query-engine?
Has there been any recent movement on this? This has come up for me for a couple of reasons. For context, we are using the known workaround: a STREAM that generates a NULL value for the key based on the deletion criteria.
- We're running up against the limit of 40 persistent queries on a given ksqlDB cluster because of the need to have 2 persistent queries for a given source topic to filter out these deletions.
- We recently ran into a race condition where the deletion stream produced its message to the sink (destination) topic before the normal ksqlDB table produced its message to the same topic. The tombstone therefore appeared in the topic before the original record, so the deletion never took effect.
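The ordering problem in the second point can be reproduced with a small Python simulation. This is only a sketch of how a table is built from a changelog topic, assuming a null value removes the key and any other value upserts it. The `materialize` function and the sample record are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def materialize(changelog: Iterable[Tuple[int, Optional[str]]]) -> Dict[int, str]:
    """Apply changelog records in order: None deletes the key, anything else upserts."""
    table: Dict[int, str] = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)  # a tombstone for a missing key is a no-op
        else:
            table[key] = value
    return table


# Intended order: the record arrives first, then its tombstone.
intended = [(1, "order-created"), (1, None)]
# Raced order: the tombstone arrives before the record it should delete.
raced = [(1, None), (1, "order-created")]

if __name__ == "__main__":
    print(materialize(intended))  # {}
    print(materialize(raced))     # {1: 'order-created'}
```

In the raced order the tombstone has nothing to delete, and the later record then survives in the table.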
I should be able to find a workaround for the second issue, but the first is my biggest concern: for every 20 tables that we stream, we have to create a new cluster, which raises the cost of operating ksqlDB. I'll also note that the query saturation and disk usage of our ksqlDB cluster are consistently minimal, so resources aren't a constraint at this point.
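The cluster count follows directly from the numbers above. As a rough sketch, using the limit of 40 persistent queries and 2 queries per table from this comment (the function name `clusters_needed` is hypothetical):

```python
import math


def clusters_needed(tables: int, queries_per_table: int = 2, query_limit: int = 40) -> int:
    """Number of clusters required when each table consumes a fixed number of
    persistent queries and each cluster caps the total number of queries."""
    tables_per_cluster = query_limit // queries_per_table  # 40 // 2 = 20
    return math.ceil(tables / tables_per_cluster)


if __name__ == "__main__":
    for n in (20, 21, 100):
        print(n, clusters_needed(n))
```

If a tombstone could be written without a second persistent query, `queries_per_table` would drop to 1 under the same assumptions, and each cluster could hold 40 tables instead of 20.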