Query upgrades: Changing the filter on a table can result in missed tombstones
If a user uses CREATE OR REPLACE to add a filter to a table, then existing rows that don't match the new filter are not deleted, even if they are later updated with another value that doesn't pass the filter.
Take, as an example, the following test case from filters.sql:
----------------------------------------------------------------------------------------------------
--@test: add filter to basic TABLE without filter
----------------------------------------------------------------------------------------------------
SET 'ksql.create.or.replace.enabled' = 'true';
CREATE TABLE a (id INT PRIMARY KEY, col1 INT) WITH (kafka_topic='a', value_format='JSON');
CREATE TABLE b AS SELECT * FROM a;
-- INSERT row [1 -> 0]
INSERT INTO a (id, col1) VALUES (1, 0);
ASSERT VALUES b (id, col1) VALUES (1, 0);
-- At this point the table contains a single row: [1 -> 0]
-- Note, this row would not pass the new filter being added.
-- No tombstone is emitted
CREATE OR REPLACE TABLE b AS SELECT * FROM a WHERE col1 > 0;
-- Even though the following insert does not pass the new filter
-- and the table contains a row with key `1`
-- No tombstone is emitted
INSERT INTO a (id, col1) VALUES (1, -1);
-- This assert fails!
ASSERT NULL VALUES b (id) KEY (1);
There are two points at which we could emit a tombstone, neither of which currently does:
- From the `CREATE OR REPLACE` statement. This would involve running the new filter against all entries currently in the materialized table. What to do if the table is not materialized?
- From the `INSERT VALUES` following the `CREATE OR REPLACE` statement. The issue here is probably that when the `Change` is passed through the streams topology and it hits the new filter, neither the old value nor the new value passes the filter, so no tombstone is emitted. (A tombstone is only emitted if the old value passes the filter and the new value does not.)
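The second point can be sketched in a small simulation. This is a hypothetical Python illustration of the filter rule described above (tombstone only when the old value passes and the new value does not), not ksqlDB source; `process_change` and the plain dict standing in for the materialized table are illustrative.

```python
# Hypothetical sketch of the Kafka Streams filtered-table rule:
# a tombstone is forwarded only when the old value passed the filter
# and the new value does not.
def process_change(table, key, old_value, new_value, predicate):
    """Apply one Change to a filtered table; return the emitted record or None."""
    old_passes = old_value is not None and predicate(old_value)
    new_passes = new_value is not None and predicate(new_value)
    if new_passes:
        table[key] = new_value
        return (key, new_value)   # forward the new value
    if old_passes:
        table.pop(key, None)
        return (key, None)        # tombstone: old passed, new does not
    return None                   # neither passes: nothing is emitted

# Reproduce the test case: table b still holds the stale row [1 -> 0]
# from before the filter `col1 > 0` existed.
b = {1: 0}
emitted = process_change(b, 1, old_value=0, new_value=-1,
                         predicate=lambda v: v > 0)
# Old value 0 fails the filter and new value -1 fails it too,
# so no tombstone is emitted and the stale row survives.
assert emitted is None
assert b == {1: 0}
```

This is exactly why the `ASSERT NULL VALUES` check in the test case fails: from the filter's point of view, nothing about the visible (post-filter) table changed.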
To be correct, the CREATE OR REPLACE statement should cause a tombstone to be emitted. Though I'm not sure how practical / obtainable this is, especially for non-materialized tables, e.g.
-- Neither FOO, nor BAR, need to be materialized.
CREATE TABLE FOO AS SELECT * FROM BAR;
It may be that we choose to live with this. That being the case, we need to clearly document this limitation in whatever 'query upgrade' docs we have.
Marking as milestone 0.14 as we should, at a minimum, call this out in the docs before 0.14 - assuming 0.14 is shipping with CREATE OR REPLACE enabled.
Well, in general query migration is hard... Seems to be an incompatible query upgrade that we should reject until we can handle it correctly?
> To be correct, the CREATE OR REPLACE statement should cause a tombstone to be emitted. Though I'm not sure how practical / obtainable this is, especially for non-materialized tables.
I actually disagree here - to be in line with the rest of query upgrades, which don't support backfill, this should happen on insert. For example, if I were to upgrade `SELECT a FROM b` to `SELECT a + 1 FROM b`, I don't expect everything that was previously emitted to update. That would be equivalent to issuing a new query.
> Seems to be an incompatible query upgrade that we should reject until we can handle it correctly?
I disagree here as well 😈 There's an argument that it's not semantically incorrect. There are two different ways to look at a filter:
- ignore everything that doesn't match the filter
- apply the incoming event, and then remove it if it doesn't match the filter
I think the first interpretation is more straightforward. That would mean that, considering we explicitly don't support backfill, processing an event that doesn't match the filter shouldn't remove a previous value that doesn't match the filter.
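The difference between the two interpretations can be made concrete with a small sketch. This is a hypothetical Python illustration (the function names and the dict standing in for the result table are mine, not ksqlDB's), applied to the stale row `[1 -> 0]` with the filter `col1 > 0`:

```python
def ignore_non_matching(result, key, value, pred):
    # Interpretation 1: events that fail the filter are simply ignored.
    if pred(value):
        result[key] = value

def apply_then_remove(result, key, value, pred):
    # Interpretation 2: apply the event, then remove it if it fails
    # the filter - i.e. emit a tombstone for any failing update.
    if pred(value):
        result[key] = value
    else:
        result.pop(key, None)

pred = lambda v: v > 0
r1, r2 = {1: 0}, {1: 0}           # stale row left over from before the upgrade

ignore_non_matching(r1, 1, -1, pred)   # stale row survives
apply_then_remove(r2, 1, -1, pred)     # stale row is deleted

assert r1 == {1: 0}
assert r2 == {}
```

Under the first interpretation, the behaviour in the test case is arguably correct; under the second, it's a missed tombstone.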
Either way, this really only affects pull queries and joins (because older downstream apps have already seen the old value), so the surface area is somewhat limited.
Well, but if you change the example, the row in the result could be deleted:
-- after the upgrade
-- Andy's example
INSERT INTO a (id, col1) VALUES (1, -1); -- updates the input table to (1 -> -1) _AND_ does not delete the row in the result table
-- more data
INSERT INTO a (id, col1) VALUES (1, 2); -- updates the input _AND_ result table to (1 -> 2)
INSERT INTO a (id, col1) VALUES (1, -1); -- updates the input table to (1 -> -1) _AND_ deletes the row in the result table
Thoughts?
@mjsax dammit 😂
EDIT: in all seriousness though, I need to think about it a little bit more..
Summary of quick offline discussion. I think I agree that this is a bug, given the most recent update that Matthias posted. The ideal semantic would be to look into the contents of the table to decide whether or not to emit a tombstone (practical concerns put aside for a bit).
The follow-up question is whether or not this bug renders the ability to add a filter to a table worse than not having it. I think we can explain this by saying "anything before the filter was added will not be affected", and that implies the behavior we've posted, but it could still be a little confusing (why does a new invalid event cause a tombstone when the previous entry was valid under the new filter, but not when the previous entry was already invalid?).
I'd agree it would be a shame to reject a request to add a filter just because it might fail to emit a tombstone.
My view would be to document the limitation first, and look to fix the issue by emitting tombstones later.
My 2 cents: I think it makes sense to go forward with this as is and document the limitation. If it's too constraining, you can always do a full replay—the recommended solution up to now anyway.
Thanks @MichaelDrogalis - it looks like we have a minor consensus. I'll update the docs in the 0.14 timeline and we can leave this ticket open after (I'll remove documentation label and 0.14 label) to track the underlying work to "fix" the issue.