
Query upgrades: Changing the filter on a table can result in missed tombstones

Open big-andy-coates opened this issue 5 years ago • 10 comments

If a user uses CREATE OR REPLACE to add a filter to a table then existing rows that don't match the new filter are not deleted, even if they are later updated with another value that doesn't pass the filter.

Take, as an example, the following test case from filters.sql:

----------------------------------------------------------------------------------------------------
--@test: add filter to basic TABLE without filter
----------------------------------------------------------------------------------------------------
SET 'ksql.create.or.replace.enabled' = 'true';

CREATE TABLE a (id INT PRIMARY KEY, col1 INT) WITH (kafka_topic='a', value_format='JSON');
CREATE TABLE b AS SELECT * FROM a;

-- INSERT row [1 -> 0]
INSERT INTO a (id, col1) VALUES (1, 0);

ASSERT VALUES b (id, col1) VALUES (1, 0);

-- At this point the table contains a single row: [1 -> 0]
-- Note, this row would not pass the new filter being added.
-- No tombstone is emitted
CREATE OR REPLACE TABLE b AS SELECT * FROM a WHERE col1 > 0;

-- Even though the following insert does not pass the new filter
-- and the table contains a row with key `1`
-- No tombstone is emitted
INSERT INTO a (id, col1) VALUES (1, -1);

-- This assert fails!
ASSERT NULL VALUES b (id) KEY (1);

There are two points at which we could emit a tombstone, neither of which we currently do:

  1. From the CREATE OR REPLACE statement. This would involve running the new filter against all entries currently in the materialized table. What should we do if the table is not materialized?
  2. From the INSERT VALUES following the CREATE OR REPLACE statement. The issue here is that when the change is passed through the streams topology and hits the new filter, neither the old value nor the new value passes the filter, so no tombstone is emitted. (A tombstone is emitted only if the old value passes the filter and the new value does not.)
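Point 2 can be sketched as a minimal plain-Python model of the filter's forwarding rule (this is illustrative, not ksqlDB or Kafka Streams code; `filter_update` is a hypothetical helper):

```python
# Sketch of how a streams-style table filter decides what to forward,
# given the old and new value for a key. A tombstone is forwarded only
# when the OLD value passed the filter and the NEW value does not -- so
# a row that never passed the filter is never deleted downstream.

def filter_update(old_value, new_value, predicate):
    """Return the record to forward downstream, or None for 'nothing'."""
    old_passes = old_value is not None and predicate(old_value)
    new_passes = new_value is not None and predicate(new_value)
    if new_passes:
        return new_value      # new value visible: forward it
    if old_passes:
        return "TOMBSTONE"    # old row was visible: delete it
    return None               # neither passes: emit nothing

# The example above: filter col1 > 0 added by CREATE OR REPLACE.
pred = lambda col1: col1 > 0

# Row [1 -> 0] already exists downstream (written before the filter).
# After the upgrade, [1 -> -1] arrives: old=0 fails, new=-1 fails.
print(filter_update(0, -1, pred))  # None -- no tombstone, stale row survives
```

Under this rule, the pre-upgrade row `[1 -> 0]` downstream is unreachable: no subsequent non-matching value for key `1` will ever produce a tombstone for it.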

To be correct, the CREATE OR REPLACE statement should cause a tombstone to be emitted, though I'm not sure how practical / obtainable this is, especially for non-materialized tables, e.g.

-- Neither FOO, nor BAR, need to be materialized.
CREATE TABLE FOO AS SELECT * FROM BAR;

It may be that we choose to live with this. If so, we need to clearly document this limitation in whatever 'query upgrade' docs we have.

big-andy-coates avatar Oct 22 '20 15:10 big-andy-coates

Marking as milestone 0.14 as we should, at a minimum, call this out in the docs before 0.14 - assuming 0.14 ships with CREATE OR REPLACE enabled.

big-andy-coates avatar Oct 22 '20 15:10 big-andy-coates

Well, in general query migration is hard... Seems to be an incompatible query upgrade that we should reject until we can handle it correctly?

mjsax avatar Oct 22 '20 17:10 mjsax

To be correct, the CREATE OR REPLACE statement should cause a tombstone to be emitted. Though not sure how practical / obtainable this is, especially for none-materialized tables,

I actually disagree here - to be in line with the rest of query upgrades, which don't support backfill, this should happen on insert. For example, if I were to upgrade SELECT a FROM b to SELECT a + 1 FROM b I don't expect everything that was previously emitted to update. That would be equivalent to issuing a new query.

agavra avatar Oct 22 '20 17:10 agavra

Seems to be an incompatible query upgrade that we should reject until we can handle it correctly?

I disagree here as well 😈 There's an argument that it's not semantically incorrect. There are two different ways to look at a filter:

  1. ignore everything that doesn't match the filter
  2. apply the incoming event, and then remove it if it doesn't match the filter

I think the first interpretation is more straightforward. That would mean that, considering we explicitly don't support backfill, processing an event that doesn't match the filter shouldn't remove a previous value that also doesn't match the filter.

Either way, this really only affects pull queries and joins (because older downstream apps have already seen the old value), so the surface area is somewhat limited.

agavra avatar Oct 22 '20 17:10 agavra

Well, but if you change the example, the row in the result could be deleted:

// after the upgrade
// Andy's example
INSERT INTO a (id, col1) VALUES (1, -1); // updates the input table to (1 -> -1) _AND_ does not delete the row in the result table
// more data
INSERT INTO a (id, col1) VALUES (1, 2); // updates the input _AND_ result table to (1 -> 2)
INSERT INTO a (id, col1) VALUES (1, -1); // updates the input table to (1 -> -1) _AND_ deletes the row in the result table
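The sequence above can be traced with a small plain-Python simulation of the filter semantics (illustrative only, not ksqlDB code; the `run` helper and the dict-based tables are assumptions for the sketch):

```python
# Simulate a table 'a' feeding a filtered table 'b'. The filter only
# emits a tombstone when the previous upstream value passed and the new
# one does not, mirroring the streams-style filter behavior discussed.

def run(inserts, predicate, a=None, b=None):
    a_table, b_table = dict(a or {}), dict(b or {})
    for key, value in inserts:
        old = a_table.get(key)
        a_table[key] = value
        if predicate(value):
            b_table[key] = value           # new value passes: update b
        elif old is not None and predicate(old):
            b_table.pop(key, None)         # old passed, new fails: delete
        # else: neither passes -> b is left unchanged
    return a_table, b_table

pred = lambda v: v > 0
# Before the upgrade, b already contains {1: 0}, written with no filter.
a, b = run([(1, -1)], pred, a={1: 0}, b={1: 0})
print(b)  # {1: 0} -- the stale row is NOT deleted
a, b = run([(1, 2), (1, -1)], pred, a=a, b=b)
print(b)  # {} -- the very same insert now DOES delete the row
```

The identical insert `(1, -1)` behaves differently depending on whether a passing value happened to arrive in between, which is the inconsistency being pointed out.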

Thoughts?

mjsax avatar Oct 22 '20 18:10 mjsax

@mjsax dammit 😂

EDIT: in all seriousness though, I need to think about it a little bit more..

agavra avatar Oct 22 '20 18:10 agavra

Summary of quick offline discussion. I think we agree that this is a bug, given the most recent update that Matthias posted. The ideal semantic would be to look into the contents of the table to decide whether or not to emit a tombstone (practical concerns put aside for a bit).

The follow-up question is whether this bug renders the ability to add a filter to a table worse than not having it at all. I think we can explain the behavior by saying "anything before the filter was added will not be affected", which implies what we've posted, but it could still be a little confusing: why does a new invalid event cause a tombstone when the previous entry was valid under the new filter, but not when the previous entry was invalid?

agavra avatar Oct 22 '20 18:10 agavra

I'd agree it would be a shame to reject a request to add a filter just because it might fail to emit a tombstone.

My view would be to document the limitation first, and look to fix the issue by emitting tombstones later.

big-andy-coates avatar Oct 23 '20 10:10 big-andy-coates

My 2 cents: I think it makes sense to go forward with this as is and document the limitation. If it's too constraining, you can always do a full replay, which has been the recommended solution up to now anyway.

MichaelDrogalis avatar Oct 28 '20 16:10 MichaelDrogalis

Thanks @MichaelDrogalis - it looks like we have a minor consensus. I'll update the docs in the 0.14 timeline and we can leave this ticket open after (I'll remove documentation label and 0.14 label) to track the underlying work to "fix" the issue.

agavra avatar Oct 28 '20 16:10 agavra