readyset
readyset copied to clipboard
Weak indexes can materialize duplicate rows
Consider the case where we have a node with a partial index on [0]
and a partial index on [1]
, and a weak index on [2]
and we have rows:
-
[1, "a", 3]
-
[1, "b", 3]
-
[1, "c", 3]
with a filled hole on [0] = [1]
and rows:
-
[1, "a", 3]
-
[2, "a", 3]
-
[3, "a", 3]
with a filled hole on [1] = ["a"]
The row [1, "a", 3]
is inserted into the weak index at [2] = [3]
twice.
This is relatively tricky (but possible - see https://github.com/readysettech/readyset/pull/287) to trigger with SQL today, but will get much more common once we bring back reuse
I was able to trigger this with SQL! The following logictest fails as of current main
:
statement ok
create table t1 (x int, y text)
statement ok
insert into t1 (x, y)
values
(1, 'a'),
(2, 'b'),
(3, 'c')
statement ok
create table t2 (x int, y text, z int)
statement ok
insert into t2 (x, y, z)
values
(1, 'a', 3),
(1, 'b', 3),
(1, 'c', 3),
(2, 'a', 3),
(3, 'a', 3);
onlyif readyset
statement ok
create cache from
select t1.x
from t2
join t1
on t1.x = t2.x
where t1.x = ?
onlyif readyset
statement ok
create cache from
select t1.x
from t2
join t1
on t1.y = t2.y
where t1.x = 1
onlyif readyset
statement ok
create cache from
select t1.x
from t2
join t1
on t1.x = t2.z
where t2.y = 'a'
query I nosort
select t1.x
from t2
join t1
on t1.x = t2.x
where t1.x = ?
? = 1
----
1
1
1
# Now we have a strict partial index on [0] with a filled hole on [0] = [1]
query I nosort
select t1.x
from t2
join t1
on t1.y = t2.y
where t1.x = 1
----
1
1
1
# Now we have a strict partial index on [1] with a filled hole on [1] = ['a']
query I nosort
select t1.x
from t2
join t1
on t1.x = t2.z
where t2.y = 'a'
----
3
3
3
# Now we have a weak index on [2]
graphviz
# (try to) do a regular write that does a weak lookup on [2] = [3]
statement ok
insert into t1 (x, y) values (3, 'c');
query I nosort
select t1.x
from t2
join t1
on t1.x = t2.z
where t2.y = 'a'
----
3
3
3
3
3
3
Repro'd by a logictest in #287
Wrote up some thoughts around this issue and a possible solution here: https://docs.google.com/document/d/1H0zPI18HPpthJNH91kPaErTYavYs8BXlCpfbNkaxFuk/edit
@lukoktonos I just realized this is in the backlog, and I think this should probably be added to the Critical Bugs for ER list and set to high priority, since it is likely easy for users to hit and can result in permanently wrong cache results. If you think it's fine, I can go ahead and make that change myself, but wanted to run it by you first since you're the coordinator for that project.
I have a fix in mind for this, and since I was looking for something small to work on today (since I'm out of office the rest of the week) I thought I'd pick it up, but then noticed it's not actually on any roadmaps or anything right now.
👍 Sounds good--thanks for catching that!
https://gerrit.readyset.name/c/readyset/+/5975 is a WIP CL associated with this ticket.
Arguably not just a WIP - I think that CL could be merged as-is if I recall correctly, it just wouldn't add any functionality on its own. It should be relatively straightforward to build the fix described in https://docs.google.com/document/d/1H0zPI18HPpthJNH91kPaErTYavYs8BXlCpfbNkaxFuk/edit off of that, though. Not sure if there's anything I missed when I came up with the idea for that fix, but the weak index stateful proptest I wrote up a while back might be useful in testing it.
Also, non-Gerrit link for the OSS community 😉 https://github.com/readysettech/readyset/pull/549