Tests failing with faster testdrive in Cluster Isolation Test & Postgres CDC Test
What version of Materialize are you using?
1b43edebf80d
What is the issue?
We have seen test failures in Cluster Isolation Test: https://buildkite.com/materialize/tests/builds/76472#018dd1cb-e448-4e85-850d-97c29c59562d
92:1: error: record 0 did not match
expected:
Record {
    headers: [],
    key: None,
    value: Some(
        {
            (
                "before",
                Union {
                    index: 0,
                    inner: Null,
                    n_variants: 2,
                    null_variant: Some(
                        0,
                    ),
                },
            ),
            (
                "after",
                Union {
                    index: 1,
                    inner: {
                        (
                            "c1",
                            Long(3),
                        ),
                    },
                    n_variants: 2,
                    null_variant: Some(
                        0,
                    ),
                },
            ),
        },
    ),
}
actual:
Record {
    headers: [],
    key: None,
    value: Some(
        {
            (
                "before",
                Union {
                    index: 0,
                    inner: Null,
                    n_variants: 2,
                    null_variant: Some(
                        0,
                    ),
                },
            ),
            (
                "after",
                Union {
                    index: 1,
                    inner: {
                        (
                            "c1",
                            Long(2),
                        ),
                    },
                    n_variants: 2,
                    null_variant: Some(
                        0,
                    ),
                },
            ),
        },
    ),
}
   |
69 | URL '${testdrive ... [rest of line truncated for security]
91 |
92 | $ kafka-verify-data format=avro sink=materialize.public.sink1 sort-messages=true
   | ^
Even waiting for a second before kafka-verify-data still yields the wrong data:
# Don't be too fast for kafka-verify-data
$ sleep-is-probably-flaky-i-have-justified-my-need-with-a-comment duration=1s
$ kafka-verify-data format=avro sink=materialize.public.sink1 sort-messages=true
{"before": null, "after": {"row":{"c1": 3}}}
So I'm not sure whether this is just a test issue or a product bug.
The Postgres CDC test also failed here: https://buildkite.com/materialize/tests/builds/76485#_ and seems possibly related:
status/04-drop-publication.td:26:1: error: expected error containing "publication \"mz_source\" does not exist", got "Source error: u386: incompatible schema change: source table t with oid 16386 has been altered"
   |
10 | $ postgres-execute c ... [rest of line truncated for security]
25 |
26 | ! SELECT * FROM t;
   | ^
When this has been figured out we should un-revert https://github.com/MaterializeInc/materialize/pull/25478
ci-regexp: cluster-isolation/mzcompose.py.*error: record 0 did not match
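For anyone checking what the ci-regexp above would catch, here is a small Python sketch. It assumes the pattern is applied as an ordinary regular-expression search over CI log text, which may not be exactly how the CI tooling does it. The log lines are invented for illustration and `is_known_failure` is a hypothetical helper.

```python
import re

# Pattern copied verbatim from the ci-regexp line.
CI_REGEXP = re.compile(r"cluster-isolation/mzcompose.py.*error: record 0 did not match")

# Hypothetical log lines, made up for this example only.
matching = "cluster-isolation/mzcompose.py: 92:1: error: record 0 did not match"
other = "pg-cdc/mzcompose.py: 26:1: error: expected error containing ..."
split_across_lines = "cluster-isolation/mzcompose.py\n92:1: error: record 0 did not match"


def is_known_failure(text: str) -> bool:
    """Return True if the text contains a match for the ci-regexp pattern."""
    return CI_REGEXP.search(text) is not None


print(is_known_failure(matching))            # True
print(is_known_failure(other))               # False
print(is_known_failure(split_across_lines))  # False: '.' does not match newlines by default
```

Under Python's default flags the `.*` does not cross line boundaries, so the file name and the error text must be on the same line to match. Whether that also holds in CI depends on how the tooling compiles the pattern.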
For the Cluster Isolation Test I'll try adding more sleep. Edit: That didn't work. I also can't reproduce the issue locally.
@sploiselle For Postgres CDC I'm not sure why the error would be different. Is that timing dependent?
@def- Ah, that error is timing-dependent, yes. The error you got is one we expect to be overwritten eventually. It's possible that this is flaky in a way I didn't anticipate.
The same Pg CDC error just happened with the testdrive change reverted too: https://buildkite.com/materialize/tests/builds/76590#018dd6ba-75dc-446a-a1a6-bd8ac2f33175
The same Cluster Isolation Test error also just happened without faster testdrive: https://buildkite.com/materialize/tests/builds/77422#018e09fa-1b5f-468e-a570-8cccc8802cb4
And again: https://buildkite.com/materialize/tests/builds/77924
I'm not sure this is even a testing problem. Maybe it's an actual product issue. My workaround of sleeping longer didn't work. I'll ask the storage team to take a look.
Added to the storage mega tracker as a p2.