Tests failing with faster testdrive in Cluster Isolation Test & Postgres CDC Test
What version of Materialize are you using?
1b43edebf80d
What is the issue?
We have seen test failures in Cluster Isolation Test: https://buildkite.com/materialize/tests/builds/76472#018dd1cb-e448-4e85-850d-97c29c59562d
92:1: error: record 0 did not match
expected:
Record {
    headers: [],
    key: None,
    value: Some(
        {
            (
                "before",
                Union {
                    index: 0,
                    inner: Null,
                    n_variants: 2,
                    null_variant: Some(
                        0,
                    ),
                },
            ),
            (
                "after",
                Union {
                    index: 1,
                    inner: {
                        (
                            "c1",
                            Long(3),
                        ),
                    },
                    n_variants: 2,
                    null_variant: Some(
                        0,
                    ),
                },
            ),
        },
    ),
}
actual:
Record {
    headers: [],
    key: None,
    value: Some(
        {
            (
                "before",
                Union {
                    index: 0,
                    inner: Null,
                    n_variants: 2,
                    null_variant: Some(
                        0,
                    ),
                },
            ),
            (
                "after",
                Union {
                    index: 1,
                    inner: {
                        (
                            "c1",
                            Long(2),
                        ),
                    },
                    n_variants: 2,
                    null_variant: Some(
                        0,
                    ),
                },
            ),
        },
    ),
}
     |
  69 |     URL '${testdrive ... [rest of line truncated for security]
  91 | 
  92 | $ kafka-verify-data format=avro sink=materialize.public.sink1 sort-messages=true
     | ^
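To make the mismatch easier to spot, here is a minimal Python sketch that walks the two records and reports only the fields that differ. The dictionaries are a simplified, hand-written rendering of the expected and actual values above (the Avro union wrappers are flattened away), and `diff_paths` is a hypothetical helper, not part of testdrive.

```python
def diff_paths(expected, actual, path=""):
    """Return (path, expected, actual) triples where two nested values differ."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        diffs = []
        for key in sorted(expected.keys() | actual.keys()):
            child = f"{path}.{key}" if path else key
            if key not in expected or key not in actual:
                # Key present on one side only: report it as a difference.
                diffs.append((child, expected.get(key), actual.get(key)))
            else:
                diffs.extend(diff_paths(expected[key], actual[key], child))
        return diffs
    return [] if expected == actual else [(path, expected, actual)]


# Simplified stand-ins for the records in the error output (assumption:
# union metadata such as index/n_variants is identical and omitted here).
expected = {"before": None, "after": {"c1": 3}}
actual = {"before": None, "after": {"c1": 2}}

differences = diff_paths(expected, actual)
print(differences)  # [('after.c1', 3, 2)]
```

The only difference is `after.c1`: the test expects 3, but the first record read from the sink carries 2.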
Even waiting for a second before the kafka-verify-data still shows wrong data:
# Don't be too fast for kafka-verify-data
$ sleep-is-probably-flaky-i-have-justified-my-need-with-a-comment duration=1s
$ kafka-verify-data format=avro sink=materialize.public.sink1 sort-messages=true
{"before": null, "after": {"row":{"c1": 3}}}
So I'm not sure whether this is just a test issue or an actual product bug.
Postgres CDC test also failed here: https://buildkite.com/materialize/tests/builds/76485#_ and seems possibly related:
status/04-drop-publication.td:26:1: error: expected error containing "publication \"mz_source\" does not exist", got "Source error: u386: incompatible schema change: source table t with oid 16386 has been altered"
     |
  10 | $ postgres-execute c ... [rest of line truncated for security]
  25 | 
  26 | ! SELECT * FROM t;
     | ^
When this has been figured out we should un-revert https://github.com/MaterializeInc/materialize/pull/25478
ci-regexp: cluster-isolation/mzcompose.py.*error: record 0 did not match
For the Cluster Isolation Test I'll try adding more sleep. Edit: That didn't work. I also can't reproduce the issue locally.
@sploiselle For Postgres CDC I'm not sure why the error would be different. Is that timing dependent?
@def- Ah, yes, that error is timing dependent. The error you got is one we expect to be overwritten eventually. It's possible that this is flaky in a way I didn't anticipate.
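As an illustration of "expected to be overwritten eventually", here is a minimal Python sketch of polling a query until its error contains the expected text, while tolerating a transient error in between. The names `wait_for_error` and `fake_select`, the timeout values, and the simulated sequence of errors are all assumptions made for this example; this is not how testdrive implements its retries.

```python
import time


def wait_for_error(run_query, expected_substring, timeout=5.0, interval=0.01):
    """Retry run_query until it fails with an error containing expected_substring."""
    deadline = time.monotonic() + timeout
    last_error = None
    while True:
        try:
            run_query()
            last_error = None  # the query succeeded, so there is no error yet
        except Exception as exc:
            last_error = str(exc)
            if expected_substring in last_error:
                return last_error
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"expected error containing {expected_substring!r}, "
                f"last saw {last_error!r}"
            )
        time.sleep(interval)


# Simulated source status: the transient error is reported twice before the
# final error replaces it (hypothetical sequence, for illustration only).
errors = [
    "Source error: u386: incompatible schema change: "
    "source table t with oid 16386 has been altered",
    "Source error: u386: incompatible schema change: "
    "source table t with oid 16386 has been altered",
    'publication "mz_source" does not exist',
]
attempts = 0


def fake_select():
    """Stand-in for `SELECT * FROM t`: raises the next simulated error."""
    global attempts
    message = errors[min(attempts, len(errors) - 1)]
    attempts += 1
    raise RuntimeError(message)


final_error = wait_for_error(fake_select, 'publication "mz_source" does not exist')
print(attempts, final_error)
```

If the transient error is never overwritten within the timeout, the helper raises `TimeoutError` and reports the last error it saw, which is the shape of the failure quoted above.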
The same Pg CDC error just happened with the testdrive change reverted too: https://buildkite.com/materialize/tests/builds/76590#018dd6ba-75dc-446a-a1a6-bd8ac2f33175
The same Cluster Isolation Test error also just happened without faster testdrive: https://buildkite.com/materialize/tests/builds/77422#018e09fa-1b5f-468e-a570-8cccc8802cb4
And again: https://buildkite.com/materialize/tests/builds/77924
I'm not sure this is even a testing problem; it may be an actual product issue. My workaround of sleeping longer didn't work. I'll ask the storage team to take a look.
Added to the storage mega tracker as a p2.