materialize
materialize copied to clipboard
testdrive: Try to make skipping consistency check be an error
@ParkMyCar I think something is wrong here, most testdrive-based tests I checked are spamming this warning (except the Testdrive test itself). Is it maybe necessary to have volumes_extra=["mzdata:/mzdata"], in every test?
I'll try defaulting it to that.
Checklist
- [ ] This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
- [ ] This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
- [ ] If this PR evolves an existing
$T ⇔ Proto$Tmapping (possibly in a backwards-incompatible way), then it is tagged with aT-protolabel. - [ ] If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
- [ ] If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
I'll try defaulting it to that.
It didn't work, so I'm out of ideas for why this isn't working. Do you have any idea?
Interesting! Which warning is getting spammed, the "system_parameter_defaults" one or the "Catalog state on disk"?
For "system_parameter_defaults" it's possible the format of the Catalog's state dump has changed and this code needs to be updated. For the "Catalog state on disk" I'm going to guess the catalog_config option isn't getting set. Maybe we need to thread through some new CLI args or plumb existing ones to this config.
@ParkMyCar It's the "Catalog state on disk". I'm now enabling the config to always exist, but that is causing a lot of strange looking timeouts: https://buildkite.com/materialize/test/builds/92560
Rebased, new run: https://buildkite.com/materialize/test/builds/93532 Edit: Still seeing timeouts. There is a very interesting panic:
thread 'main' panicked at src/catalog/src/durable/persist.rs:1194:63:
kind decoding error: "missing field `extra_versions`"
I guess we'll need some help from the Persist team, I'll ask in their Slack channel.
There is a very interesting panic:
thread 'main' panicked at src/catalog/src/durable/persist.rs:1194:63: kind decoding error: "missing field `extra_versions`"
Ahhh this is interesting, it looks like the panic is coming from the Legacy Upgrade tests? I think what's happening here is we start an older version of Materialize, run some workflow, then before upgrading to a newer version of MZ we run the consistency checks. But because the consistency checks are run via the testdrive runner we end up using the latest version of the Catalog protobufs which fails because we only run the Catalog migrations if we're not in read-only.
We could either make the consistency checks version aware, and skip if the the current MZ version != the version of the testdrive runner, or just skip them for the legacy upgrade tests
Ok, totally fine to skip them for the upgrade tests then. I'm more worried about the timeouts since I have no workaround for them.