materialize
materialize copied to clipboard
storage: relax since check in read-only mode
Fixes #28885
With this change, we might now get panics in other places where we make assumptions about sinces/uppers that don't hold in read-only mode or when there are concurrent coordinator-like processes.
cc @benesch & @def-: As mentioned in #28885, this fixes the panic but we might discover new ones.
Motivation
Tips for reviewer
Checklist
- [ ] This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
- [ ] This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
- [ ] If this PR evolves an existing
$T ⇔ Proto$Tmapping (possibly in a backwards-incompatible way), then it is tagged with aT-protolabel. - [ ] If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
- [ ] If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
Triggering nightly: https://buildkite.com/materialize/nightly/builds/9168 Edit: Second try: https://buildkite.com/materialize/nightly/builds/9174
How do we know this patch is consistent with everything else in the controller? This now admits creating a collection with a weird state (read frontier == [] and also without a dataflow installed). Ideally we shouldn't need to do this calculus. Strong preference that we return a "CollectionDeleted" and handle the fact that a collection was deleted in the adapter by either also applying the deletion or otherwise handle the fact that it is missing state updates there.
In a way, that is what we are currently doing with the panic in place:
- the
envdprocess panics - it's restarted
- on restart it reads latest catalog state, which doesn't contain the collection in question anymore
- bootstrap etc. proceeds succesfully
It would be a bigger task to change in-memory catalog state based on what the controller reports back, and in normal operations we don't want to do that: the catalog is the source of truth and then the controller doesn't concur with something that indicates a bug.
I still think since=[] is a valid state. It's weird, yes. But someone (some coordinator process) has allowed the since to go to empty, so here we are, and the controller can restart from that. It's just that it won't let anyone read from the collection, which is also correct. Plus, I think this scenario (empty since) can also happen once we allow concurrent adapter-like/coordinator-like processes, which we will in the future for use-case isolation.