risingwave
feat(snapshot-backfill): only receive mutation from barrier worker for snapshot backfill
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
During snapshot backfill, we control the behavior via barrier mutations. Snapshot backfill cares about two mutations: the initial add mutation, which provides the log store subscriber info, and the drop subscription mutation, which notifies the executor to start consuming upstream.
In the snapshot backfill executor, we receive barriers from both upstream and the local barrier worker. On the current main branch, we use the mutations of the barriers received from upstream. This design forces us to introduce the hacky pre-sync mutation mechanism. Without it, while we are still at a small fake epoch when consuming the upstream snapshot, we cannot get the mutation of an upstream barrier, because the upstream barrier has a later epoch than the latest injected epoch of the backfilling partial graph.
To get rid of this pre-sync mutation mechanism, we should first switch to using the mutations of barriers received from the local barrier worker, which is what this PR does. The initial add mutation is now carried by the first fake barrier rather than the first upstream barrier.
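The change in mutation source can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Mutation`, `Barrier`, and `Action` types and the functions `erase_upstream_mutation` and `on_local_barrier` are hypothetical, simplified stand-ins for the real executor types.

```rust
/// Hypothetical, simplified mutation type covering only the two
/// mutations that snapshot backfill cares about.
#[derive(Debug, Clone, PartialEq)]
enum Mutation {
    /// Initial add mutation carrying the log store subscriber info.
    Add { subscriber_ids: Vec<u32> },
    /// Notifies the executor that it should start consuming upstream.
    DropSubscription { subscriber_id: u32 },
}

/// Hypothetical barrier: an epoch plus an optional mutation.
#[derive(Debug, Clone, PartialEq)]
struct Barrier {
    epoch: u64,
    mutation: Option<Mutation>,
}

/// Barriers received from upstream keep their epoch, but their mutation
/// is erased so that the executor never acts on it.
fn erase_upstream_mutation(mut barrier: Barrier) -> Barrier {
    barrier.mutation = None;
    barrier
}

/// What the executor decides after seeing a barrier from the local
/// barrier worker (assumed names, for illustration only).
#[derive(Debug, PartialEq)]
enum Action {
    InitSubscribers(Vec<u32>),
    StartConsumingUpstream,
    Continue,
}

/// Only barriers from the local barrier worker drive the behavior.
fn on_local_barrier(barrier: &Barrier, my_subscriber_id: u32) -> Action {
    match &barrier.mutation {
        Some(Mutation::Add { subscriber_ids }) => {
            Action::InitSubscribers(subscriber_ids.clone())
        }
        Some(Mutation::DropSubscription { subscriber_id })
            if *subscriber_id == my_subscriber_id =>
        {
            Action::StartConsumingUpstream
        }
        _ => Action::Continue,
    }
}
```

In this sketch, the first fake barrier from the local barrier worker carries the `Add` mutation, while any mutation on an upstream barrier is dropped before the executor inspects it.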
The implementation of `UpstreamBuffer` is also changed. `UpstreamBuffer` should be aware of the drop subscription mutation so that it can stop consuming upstream. Previously the mutation was carried in the upstream barrier, so the `UpstreamBuffer` only needed to receive barriers from upstream. However, since we are going to erase the mutation of barriers received from upstream, the `UpstreamBuffer` now also needs to take the barrier receiver from the local barrier worker.
In the phase of consuming the snapshot, the `UpstreamBuffer` receives upstream data freely, because it is not likely to get notified about the finish of backfill in this phase, and it is not yet responsible for receiving barriers from the local barrier worker. When we enter the phase that consumes the log store, we must be aware of the barrier mutation, so the `UpstreamBuffer` starts receiving barriers from the local barrier worker and always checks the mutation to decide whether to stop consuming upstream.
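The two phases described above can be modeled with a small synchronous sketch. It is an assumption-laden simplification rather than the actual implementation: the real buffer is asynchronous and also buffers data chunks, and the names `Phase`, `on_upstream_barrier`, `poll_local_barrier`, and `start_consuming_log_store` are hypothetical.

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified mutation type.
#[derive(Debug, Clone, PartialEq)]
enum Mutation {
    Add,
    DropSubscription,
}

/// Hypothetical barrier: an epoch plus an optional mutation.
#[derive(Debug, Clone, PartialEq)]
struct Barrier {
    epoch: u64,
    mutation: Option<Mutation>,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    ConsumingSnapshot,
    ConsumingLogStore,
}

/// Simplified model of the buffer: upstream data is reduced to barrier
/// epochs, and the local barrier receiver is modeled as a queue.
struct UpstreamBuffer {
    phase: Phase,
    buffered_upstream_epochs: VecDeque<u64>,
    local_barriers: VecDeque<Barrier>,
    stopped: bool,
}

impl UpstreamBuffer {
    fn new(local_barriers: VecDeque<Barrier>) -> Self {
        Self {
            phase: Phase::ConsumingSnapshot,
            buffered_upstream_epochs: VecDeque::new(),
            local_barriers,
            stopped: false,
        }
    }

    /// Enter the phase in which the buffer owns the local barrier receiver.
    fn start_consuming_log_store(&mut self) {
        self.phase = Phase::ConsumingLogStore;
    }

    /// Buffer an upstream barrier. Returns false once the buffer has
    /// stopped consuming upstream.
    fn on_upstream_barrier(&mut self, epoch: u64) -> bool {
        if self.stopped {
            return false;
        }
        self.buffered_upstream_epochs.push_back(epoch);
        true
    }

    /// Take the next barrier from the local barrier worker. This does
    /// nothing during the snapshot phase. In the log store phase, every
    /// barrier's mutation is checked to decide whether to stop.
    fn poll_local_barrier(&mut self) -> Option<Barrier> {
        if self.phase != Phase::ConsumingLogStore {
            return None;
        }
        let barrier = self.local_barriers.pop_front()?;
        if barrier.mutation == Some(Mutation::DropSubscription) {
            self.stopped = true;
        }
        Some(barrier)
    }
}
```

Here the buffer ignores the local barrier queue while consuming the snapshot, and after switching phases it stops accepting upstream input as soon as a local barrier carries the drop subscription mutation.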
Checklist
- [ ] I have written necessary rustdoc comments
- [ ] I have added necessary unit tests and integration tests
- [ ] I have added test labels as necessary. See details.
- [ ] I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features #7934).
- [ ] My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
- [x] All checks passed in `./risedev check` (or alias, `./risedev c`)
- [ ] My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
- [ ] My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)
Documentation
- [ ] My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.