risingwave
risingwave copied to clipboard
refactor(meta): commit finish catalog in barrier manager
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Move committing metadata for finished stream jobs into barrier manager. This will unify the paths for recovered and newly created stream jobs.
Then we can let stream manager directly subscribe to catalog manager. And we no longer need the finished channel in our notifiers.
Before
struct Notifier {
pub started: Option<oneshot::Sender<BarrierInfo>>,
pub collected: Option<oneshot::Sender<MetaResult<()>>>,
pub finished: Option<oneshot::Sender<MetaResult<()>>>,
}
After
struct Notifier {
pub started: Option<oneshot::Sender<BarrierInfo>>,
pub collected: Option<oneshot::Sender<MetaResult<()>>>,
}
Then just wait for catalog to be committed / marked as finished.
This means that when run_command finishes, it means the initial barrier for the command is collected. With this info we can immediately commit the catalog, and make it visible to the fe at this point. This step will be done in a separate PR (making the catalog visible).
credit: @wenym1 for this idea and the discussions.
Checklist
- [ ] I have written necessary rustdoc comments
- [ ] I have added necessary unit tests and integration tests
- [ ] I have added test labels as necessary. See details.
- [ ] I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features #7934).
- [ ] My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
- [ ] All checks passed in
./risedev check(or alias,./risedev c) - [ ] My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
- [ ] My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)
Documentation
- [ ] My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.
Still WIP. Can review the idea of the refactor and general implementation. There's still some optimizations (e.g. lots of clone used) and missing functionality, so please ignore these first.
(Still WIP, just mark ready for review to test).
With this info we can immediately commit the catalog, and make it visible to the fe at this point. This step will be done in a separate PR (making the catalog visible).
May I know the status after this PR? Do we notify the frontend after barrier collection or the creation finished? It appears that after https://github.com/risingwavelabs/risingwave/pull/17484 the frontend can resolve the streaming jobs that are still being created.
Also, is this task related to https://github.com/risingwavelabs/risingwave/issues/12771?
With this info we can immediately commit the catalog, and make it visible to the fe at this point. This step will be done in a separate PR (making the catalog visible).
May I know the status after this PR? Do we notify the frontend after barrier collection or the creation finished? It appears that after #17484 the frontend can resolve the streaming jobs that are still being created.
Also, is this task related to #12771?
Yeah it will be required by #12771.
May I know the status after this PR? Do we notify the frontend after barrier collection or the creation finished? It appears that after #17484 the frontend can resolve the streaming jobs that are still being created.
Nope we will still notify it immediately on committing the creating catalogs at the start, when preparing stream jobs, not after barrier collection. This PR does not change that.
@wenym1 will be doing further work on meta, as part of supporting logstore based backfill. So we cleanup some code first, to make it easier for him to do this work. The main goal is to cleanup the logic of finishing stream jobs, and centralize that all in barrier manager, and also cleanup some logic (e.g. notification channels) which were only used to monitor stream jobs, since these can be monitored by stream manager instead.
With this info we can immediately commit the catalog, and make it visible to the fe at this point. This step will be done in a separate PR (making the catalog visible).
May I know the status after this PR? Do we notify the frontend after barrier collection or the creation finished? It appears that after #17484 the frontend can resolve the streaming jobs that are still being created.
Also, is this task related to #12771?
I think the behavior to frontend is the same as before, which is to wait for streaming job finish before return from create_streaming_job. This PR only changes the way we get notified on the finish. It's an alternative implementation to https://github.com/risingwavelabs/risingwave/pull/17297.
@kwannoel Can you elaborate more on the motivation of the original PR?
@kwannoel Can you elaborate more on the motivation of the original PR?
It's originally intended for create MV on creating MV. But I'm not really prioritizing it recently, because it's not that simple to implement, and may be further complicated with logstore based backfill.
It can also cleanup some code in meta while doing so.