risingwave icon indicating copy to clipboard operation
risingwave copied to clipboard

feat(storage): handle sync in flush controller

Open wenym1 opened this issue 2 years ago • 1 comments

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

In our current code, when we call clear_shared_buffer during recovery, there may exist some concurrent sync task running. The sync task will report its task result to sync_uncommitted_data, and if the data is cleared, they won't be able to report the task result and panic (#5353 #5392).

The root cause of the panic is that when we clear the shared buffer, we are not aware of any concurrent sync task, and therefore we have no way to wait for concurrent sync task to finish before we can safely clear the shared buffer.

In this PR, we handle the sync in the flush controller (previously buffer_tracker_worker) by spawning a concurrent tokio task and join the concurrent task when it finishes. When we handle a clear_shared_buffer, we will wait for all concurrent tasks to finish before we clear the shared buffer data.

Checklist

  • [x] I have written necessary rustdoc comments
  • [x] I have added necessary unit tests and integration tests
  • [x] All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

Suppress the panic of #5353 and #5392 Partially finish #5394

wenym1 avatar Sep 19 '22 11:09 wenym1

Codecov Report

Merging #5430 (97bec9d) into main (5587547) will increase coverage by 0.01%. The diff coverage is 74.62%.

@@            Coverage Diff             @@
##             main    #5430      +/-   ##
==========================================
+ Coverage   74.26%   74.27%   +0.01%     
==========================================
  Files         906      907       +1     
  Lines      142634   142990     +356     
==========================================
+ Hits       105925   106207     +282     
- Misses      36709    36783      +74     
Flag Coverage Δ
rust 74.27% <74.62%> (+0.01%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/storage/src/hummock/mod.rs 92.63% <ø> (ø)
src/storage/src/hummock/shared_buffer/mod.rs 90.97% <ø> (ø)
src/storage/src/hummock/local_version.rs 82.05% <34.54%> (-1.74%) :arrow_down:
src/storage/src/hummock/local_version_manager.rs 68.69% <53.98%> (-3.78%) :arrow_down:
src/storage/src/hummock/upload_handle_manager.rs 97.02% <97.02%> (ø)
src/frontend/src/optimizer/plan_node/mod.rs 98.27% <100.00%> (+0.38%) :arrow_up:
...ge/hummock_test/src/local_version_manager_tests.rs 98.93% <100.00%> (+0.02%) :arrow_up:
src/source/src/row_id.rs 91.01% <0.00%> (-1.13%) :arrow_down:
.../src/executor/managed_state/aggregation/extreme.rs 95.38% <0.00%> (-0.14%) :arrow_down:
... and 2 more

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov[bot] avatar Sep 20 '22 07:09 codecov[bot]