risingwave icon indicating copy to clipboard operation
risingwave copied to clipboard

refactor(storage): shared_buffer_batch support Option notifier

Open Li0k opened this issue 2 years ago • 1 comments

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

as title

Please explain IN DETAIL what the changes are in this PR and why they are needed:

  • shared_buffer_batch support Option notifier, in new ReadVersion struct , we gonna to use SharedBufferBatch as Imm at first, and double write SharedBufferBatch to ReadVersion staging_data and SharedBuffer for upload. So ReadVersion no need to release buffer

Checklist

  • ~[ ] I have written necessary rustdoc comments~
  • ~[ ] I have added necessary unit tests and integration tests~
  • [x] All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

Li0k avatar Sep 23 '22 13:09 Li0k

Codecov Report

Merging #5534 (687fd14) into main (73a102e) will decrease coverage by 0.02%. The diff coverage is 59.42%.

@@            Coverage Diff             @@
##             main    #5534      +/-   ##
==========================================
- Coverage   74.27%   74.24%   -0.03%     
==========================================
  Files         905      906       +1     
  Lines      142600   142652      +52     
==========================================
- Hits       105920   105917       -3     
- Misses      36680    36735      +55     
Flag Coverage Δ
rust 74.24% <59.42%> (-0.03%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/batch/src/executor/mod.rs 72.30% <ø> (ø)
src/batch/src/rpc/service/task_service.rs 0.00% <0.00%> (ø)
src/expr/src/vector_op/cast.rs 87.32% <ø> (ø)
src/frontend/src/handler/query.rs 0.00% <0.00%> (ø)
src/frontend/src/scheduler/distributed/query.rs 72.29% <0.00%> (ø)
...rontend/src/scheduler/distributed/query_manager.rs 13.63% <0.00%> (ø)
...frontend/src/scheduler/hummock_snapshot_manager.rs 22.16% <0.00%> (ø)
src/frontend/src/scheduler/local.rs 0.00% <0.00%> (ø)
src/meta/src/manager/catalog/fragment.rs 27.51% <0.00%> (+0.04%) :arrow_up:
src/meta/src/rpc/service/notification_service.rs 15.62% <0.00%> (-0.34%) :arrow_down:
... and 20 more

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov[bot] avatar Sep 23 '22 13:09 codecov[bot]

Is there any reason why we support shared buffer batch without notifier?

IMO, each Imm payload should be associated with a notifier for notifying payload release. And both the Imm in ReadVersion and the uploader would share the same payload by Arc, so there is no need to introduce SharedBufferBatch without notifier.

Now we share the SharedBufferBatchInner by Arc

pub struct SharedBufferBatch {
    inner: Arc<SharedBufferBatchInner>,
    epoch: HummockEpoch,
    compaction_group_id: CompactionGroupId,
    pub table_id: u32,
}

So i think the ReadVersion no need to handle the SharedBufferBatchInner release (base on the design of SharedBufferBatch struct) and host Buffer Release to uploader, because ReadVersion and uploader shared the same memory by SharedBufferBatchInner. If we hold the notifier in ReadVersion with SharedBufferBatch, it may cause double release

Another reason is that i want to build a SharedBufferBatch as Imm for ReadVersion, so i need to enter a notifier, is there other way to double write SharedBufferBatch with clone (like that use LocalVersionManager to build a SharedBufferBatch, then double write to ReadVersion and uploader ( now LocalVersionManager) with clone)?

  1. build SharedBufferBatch by kv_pair
  2. insert into ReadVersion (with storage_core lock)
  3. insert into LocalVersionManager (with LocalVersion lock)

Li0k avatar Sep 23 '22 21:09 Li0k

Neither ReadVersion nor uploader handles the buffer release. They are just holding a reference to the buffer. The buffer release is handled by reference counting, which is, only when no one is referencing the batch memory, we then release the memory. Note, besides ReadVersion and uploader, the HummockIterator returned in iter will also holding a reference.

Therefore, each SharedBufferBatchInner will be associated with a notifier when it gets created in new and notifies buffer release when dropped. Both ReadVersion and uploader are written with Arc<SharedBufferInner>, which is the double-write mechanism we mentioned. When no one is referencing the batch inner, we then release the memory and notify for buffer released. If we follow the current PR, we will lose track of some memory that has been cleared by the uploader but still used in ReadVersion or externally used in HummockIterator.

wenym1 avatar Sep 24 '22 02:09 wenym1