risingwave
risingwave copied to clipboard
refactor(storage): shared_buffer_batch support Option notifier
I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.
What's changed and what's your intention?
as title
Please explain IN DETAIL what the changes are in this PR and why they are needed:
- shared_buffer_batch support Option notifier, in new
ReadVersion
struct , we gonna to useSharedBufferBatch
asImm
at first, and double write SharedBufferBatch toReadVersion
staging_data andSharedBuffer
for upload. SoReadVersion
no need to release buffer
Checklist
- ~[ ] I have written necessary rustdoc comments~
- ~[ ] I have added necessary unit tests and integration tests~
- [x] All checks passed in
./risedev check
(or alias,./risedev c
)
Refer to a related PR or issue link (optional)
Codecov Report
Merging #5534 (687fd14) into main (73a102e) will decrease coverage by
0.02%
. The diff coverage is59.42%
.
@@ Coverage Diff @@
## main #5534 +/- ##
==========================================
- Coverage 74.27% 74.24% -0.03%
==========================================
Files 905 906 +1
Lines 142600 142652 +52
==========================================
- Hits 105920 105917 -3
- Misses 36680 36735 +55
Flag | Coverage Δ | |
---|---|---|
rust | 74.24% <59.42%> (-0.03%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
Impacted Files | Coverage Δ | |
---|---|---|
src/batch/src/executor/mod.rs | 72.30% <ø> (ø) |
|
src/batch/src/rpc/service/task_service.rs | 0.00% <0.00%> (ø) |
|
src/expr/src/vector_op/cast.rs | 87.32% <ø> (ø) |
|
src/frontend/src/handler/query.rs | 0.00% <0.00%> (ø) |
|
src/frontend/src/scheduler/distributed/query.rs | 72.29% <0.00%> (ø) |
|
...rontend/src/scheduler/distributed/query_manager.rs | 13.63% <0.00%> (ø) |
|
...frontend/src/scheduler/hummock_snapshot_manager.rs | 22.16% <0.00%> (ø) |
|
src/frontend/src/scheduler/local.rs | 0.00% <0.00%> (ø) |
|
src/meta/src/manager/catalog/fragment.rs | 27.51% <0.00%> (+0.04%) |
:arrow_up: |
src/meta/src/rpc/service/notification_service.rs | 15.62% <0.00%> (-0.34%) |
:arrow_down: |
... and 20 more |
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
Is there any reason why we support shared buffer batch without notifier?
IMO, each Imm payload should be associated with a notifier for notifying payload release. And both the Imm in ReadVersion and the uploader would share the same payload by Arc, so there is no need to introduce SharedBufferBatch without notifier.
Now we share the SharedBufferBatchInner by Arc
pub struct SharedBufferBatch {
inner: Arc<SharedBufferBatchInner>,
epoch: HummockEpoch,
compaction_group_id: CompactionGroupId,
pub table_id: u32,
}
So i think the ReadVersion
no need to handle the SharedBufferBatchInner
release (base on the design of SharedBufferBatch
struct) and host Buffer Release
to uploader, because ReadVersion
and uploader shared the same memory by SharedBufferBatchInner
. If we hold the notifier in ReadVersion
with SharedBufferBatch
, it may cause double release
Another reason is that i want to build a SharedBufferBatch
as Imm for ReadVersion
, so i need to enter a notifier, is there other way to double write SharedBufferBatch
with clone (like that use LocalVersionManager to build a SharedBufferBatch
, then double write to ReadVersion and uploader ( now LocalVersionManager) with clone)?
- build SharedBufferBatch by kv_pair
- insert into ReadVersion (with storage_core lock)
- insert into LocalVersionManager (with LocalVersion lock)
Neither ReadVersion nor uploader handles the buffer release. They are just holding a reference to the buffer. The buffer release is handled by reference counting, which is, only when no one is referencing the batch memory, we then release the memory. Note, besides ReadVersion and uploader, the HummockIterator returned in iter will also holding a reference.
Therefore, each SharedBufferBatchInner will be associated with a notifier when it gets created in new and notifies buffer release when dropped. Both ReadVersion and uploader are written with Arc<SharedBufferInner>, which is the double-write mechanism we mentioned. When no one is referencing the batch inner, we then release the memory and notify for buffer released. If we follow the current PR, we will lose track of some memory that has been cleared by the uploader but still used in ReadVersion or externally used in HummockIterator.