flow-go
flow-go copied to clipboard
[Access] Add support for pebbleDB to execution data tracker/pruner
Closes: #6260
Context
In this pull request:
- Split out
ExecutionDataTrackerDB code into commonExecutionDataTrackerinterface. - Refactored
badgerimplementation. - Added the
pebbleversion of the storage object. - Added functional and integration tests for pebble version of execution data pruning.
Codecov Report
Attention: Patch coverage is 39.39394% with 340 lines in your changes missing coverage. Please review.
Project coverage is 41.40%. Comparing base (
9653906) to head (3362d0b).
Additional details and impacted files
@@ Coverage Diff @@
## master #6277 +/- ##
==========================================
- Coverage 41.42% 41.40% -0.03%
==========================================
Files 2024 2032 +8
Lines 144439 144742 +303
==========================================
+ Hits 59839 59928 +89
- Misses 78403 78624 +221
+ Partials 6197 6190 -7
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 41.40% <39.39%> (-0.03%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@zhangchiqing since you've been working closely with pebble/badger lately, can you help review this one. It's refactoring the execution data pruner to support both badger and pebble
Great work! @UlyanaAndrukhiv
Since we're adding both Pebble and Badger implementations for the execution data tracker, this PR has become quite extensive. I suggest we start with the Badger solution first, focusing on the implementation for the tracker and the pruning logic.
This PR also references the Pebble implementation I created in this pull request. However, that PR is still under review, and there's a high likelihood that the Pebble implementation will need refactoring. The patterns we're referring to here might become outdated, so it's better to wait and postpone the Pebble implementation for now.
We could initially implement the pruner in Badger, refactoring it to use Badger batch updates instead of transactions. Since Pebble doesn't support transactions, using Badger batch updates could make it easier for us to switch to the Pebble implementation later.
Design
We might need to revisit the design of the Tracker and Pruner. The original tracker and pruner were implemented a long time ago, and they face several challenges if we switch to Badger batch updates. We should take a step back and reconsider the design first.
For example: For each height, we index a list of CIDs as execution data. CIDs are nested, meaning a CID could have multiple children, and a child CID might have multiple different parent CIDs. When we prune a height, we can't just remove all the CIDs and their child CIDs indexed by that height, because some CID might be referenced by other CIDs at higher height. This is challenging because it requires extra information to determine if a CID or a child CID is prunable.
We tried to address this by keeping track of the highest indexed height for each CID (RetrieveTrackerLatestHeight / UpsertTrackerLatestHeight), but this introduces complexity, and I'm unsure if it's concurrency-safe without database transactions. Is it possible to eliminate the extra index to simplify things? If we keep the LatestHeight index for each CID, we need to be cautious about dirty writes that might corrupt data. For instance, while we're pruning a CID, we might also be concurrently indexing a new height with a certain CID referring to the deleted CID, which could corrupt the newly indexed data. We probably don't want to solve this problem by blocking indexing with a lock during pruning, because pruning might take a long time.
Therefore, I think we need to address these challenges before proceeding with the implementation.