flow-go
flow-go copied to clipboard
[Storage] Refactor storage for Follower Engine
Working towards https://github.com/onflow/flow-go/issues/7242
This PR includes:
- Refactoring of the storage modules used by the consensus algorithm to store valid and finalized blocks.
- Replacing the existing Badger-based modules with a generic storage API that supports both Badger and Pebble backends.
- Changing the data storage approach from using Badger transactions to batch updates. Depending on the storage backend, these will be executed as either Badger or Pebble batch updates.
- Introducing an abstracted storage API to facilitate the transition from Badger to Pebble, allowing improved disk space management through data compression and pruning.
- Updating the protocol state storage modules for both consensus and non-consensus nodes, as they share significant logic—particularly for epoch-related data such as the epoch state machine.
- Transitioning database operations from transactions to batch updates while mitigating the risk of dirty reads. Locks have been introduced to address this, and @jordanschalm is preparing a follow-up PR for improvements.
To Reviewers:
- I've highlighted some key changes in comments, please read them first.
- Unfortunately, github can’t recognize the file moves from badger folder to the store folder, as well as many files in the storage/badger/operation folder. I tried to revert and add back, but still didn’t work. We have to clone the repo in a different folder, and compare the files on master in your local.
The test case is failing because this PR needs to be merged first, which fixes the failing tests.
This PR will not merge to master, instead, will be merged to a feature branch.
I will change the base branch into feature branch once it's approved.
Codecov Report
:x: Patch coverage is 39.02027% with 361 lines in your changes missing coverage. Please review.
:loudspeaker: Thoughts on this report? Let us know!
I created a PR which is just Leo's branch with a commit moving all storage/store files back to storage/badger: https://github.com/onflow/flow-go/pull/7382/files. This provides a diff for the storage files moved from storage/badger to storage/store if you filter the files (top-left) to "storage/badger". What I have been doing to review:
- Review files in this PR
- When encounter a storage file without a diff, open the corresponding file under
storage/badgerin https://github.com/onflow/flow-go/pull/7382 to see the file-wise diff - Mark both as viewed as continue in this PR
cc @zhangchiqing @AlexHentschel in case it helps your review. Just make sure that branch is up to date when you start reviewing.
This PR is based on master, but I thought the plan was to maintain a feature branch, do some testing on individual nodes from the feature branch, then merge the feature branch to master once we are more confident.
Yes, I updated the issue comments about the base branch. Once this PR is ready to merge, I will change the base branch to be feature branch, and merge there instead. For now, I keep the base branch as master, so that it's easy to keep the base branch up to date.
(https://github.com/onflow/flow-go/pull/7358 is also targeting master.)
I'd like to merge 7358 to master, the changes are minimal, are you comfortable merging #7358 to master and roll out before this PR? @jordanschalm
The integration tests (BFT Protocol Integration Tests) fails because the tests can't create multiple global lock managers for different follower engines. This is fixed in this PR
I am wondering if we can update the BatchStoreWithStoringResults method to avoid the storingResults map:
https://github.com/onflow/flow-go/blob/beed4f091a61a09c792f9bea0100a4d36ef1467b/storage/store/blocks.go#L46
I find the map storingResults compromises the modularity of the method.
- I generally find the design brittle and worry about potential bugs being introduced during future refactorings:
- The function takes a single block, but needs "promises" about other storage operations. That promise is made by the caller.
- I am not sure what happens if the map is nil? Will it panic on insertion here?. Though, from the caller's perspective, a
nilmap is a perfectly fine representation of an empty set. Its not documented that the map cannot be nil, even if it is empty - The map is modified by the recipient!!
- The documentation here is misleading I find: https://github.com/onflow/flow-go/blob/a4d33d6f731fc27a62f48f083545de3dea3ac0b6/storage/store/payloads.go#L51 If it was really just about storing multiple payloads, we could just change the method to take multiple payloads instead of one -- I would have suggested to generalize the method to multiple payloads. But that doesn't work, because there are execution results in the sealing segment stored outside of blocks.
Thoughts:
- We are stating: "BatchStoreWithStoringResults stores multiple blocks as a batch." However, the method only stores a single result. Are you referring to the anticipated call pattern? I generally prefer to avoid functions making assumptions about how it is being called.
- Bootstrapping is an action that happens before the node starts running. Atomicity requirements are much less strict while bootstrapping -- a degree of freedom we might be able to exploit
- Bootstrapping is a very rare case and special. I would prefer to not add "difficult" methods to our production APIs just for bootstrapping. If something more specialized is needed that needs to be more carefully used, I would prefer if we could put it in a separate package.
for discussion 👉 slack thread
Lets do this in a follow-up PR; added to issue https://github.com/onflow/flow-go/issues/7682
@AlexHentschel I believe all your comments have been addressed, if not by this PR, it's probably done in other PRs which I mentioned in the comments.
If you are done with reviewing this PR, I will rename this PR with Feature XXX, and you can continue reviewing other PRs built on top this.
I have also merge this branch into the next PR: https://github.com/onflow/flow-go/pull/7364