flow-go
Checkpointing V6 - support concurrent checkpoint encoding/decoding
Closes #3075
This PR implements checkpointing V6.
Checkpointing V6 splits the single checkpoint file into 18 files in total. The main points are:
- Splitting the checkpoint file allows concurrent writes to multiple sub-files, which speeds up checkpoint generation, and concurrent reads, which speed up checkpoint loading.
- V6 builds on V5, which builds the subtrees first and then encodes them; that laid the groundwork for concurrent processing.
See complete design in this doc: https://www.notion.so/dapperlabs/Checkpoint-V6-8c7b97937da54c5b9e6c18b5b4598f2e
Comparison between V5 and V6 using latest mainnet19 data snapshot:
- checkpoint writing is reduced from 16 mins to 3 mins, 5.3 times faster
- checkpoint reading is reduced from 12 mins to 2 mins, 6 times faster
This is a feature branch. There are more TODO items to be done in separate PRs. Once this PR is approved, I will close it until all TODO items are done, then re-open it and merge to master.
Codecov Report
Merging #3273 (40e1658) into master (54840e4) will increase coverage by 0.31%. The diff coverage is 63.68%.
@@ Coverage Diff @@
## master #3273 +/- ##
==========================================
+ Coverage 55.28% 55.59% +0.31%
==========================================
Files 744 754 +10
Lines 67865 69509 +1644
==========================================
+ Hits 37521 38646 +1125
- Misses 27293 27683 +390
- Partials 3051 3180 +129
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 55.59% <63.68%> (+0.31%) | :arrow_up: |
| Impacted Files | Coverage Δ | |
|---|---|---|
| admin/commands/storage/helper.go | 78.00% <ø> (ø) | |
| admin/commands/storage/read_results.go | 72.60% <0.00%> (ø) | |
| admin/commands/storage/read_seals.go | 22.32% <0.00%> (ø) | |
| admin/errors.go | 0.00% <0.00%> (ø) | |
| cmd/execution_builder.go | 0.00% <0.00%> (ø) | |
| cmd/execution_config.go | 0.00% <0.00%> (ø) | |
| cmd/scaffold.go | 18.05% <0.00%> (-0.10%) | :arrow_down: |
| cmd/verification_builder.go | 0.00% <0.00%> (ø) | |
| engine/access/rest/router.go | 100.00% <ø> (ø) | |
| fvm/environment/env.go | 100.00% <ø> (ø) | |
| ... and 85 more | | |
@zhangchiqing
Instead of using a bool flag to indicate v5, what do you think about using an int flag for the checkpoint version and setting its default to MaxVersion, like this:
https://github.com/onflow/flow-go/blob/124f1c581388b9da2783a391a0aa1d46ef505c16/ledger/complete/wal/checkpointer.go#L61
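A rough sketch of what such an int flag could look like, using only the standard `flag` package. The flag name `checkpoint-version`, the local `MaxVersion` value of 6, and the accepted range of 5 to 6 are all assumptions for illustration; the real constant is the one at the link above, and the node may use a different flag library.

```go
package main

import (
	"flag"
	"fmt"
)

// MaxVersion is a stand-in for the newest supported checkpoint version.
// NOTE: hypothetical value for this sketch, not copied from flow-go.
const MaxVersion = 6

// parseVersion reads the checkpoint version from args.
// It defaults to MaxVersion and rejects values outside the assumed range.
func parseVersion(args []string) (int, error) {
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	version := fs.Int("checkpoint-version", MaxVersion, "checkpoint format version to write")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	// Assumed policy: only V5 and V6 can be selected.
	if *version < 5 || *version > MaxVersion {
		return 0, fmt.Errorf("unsupported checkpoint version %d", *version)
	}
	return *version, nil
}

func main() {
	v, err := parseVersion([]string{"-checkpoint-version=5"})
	fmt.Println(v, err)
}
```

Compared with a bool, an int flag keeps working when a later version is added: only the constant and the validation range change.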
@zhangchiqing
We also need to modify RemoveCheckpoint to remove all checkpoint files for v6, because v6 generates multiple files.
https://github.com/onflow/flow-go/blob/124f1c581388b9da2783a391a0aa1d46ef505c16/ledger/complete/wal/checkpointer.go#L632-L634
Maybe update PR text with latest benchmark results you shared on Friday. E.g. this PR:
I will do one more round of testing using the latest data snapshot and update the PR text with more accurate data.
bors merge
Build succeeded:
- Integration Tests (make -C integration access-tests)
- Integration Tests (make -C integration bft-tests)
- Integration Tests (make -C integration collection-tests)
- Integration Tests (make -C integration consensus-tests)
- Integration Tests (make -C integration epochs-tests)
- Integration Tests (make -C integration execution-tests)
- Integration Tests (make -C integration ghost-tests)
- Integration Tests (make -C integration mvp-tests)
- Integration Tests (make -C integration network-tests)
- Integration Tests (make -C integration verification-tests)
- Lint (./)
- Lint (./crypto/)
- Lint (./integration/)
- Unit Tests (access)
- Unit Tests (admin)
- Unit Tests (cmd)
- Unit Tests (consensus)
- Unit Tests (engine)
- Unit Tests (fvm)
- Unit Tests (ledger)
- Unit Tests (module)
- Unit Tests (network)
- Unit Tests (others)
- Unit Tests (utils)