Checkpointing V6 - support concurrent checkpoint encoding/decoding

Open zhangchiqing opened this issue 3 years ago • 3 comments

Closes #3075

This PR implements the checkpointing V6.

Checkpointing version 6 (V6) splits the single checkpoint file into 18 files. Key points:

  • Splitting the checkpoint allows concurrent writes to multiple sub-files, which speeds up checkpoint generation, and concurrent reads, which speeds up checkpoint loading.
  • V6 builds on V5, which constructs the subtries first and then encodes them; that laid the groundwork for concurrent processing.

See complete design in this doc: https://www.notion.so/dapperlabs/Checkpoint-V6-8c7b97937da54c5b9e6c18b5b4598f2e

Comparison between V5 and V6 using latest mainnet19 data snapshot:

  • checkpoint writing time drops from 16 min to 3 min (about 5.3 times faster)
  • checkpoint reading time drops from 12 min to 2 min (6 times faster)

This is a feature branch. More TODO items remain and will be handled in separate PRs. Once this PR is approved, I will close it until all TODO items are done, then re-open it and merge to master.

zhangchiqing avatar Sep 23 '22 21:09 zhangchiqing

Codecov Report

Merging #3273 (40e1658) into master (54840e4) will increase coverage by 0.31%. The diff coverage is 63.68%.

@@            Coverage Diff             @@
##           master    #3273      +/-   ##
==========================================
+ Coverage   55.28%   55.59%   +0.31%     
==========================================
  Files         744      754      +10     
  Lines       67865    69509    +1644     
==========================================
+ Hits        37521    38646    +1125     
- Misses      27293    27683     +390     
- Partials     3051     3180     +129     

| Flag | Coverage Δ |
| unittests | 55.59% <63.68%> (+0.31%) :arrow_up: |

| Impacted Files | Coverage Δ |
| admin/commands/storage/helper.go | 78.00% <ø> (ø) |
| admin/commands/storage/read_results.go | 72.60% <0.00%> (ø) |
| admin/commands/storage/read_seals.go | 22.32% <0.00%> (ø) |
| admin/errors.go | 0.00% <0.00%> (ø) |
| cmd/execution_builder.go | 0.00% <0.00%> (ø) |
| cmd/execution_config.go | 0.00% <0.00%> (ø) |
| cmd/scaffold.go | 18.05% <0.00%> (-0.10%) :arrow_down: |
| cmd/verification_builder.go | 0.00% <0.00%> (ø) |
| engine/access/rest/router.go | 100.00% <ø> (ø) |
| fvm/environment/env.go | 100.00% <ø> (ø) |
... and 85 more

codecov-commenter avatar Sep 23 '22 21:09 codecov-commenter

@zhangchiqing

Instead of using a bool flag to indicate v5, what do you think about using an int flag for the checkpoint version and setting its default to MaxVersion, like this:
https://github.com/onflow/flow-go/blob/124f1c581388b9da2783a391a0aa1d46ef505c16/ledger/complete/wal/checkpointer.go#L61
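
A rough sketch of what an int version flag could look like, using only the standard library `flag` package. The flag name `checkpoint-version`, the constants `VersionV5` and `VersionV6`, and the function `parseCheckpointVersion` are hypothetical; only the idea of defaulting to `MaxVersion` comes from the suggestion above, and flow-go's real flag handling may differ.

```go
package main

import (
	"flag"
	"fmt"
)

// Hypothetical version constants for illustration.
const (
	VersionV5  = 5
	VersionV6  = 6
	MaxVersion = VersionV6
)

// parseCheckpointVersion parses an int version flag that defaults to
// MaxVersion and rejects versions outside the supported range.
func parseCheckpointVersion(args []string) (int, error) {
	fs := flag.NewFlagSet("checkpoint", flag.ContinueOnError)
	version := fs.Int("checkpoint-version", MaxVersion, "checkpoint format version to write")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	if *version < VersionV5 || *version > MaxVersion {
		return 0, fmt.Errorf("unsupported checkpoint version %d (supported: %d to %d)",
			*version, VersionV5, MaxVersion)
	}
	return *version, nil
}

func main() {
	v, err := parseCheckpointVersion(nil) // no flag given: falls back to MaxVersion
	if err != nil {
		panic(err)
	}
	fmt.Println("default version:", v)

	v, err = parseCheckpointVersion([]string{"-checkpoint-version=5"})
	if err != nil {
		panic(err)
	}
	fmt.Println("explicit version:", v)
}
```

Compared with a bool, an int flag needs no new option when a later version is added; only `MaxVersion` and the range check change.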

fxamacker avatar Oct 13 '22 20:10 fxamacker

@zhangchiqing

We also need to modify RemoveCheckpoint to remove all checkpoint files for v6, because v6 generates multiple files.

https://github.com/onflow/flow-go/blob/124f1c581388b9da2783a391a0aa1d46ef505c16/ledger/complete/wal/checkpointer.go#L632-L634

fxamacker avatar Oct 13 '22 20:10 fxamacker

Maybe update PR text with latest benchmark results you shared on Friday. E.g. this PR:

I will run one more round of tests using the latest data snapshot and update the PR text with more accurate data.

zhangchiqing avatar Oct 17 '22 15:10 zhangchiqing

bors merge

zhangchiqing avatar Oct 17 '22 15:10 zhangchiqing