flow-go
Block computation result upload Retry implementation
#2273 - This PR addresses the problem described in that issue: a ComputationResult upload sometimes failed without being retried, which caused data loss in GCP.
Design
According to issue #2273, this problem usually occurred when the EN process crashed or rebooted abruptly before all computation result uploads had succeeded. To address this, we need to store the upload status and let the EN pick up unfinished uploads for retry at boot. The basic idea of this PR is simple:
- After one ComputationResult is ready, we store one entry in the local BadgerDB, keyed by the given ExecutionDataID, that marks the upload status as `false`.
- When the upload completes successfully, the corresponding item in BadgerDB is replaced with `true`.
- If the upload fails for certain ComputationResult items, the EN process picks them up and retries the upload at reboot.
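The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `StatusStore`, `memStore`, `UploadWithStatus`, and `RetryPending` are hypothetical names, an in-memory map stands in for BadgerDB, and `ExecutionDataID` is simplified to a string.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// ExecutionDataID is simplified to a string here (assumption for the sketch).
type ExecutionDataID string

// errSimulated is used to simulate an upload that fails before completion.
var errSimulated = errors.New("simulated upload failure")

// StatusStore is a hypothetical abstraction over the upload-status entries in BadgerDB.
type StatusStore interface {
	Set(id ExecutionDataID, uploaded bool) error
	Pending() ([]ExecutionDataID, error)
}

// memStore is an in-memory stand-in for BadgerDB, for illustration only.
type memStore struct{ status map[ExecutionDataID]bool }

func newMemStore() *memStore { return &memStore{status: map[ExecutionDataID]bool{}} }

func (s *memStore) Set(id ExecutionDataID, uploaded bool) error {
	s.status[id] = uploaded
	return nil
}

// Pending returns all IDs whose status is still false, in sorted order.
func (s *memStore) Pending() ([]ExecutionDataID, error) {
	var ids []ExecutionDataID
	for id, done := range s.status {
		if !done {
			ids = append(ids, id)
		}
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids, nil
}

// UploadFunc performs the actual upload of one result.
type UploadFunc func(id ExecutionDataID) error

// UploadWithStatus stores status=false, attempts the upload, and flips the
// status to true only if the upload succeeded.
func UploadWithStatus(store StatusStore, upload UploadFunc, id ExecutionDataID) error {
	if err := store.Set(id, false); err != nil {
		return fmt.Errorf("could not store upload status: %w", err)
	}
	if err := upload(id); err != nil {
		return fmt.Errorf("upload failed, left for retry at boot: %w", err)
	}
	return store.Set(id, true)
}

// RetryPending is meant to run once at boot. It retries every entry still
// marked false and returns how many retries succeeded.
func RetryPending(store StatusStore, upload UploadFunc) (int, error) {
	ids, err := store.Pending()
	if err != nil {
		return 0, err
	}
	succeeded := 0
	for _, id := range ids {
		if err := upload(id); err != nil {
			continue // entry stays false and is picked up at the next boot
		}
		if err := store.Set(id, true); err != nil {
			return succeeded, err
		}
		succeeded++
	}
	return succeeded, nil
}

func main() {
	store := newMemStore()
	failing := func(ExecutionDataID) error { return errSimulated }
	working := func(ExecutionDataID) error { return nil }

	_ = UploadWithStatus(store, failing, "block-1") // stays false
	_ = UploadWithStatus(store, working, "block-2") // becomes true

	n, _ := RetryPending(store, working)
	fmt.Println("successful retries at boot:", n)
}
```

Because the status entry is written before the upload starts, a crash at any point leaves the entry as `false`, so the result is retried at the next boot.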
Metrics
Two new metrics are added to monitor upload behavior:
- Count of successful uploads
- Count of upload retries
Examples of these metrics of different scenarios can be found in the Test Plan section below.
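As a rough illustration of the two counters, the sketch below uses plain atomic counters to stay dependency-free. The type and method names (`UploadMetrics`, `OnUploadSucceeded`, `OnUploadRetried`) are hypothetical and are not the names used in this PR; whether a successful retry also increments the success counter is an assumption made here.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// UploadMetrics is a hypothetical holder for the two counters described above.
type UploadMetrics struct {
	successTotal uint64 // count of successful uploads
	retryTotal   uint64 // count of upload retries
}

// OnUploadSucceeded records one successful upload.
func (m *UploadMetrics) OnUploadSucceeded() { atomic.AddUint64(&m.successTotal, 1) }

// OnUploadRetried records one retry attempt.
func (m *UploadMetrics) OnUploadRetried() { atomic.AddUint64(&m.retryTotal, 1) }

// Snapshot returns the current values of both counters.
func (m *UploadMetrics) Snapshot() (success, retries uint64) {
	return atomic.LoadUint64(&m.successTotal), atomic.LoadUint64(&m.retryTotal)
}

func main() {
	m := &UploadMetrics{}

	// Scenario: two results upload cleanly, a third is retried at boot
	// and then succeeds (assumed to count as a success as well).
	m.OnUploadSucceeded()
	m.OnUploadSucceeded()
	m.OnUploadRetried()
	m.OnUploadSucceeded()

	s, r := m.Snapshot()
	fmt.Printf("successful=%d retried=%d\n", s, r)
}
```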
Notes
- ComputationResult reconstruction from EDS and BadgerDB: We don't store the whole ComputationResult instance in BadgerDB, because that would take too much space. All required fields in ComputationResult (the ones in `BlockData`) are already stored in EDS and BadgerDB, so we grab `Events` and `TrieUpdates` from EDS, and all other fields from BadgerDB, to reconstruct the ComputationResult for reload.
- Storage overhead: We don't expect this to take much extra storage space in BadgerDB, since it is only one extra boolean value per ComputationResult. We keep these markers even after a ComputationResult upload is done, for potential future devops uses.
- AWS uploader: Confirmed that we currently only care about the GCP uploader. To support new uploaders we will need to introduce a new key code for BadgerDB (more details in the comment of the PR).
- Duplicated upload: Confirmed that a duplicated upload of the same ComputationResult record is fine and will not trigger any problems.
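To make the storage overhead and the per-uploader key code concrete, here is a minimal sketch of what one status entry could look like. Everything in it is an assumption for illustration: the code values `0x42` and `0x43`, the 32-byte `Identifier`, the single-byte value encoding, and the helper names are hypothetical and are not taken from `storage/badger/operation`.

```go
package main

import (
	"bytes"
	"fmt"
)

// codeGCPUploadStatus is a hypothetical key code for the GCP uploader's
// status entries; the real code is defined in the PR, not here.
const codeGCPUploadStatus byte = 0x42

// Identifier mimics a 32-byte ID such as an ExecutionDataID (assumption).
type Identifier [32]byte

// StatusKey builds a key of the form <code><id>, so each uploader gets its
// own key space by using a different code.
func StatusKey(code byte, id Identifier) []byte {
	key := make([]byte, 0, 1+len(id))
	key = append(key, code)
	return append(key, id[:]...)
}

// EncodeStatus encodes the upload status as a single byte (assumed encoding).
func EncodeStatus(uploaded bool) []byte {
	if uploaded {
		return []byte{1}
	}
	return []byte{0}
}

// DecodeStatus is the inverse of EncodeStatus and rejects malformed values.
func DecodeStatus(val []byte) (bool, error) {
	if len(val) != 1 || val[0] > 1 {
		return false, fmt.Errorf("invalid upload status value %x", val)
	}
	return val[0] == 1, nil
}

func main() {
	var id Identifier
	id[0] = 0xAB

	key := StatusKey(codeGCPUploadStatus, id)
	fmt.Printf("key: %d bytes, value: %d byte\n", len(key), len(EncodeStatus(false)))

	// A second uploader would need its own (hypothetical) code so that its
	// status entries do not collide with the GCP uploader's entries.
	const codeOtherUploader byte = 0x43
	fmt.Println("keys collide:", bytes.Equal(key, StatusKey(codeOtherUploader, id)))
}
```

Under these assumptions each ComputationResult costs a few dozen bytes of key plus one byte of value, which is small compared with storing the result itself.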
Test Plan
- [X] UT: `go test --tags=relic` on all new unit test cases
- [X] localnet: no stored ComputationResult status in DB to retry -> all uploads succeed
- [X] localnet: no stored ComputationResult status in DB to retry -> all uploads fail
- [X] localnet: with stored ComputationResult status in DB to retry -> retry fails
- [X] localnet: with stored ComputationResult status in DB to retry -> retry succeeds + all uploads succeed
- [ ] devnet
Codecov Report
Merging #2743 (4ede54c) into master (9784c27) will increase coverage by 0.01%. The diff coverage is 49.20%.
@@ Coverage Diff @@
## master #2743 +/- ##
==========================================
+ Coverage 54.37% 54.39% +0.01%
==========================================
Files 728 731 +3
Lines 67431 67692 +261
==========================================
+ Hits 36667 36820 +153
- Misses 27706 27799 +93
- Partials 3058 3073 +15
Flag | Coverage Δ | |
---|---|---|
unittests | 54.39% <49.20%> (+0.01%) | :arrow_up: |
Impacted Files | Coverage Δ | |
---|---|---|
cmd/execution_builder.go | 0.00% <0.00%> (ø) | |
engine/execution/computation/manager.go | 79.89% <0.00%> (-0.84%) | :arrow_down: |
storage/badger/operation/prefix.go | 79.31% <ø> (ø) | |
...on/computer/uploader/retryable_uploader_wrapper.go | 58.22% <58.22%> (ø) | |
engine/execution/ingestion/engine.go | 52.32% <63.63%> (+0.14%) | :arrow_up: |
storage/badger/operation/computation_result.go | 85.18% <85.18%> (ø) | |
...xecution/computation/computer/uploader/uploader.go | 82.92% <100.00%> (+2.92%) | :arrow_up: |
storage/badger/computation_result.go | 100.00% <100.00%> (ø) | |
consensus/hotstuff/eventloop/event_loop.go | 74.82% <0.00%> (ø) | |
Update: did a quick offline review with @m4ksio. Things to refine: instead of storing ComputationResult in BadgerDB, which may introduce extra storage overhead (possibly hundreds of GB), we can simply use the data already stored in EDS to reconstruct ComputationResult at upload retry, and store only the upload status flag in BadgerDB.
FVM Benchstat comparison
This branch was compared with the base branch onflow:master at commit 40c77f2b31b58d03091c4227daa06a4ca51b2845.
The command `for i in {1..10}; do go test ./fvm ./engine/execution/computation --bench . --tags relic -shuffle=on --benchmem --run ^$; done` was used.
Collapsed results for better readability
Benchmark | old.txt | new.txt | | |
---|---|---|---|---|
time/op | | | delta | |
pkg:github.com/onflow/flow-go/fvm goos:linux goarch:amd64 | ||||
RuntimeNFTBatchTransfer-2 | 113ms ±13% | 109ms ± 3% | ~ | (p=0.133 n=10+9) |
RuntimeTransaction/reference_tx-2 | 26.3ms ± 6% | 25.7ms ±10% | ~ | (p=0.218 n=10+10) |
RuntimeTransaction/convert_int_to_string-2 | 28.2ms ±15% | 27.7ms ±11% | ~ | (p=0.631 n=10+10) |
RuntimeTransaction/get_signer_address-2 | 27.8ms ±10% | 27.9ms ±18% | ~ | (p=0.853 n=10+10) |
RuntimeTransaction/get_account_and_get_available_balance-2 | 260ms ± 6% | 257ms ± 4% | ~ | (p=0.218 n=10+10) |
RuntimeTransaction/get_account_and_get_storage_used-2 | 32.0ms ±10% | 31.1ms ± 3% | ~ | (p=0.156 n=10+9) |
RuntimeTransaction/get_account_and_get_storage_capacity-2 | 226ms ± 2% | 224ms ± 3% | ~ | (p=0.222 n=9+9) |
RuntimeTransaction/get_signer_vault-2 | 34.9ms ± 8% | 34.6ms ± 8% | ~ | (p=0.971 n=10+10) |
RuntimeTransaction/get_signer_receiver-2 | 44.9ms ± 7% | 45.5ms ± 5% | ~ | (p=0.436 n=10+10) |
RuntimeTransaction/transfer_tokens-2 | 208ms ± 3% | 206ms ± 1% | ~ | (p=0.113 n=10+9) |
RuntimeTransaction/load_and_save_empty_string_on_signers_address-2 | 34.1ms ± 8% | 33.4ms ± 8% | ~ | (p=0.393 n=10+10) |
RuntimeTransaction/load_and_save_long_string_on_signers_address-2 | 75.3ms ± 4% | 75.3ms ± 4% | ~ | (p=0.971 n=10+10) |
RuntimeTransaction/create_new_account-2 | 818ms ± 2% | 809ms ± 2% | ~ | (p=0.075 n=10+10) |
RuntimeTransaction/call_empty_contract_function-2 | 29.8ms ± 8% | 29.8ms ± 8% | ~ | (p=0.971 n=10+10) |
RuntimeTransaction/emit_event-2 | 43.2ms ± 6% | 43.1ms ± 6% | ~ | (p=0.912 n=10+10) |
RuntimeTransaction/borrow_array_from_storage-2 | 131ms ± 2% | 130ms ± 2% | ~ | (p=0.315 n=9+10) |
RuntimeTransaction/copy_array_from_storage-2 | 134ms ± 7% | 132ms ± 4% | ~ | (p=0.436 n=10+10) |
pkg:github.com/onflow/flow-go/engine/execution/computation goos:linux goarch:amd64 | ||||
ComputeBlock/16/cols/128/txes-2 | 4.77s ± 4% | 4.70s ± 3% | ~ | (p=0.247 n=10+10) |
pkg:github.com/onflow/flow-go/fvm goos:linux goarch:amd64 | ||||
RuntimeTransaction/get_public_account-2 | 29.8ms ± 8% | 28.7ms ± 5% | −3.54% | (p=0.036 n=9+8) |
RuntimeTransaction/get_account_and_get_balance-2 | 288ms ± 7% | 277ms ± 2% | −3.94% | (p=0.004 n=9+9) |
RuntimeTransaction/convert_int_to_string_and_concatenate_it-2 | 30.1ms ±10% | 28.3ms ± 5% | −6.20% | (p=0.008 n=10+9) |
alloc/op | | | delta | |
pkg:github.com/onflow/flow-go/fvm goos:linux goarch:amd64 | ||||
RuntimeTransaction/borrow_array_from_storage-2 | 68.8MB ± 2% | 69.9MB ± 1% | +1.68% | (p=0.033 n=10+7) |
RuntimeNFTBatchTransfer-2 | 56.7MB ± 4% | 56.7MB ± 4% | ~ | (p=0.853 n=10+10) |
RuntimeTransaction/convert_int_to_string-2 | 37.1MB ± 8% | 36.8MB ± 5% | ~ | (p=0.905 n=10+9) |
RuntimeTransaction/convert_int_to_string_and_concatenate_it-2 | 37.3MB ± 7% | 36.3MB ± 5% | ~ | (p=0.218 n=10+10) |
RuntimeTransaction/get_signer_address-2 | 36.6MB ± 6% | 36.7MB ± 9% | ~ | (p=0.796 n=10+10) |
RuntimeTransaction/get_public_account-2 | 37.9MB ± 7% | 37.5MB ± 7% | ~ | (p=0.579 n=10+10) |
RuntimeTransaction/get_account_and_get_balance-2 | 130MB ± 2% | 131MB ± 1% | ~ | (p=0.515 n=10+8) |
RuntimeTransaction/get_account_and_get_available_balance-2 | 112MB ± 3% | 111MB ± 4% | ~ | (p=0.393 n=10+10) |
RuntimeTransaction/get_account_and_get_storage_used-2 | 37.6MB ± 8% | 37.4MB ± 2% | ~ | (p=0.497 n=10+9) |
RuntimeTransaction/get_account_and_get_storage_capacity-2 | 107MB ± 2% | 105MB ± 5% | ~ | (p=0.190 n=10+10) |
RuntimeTransaction/get_signer_vault-2 | 38.2MB ± 7% | 38.0MB ± 7% | ~ | (p=0.912 n=10+10) |
RuntimeTransaction/get_signer_receiver-2 | 41.2MB ± 4% | 42.2MB ± 5% | ~ | (p=0.182 n=9+10) |
RuntimeTransaction/transfer_tokens-2 | 90.5MB ± 3% | 90.4MB ± 2% | ~ | (p=0.684 n=10+10) |
RuntimeTransaction/load_and_save_empty_string_on_signers_address-2 | 37.9MB ± 8% | 37.3MB ±10% | ~ | (p=0.436 n=10+10) |
RuntimeTransaction/load_and_save_long_string_on_signers_address-2 | 57.4MB ± 4% | 57.4MB ± 3% | ~ | (p=0.912 n=10+10) |
RuntimeTransaction/create_new_account-2 | 204MB ± 2% | 204MB ± 0% | ~ | (p=0.792 n=10+6) |
RuntimeTransaction/call_empty_contract_function-2 | 37.7MB ± 8% | 37.9MB ± 5% | ~ | (p=0.912 n=10+10) |
RuntimeTransaction/emit_event-2 | 41.5MB ± 6% | 41.3MB ± 6% | ~ | (p=0.912 n=10+10) |
RuntimeTransaction/copy_array_from_storage-2 | 81.8MB ± 7% | 82.9MB ± 2% | ~ | (p=0.447 n=10+9) |
pkg:github.com/onflow/flow-go/engine/execution/computation goos:linux goarch:amd64 | ||||
ComputeBlock/16/cols/128/txes-2 | 1.32GB ± 1% | 1.32GB ± 1% | ~ | (p=0.481 n=10+10) |
pkg:github.com/onflow/flow-go/fvm goos:linux goarch:amd64 | ||||
RuntimeTransaction/reference_tx-2 | 37.4MB ± 8% | 36.1MB ± 6% | −3.61% | (p=0.029 n=10+10) |
allocs/op | | | delta | |
pkg:github.com/onflow/flow-go/fvm goos:linux goarch:amd64 | ||||
RuntimeTransaction/emit_event-2 | 142k ± 0% | 142k ± 0% | +0.01% | (p=0.043 n=10+10) |
RuntimeNFTBatchTransfer-2 | 295k ± 0% | 295k ± 0% | ~ | (p=0.114 n=10+7) |
RuntimeTransaction/convert_int_to_string-2 | 94.5k ± 0% | 94.5k ± 0% | ~ | (p=0.724 n=10+10) |
RuntimeTransaction/convert_int_to_string_and_concatenate_it-2 | 109k ± 0% | 109k ± 0% | ~ | (p=0.148 n=10+10) |
RuntimeTransaction/get_signer_address-2 | 85.5k ± 0% | 85.5k ± 0% | ~ | (p=0.218 n=10+9) |
RuntimeTransaction/get_public_account-2 | 109k ± 0% | 109k ± 0% | ~ | (p=0.170 n=10+10) |
RuntimeTransaction/get_account_and_get_balance-2 | 1.55M ± 0% | 1.55M ± 0% | ~ | (p=0.565 n=10+10) |
RuntimeTransaction/get_account_and_get_available_balance-2 | 1.43M ± 0% | 1.43M ± 0% | ~ | (p=0.123 n=10+10) |
RuntimeTransaction/get_account_and_get_storage_used-2 | 130k ± 0% | 130k ± 0% | ~ | (p=0.203 n=10+9) |
RuntimeTransaction/get_account_and_get_storage_capacity-2 | 1.27M ± 0% | 1.27M ± 0% | ~ | (p=0.093 n=10+10) |
RuntimeTransaction/get_signer_vault-2 | 132k ± 0% | 132k ± 0% | ~ | (p=0.809 n=10+10) |
RuntimeTransaction/get_signer_receiver-2 | 213k ± 0% | 213k ± 0% | ~ | (p=0.156 n=10+10) |
RuntimeTransaction/transfer_tokens-2 | 962k ± 0% | 962k ± 0% | ~ | (p=0.098 n=9+10) |
RuntimeTransaction/load_and_save_empty_string_on_signers_address-2 | 131k ± 0% | 131k ± 0% | ~ | (p=0.424 n=10+10) |
RuntimeTransaction/load_and_save_long_string_on_signers_address-2 | 233k ± 0% | 233k ± 0% | ~ | (p=0.617 n=10+10) |
RuntimeTransaction/create_new_account-2 | 2.72M ± 0% | 2.72M ± 0% | ~ | (p=0.971 n=10+10) |
RuntimeTransaction/call_empty_contract_function-2 | 97.2k ± 0% | 97.2k ± 0% | ~ | (p=0.493 n=10+10) |
RuntimeTransaction/borrow_array_from_storage-2 | 370k ± 0% | 370k ± 0% | ~ | (p=0.644 n=10+10) |
RuntimeTransaction/copy_array_from_storage-2 | 326k ± 0% | 326k ± 0% | ~ | (p=1.000 n=10+10) |
pkg:github.com/onflow/flow-go/engine/execution/computation goos:linux goarch:amd64 | ||||
ComputeBlock/16/cols/128/txes-2 | 21.0M ± 0% | 21.0M ± 0% | ~ | (p=0.739 n=10+10) |
pkg:github.com/onflow/flow-go/fvm goos:linux goarch:amd64 | ||||
RuntimeTransaction/reference_tx-2 | 80.3k ± 0% | 80.3k ± 0% | −0.01% | (p=0.050 n=10+10) |
computation | | | delta | |
pkg:github.com/onflow/flow-go/fvm goos:linux goarch:amd64 | ||||
RuntimeTransaction/reference_tx-2 | 202 ± 0% | 202 ± 0% | ~ | (all equal) |
RuntimeTransaction/convert_int_to_string-2 | 402 ± 0% | 402 ± 0% | ~ | (all equal) |
RuntimeTransaction/convert_int_to_string_and_concatenate_it-2 | 502 ± 0% | 502 ± 0% | ~ | (all equal) |
RuntimeTransaction/get_signer_address-2 | 302 ± 0% | 302 ± 0% | ~ | (all equal) |
RuntimeTransaction/get_public_account-2 | 402 ± 0% | 402 ± 0% | ~ | (all equal) |
RuntimeTransaction/get_account_and_get_balance-2 | 1.00k ± 0% | 1.00k ± 0% | ~ | (all equal) |
RuntimeTransaction/get_account_and_get_available_balance-2 | 2.60k ± 0% | 2.60k ± 0% | ~ | (all equal) |
RuntimeTransaction/get_account_and_get_storage_used-2 | 402 ± 0% | 402 ± 0% | ~ | (all equal) |
RuntimeTransaction/get_account_and_get_storage_capacity-2 | 1.30k ± 0% | 1.30k ± 0% | ~ | (all equal) |
RuntimeTransaction/get_signer_vault-2 | 402 ± 0% | 402 ± 0% | ~ | (all equal) |
RuntimeTransaction/get_signer_receiver-2 | 602 ± 0% | 602 ± 0% | ~ | (all equal) |
RuntimeTransaction/transfer_tokens-2 | 3.50k ± 0% | 3.50k ± 0% | ~ | (all equal) |
RuntimeTransaction/load_and_save_empty_string_on_signers_address-2 | 602 ± 0% | 602 ± 0% | ~ | (all equal) |
RuntimeTransaction/load_and_save_long_string_on_signers_address-2 | 602 ± 0% | 602 ± 0% | ~ | (all equal) |
RuntimeTransaction/create_new_account-2 | 202 ± 0% | 202 ± 0% | ~ | (all equal) |
RuntimeTransaction/call_empty_contract_function-2 | 402 ± 0% | 402 ± 0% | ~ | (all equal) |
RuntimeTransaction/emit_event-2 | 602 ± 0% | 602 ± 0% | ~ | (all equal) |
RuntimeTransaction/borrow_array_from_storage-2 | 2.60k ± 0% | 2.60k ± 0% | ~ | (all equal) |
RuntimeTransaction/copy_array_from_storage-2 | 2.60k ± 0% | 2.60k ± 0% | ~ | (all equal) |
interactions | | | delta | |
pkg:github.com/onflow/flow-go/fvm goos:linux goarch:amd64 | ||||
RuntimeTransaction/reference_tx-2 | 44.4k ± 0% | 44.4k ± 0% | ~ | (all equal) |
RuntimeTransaction/convert_int_to_string-2 | 44.4k ± 0% | 44.4k ± 0% | ~ | (all equal) |
RuntimeTransaction/convert_int_to_string_and_concatenate_it-2 | 44.4k ± 0% | 44.4k ± 0% | ~ | (all equal) |
RuntimeTransaction/get_signer_address-2 | 44.4k ± 0% | 44.4k ± 0% | ~ | (all equal) |
RuntimeTransaction/get_public_account-2 | 44.4k ± 0% | 44.4k ± 0% | ~ | (all equal) |
RuntimeTransaction/get_account_and_get_balance-2 | 16.8M ± 0% | 16.8M ± 0% | ~ | (all equal) |
RuntimeTransaction/get_account_and_get_available_balance-2 | 5.28M ± 0% | 5.28M ± 0% | ~ | (all equal) |
RuntimeTransaction/get_account_and_get_storage_used-2 | 48.0k ± 0% | 48.0k ± 0% | ~ | (all equal) |
RuntimeTransaction/get_account_and_get_storage_capacity-2 | 5.27M ± 0% | 5.27M ± 0% | ~ | (all equal) |
RuntimeTransaction/get_signer_vault-2 | 44.7k ± 0% | 44.7k ± 0% | ~ | (all equal) |
RuntimeTransaction/get_signer_receiver-2 | 45.0k ± 0% | 45.0k ± 0% | ~ | (all equal) |
RuntimeTransaction/transfer_tokens-2 | 45.8k ± 0% | 45.8k ± 0% | ~ | (all equal) |
RuntimeTransaction/load_and_save_empty_string_on_signers_address-2 | 44.8k ± 0% | 44.8k ± 0% | ~ | (all equal) |
RuntimeTransaction/load_and_save_long_string_on_signers_address-2 | 49.7k ± 0% | 49.7k ± 0% | ~ | (all equal) |
RuntimeTransaction/create_new_account-2 | 8.53M ± 0% | 8.53M ± 0% | ~ | (all equal) |
RuntimeTransaction/call_empty_contract_function-2 | 44.6k ± 0% | 44.6k ± 0% | ~ | (all equal) |
RuntimeTransaction/emit_event-2 | 44.6k ± 0% | 44.6k ± 0% | ~ | (all equal) |
RuntimeTransaction/borrow_array_from_storage-2 | 49.8k ± 0% | 49.8k ± 0% | ~ | (all equal) |
RuntimeTransaction/copy_array_from_storage-2 | 49.8k ± 0% | 49.8k ± 0% | ~ | (all equal) |
us/tx | | | delta | |
pkg:github.com/onflow/flow-go/engine/execution/computation goos:linux goarch:amd64 | ||||
ComputeBlock/16/cols/128/txes-2 | 2.33k ± 4% | 2.30k ± 3% | ~ | (p=0.225 n=10+10) |
Build succeeded:
- Integration Tests (make -C integration access-tests)
- Integration Tests (make -C integration bft-tests)
- Integration Tests (make -C integration collection-tests)
- Integration Tests (make -C integration consensus-tests)
- Integration Tests (make -C integration epochs-tests)
- Integration Tests (make -C integration execution-tests)
- Integration Tests (make -C integration ghost-tests)
- Integration Tests (make -C integration mvp-tests)
- Integration Tests (make -C integration network-tests)
- Integration Tests (make -C integration verification-tests)
- Lint (./)
- Lint (./crypto/)
- Lint (./integration/)
- Unit Tests (access)
- Unit Tests (admin)
- Unit Tests (cmd)
- Unit Tests (consensus)
- Unit Tests (engine)
- Unit Tests (fvm)
- Unit Tests (ledger)
- Unit Tests (module)
- Unit Tests (network)
- Unit Tests (others)
- Unit Tests (utils)