flow-go icon indicating copy to clipboard operation
flow-go copied to clipboard

[Execution] Worker-based Chunk Data Pack Requests

Open yhassanzadeh13 opened this issue 2 years ago • 1 comments

This PR refactors the provider engine of the execution node to perform a worker-based chunk data pack request processing. Prior to this PR, each incoming chunk data pack request would fire up a separate goroutine on the execution node that could lead to resource exhaustion on the execution nodes under the large load of chunk data pack requests.

This PR implements a HeroCache-based queue that queues up the incoming chunk requests at execution nodes to be picked by the concurrent workers. Hence rate-limiting the chunk data pack requests at the execution nodes.

Additionally, this PR fixes false-positive linter errors and re-generates stale mocks.

yhassanzadeh13 avatar Aug 09 '22 19:08 yhassanzadeh13

FVM Benchstat comparison

This branch with compared with the base branch onflow:master commit 9784c27fc3e0a2f7d987f1bc11df68ed4988fc3a

The command (for i in {1..10}; do go test ./fvm ./engine/execution/computation --bench . --tags relic -shuffle=on --benchmem --run ^$; done) was used.

Collapsed results for better readability

old.txtnew.txt
time/opdelta
pkg:github.com/onflow/flow-go/fvm goos:linux goarch:amd64
RuntimeTransaction/reference_tx-223.9ms ± 6%23.9ms ± 8%~(p=0.863 n=9+9)
RuntimeTransaction/convert_int_to_string-227.2ms ± 5%27.2ms ± 8%~(p=0.853 n=10+10)
RuntimeTransaction/convert_int_to_string_and_concatenate_it-229.8ms ± 5%28.8ms ± 9%~(p=0.063 n=10+10)
RuntimeTransaction/get_signer_address-225.0ms ± 3%25.2ms ± 7%~(p=0.321 n=8+9)
RuntimeTransaction/get_public_account-228.6ms ± 7%28.8ms ± 8%~(p=0.661 n=9+10)
RuntimeTransaction/get_account_and_get_balance-2257ms ± 1%257ms ± 1%~(p=0.971 n=10+10)
RuntimeTransaction/get_account_and_get_available_balance-2235ms ± 2%236ms ± 1%~(p=0.529 n=10+10)
RuntimeTransaction/get_account_and_get_storage_used-230.5ms ± 7%30.4ms ± 7%~(p=0.912 n=10+10)
RuntimeTransaction/get_account_and_get_storage_capacity-2207ms ± 1%207ms ± 2%~(p=0.853 n=10+10)
RuntimeTransaction/get_signer_vault-234.2ms ± 7%34.1ms ± 5%~(p=1.000 n=10+10)
RuntimeTransaction/get_signer_receiver-243.2ms ± 4%43.1ms ± 6%~(p=0.971 n=10+10)
RuntimeTransaction/transfer_tokens-2200ms ± 2%200ms ± 1%~(p=0.780 n=9+10)
RuntimeTransaction/load_and_save_empty_string_on_signers_address-233.1ms ± 7%32.3ms ± 6%~(p=0.218 n=10+10)
RuntimeTransaction/load_and_save_long_string_on_signers_address-275.6ms ± 2%75.5ms ± 4%~(p=0.481 n=10+10)
RuntimeTransaction/call_empty_contract_function-228.3ms ± 6%27.9ms ± 9%~(p=0.353 n=10+10)
RuntimeTransaction/emit_event-243.7ms ± 4%43.0ms ± 8%~(p=0.481 n=10+10)
RuntimeTransaction/borrow_array_from_storage-2125ms ± 2%125ms ± 4%~(p=0.684 n=10+10)
RuntimeTransaction/copy_array_from_storage-2126ms ± 3%126ms ± 5%~(p=1.000 n=10+10)
pkg:github.com/onflow/flow-go/engine/execution/computation goos:linux goarch:amd64
ComputeBlock/16/cols/128/txes-24.70s ± 2%4.71s ± 2%~(p=0.579 n=10+10)
pkg:github.com/onflow/flow-go/fvm goos:linux goarch:amd64
RuntimeTransaction/create_new_account-2829ms ± 1%824ms ± 1%−0.58%(p=0.015 n=10+10)
RuntimeNFTBatchTransfer-2115ms ±14%111ms ± 2%−4.03%(p=0.027 n=9+8)
 
computationdelta
pkg:github.com/onflow/flow-go/fvm goos:linux goarch:amd64
RuntimeTransaction/reference_tx-2202 ± 0%202 ± 0%~(all equal)
RuntimeTransaction/convert_int_to_string-2402 ± 0%402 ± 0%~(all equal)
RuntimeTransaction/convert_int_to_string_and_concatenate_it-2502 ± 0%502 ± 0%~(all equal)
RuntimeTransaction/get_signer_address-2302 ± 0%302 ± 0%~(all equal)
RuntimeTransaction/get_public_account-2402 ± 0%402 ± 0%~(all equal)
RuntimeTransaction/get_account_and_get_balance-21.00k ± 0%1.00k ± 0%~(all equal)
RuntimeTransaction/get_account_and_get_available_balance-22.60k ± 0%2.60k ± 0%~(all equal)
RuntimeTransaction/get_account_and_get_storage_used-2402 ± 0%402 ± 0%~(all equal)
RuntimeTransaction/get_account_and_get_storage_capacity-21.30k ± 0%1.30k ± 0%~(all equal)
RuntimeTransaction/get_signer_vault-2402 ± 0%402 ± 0%~(all equal)
RuntimeTransaction/get_signer_receiver-2602 ± 0%602 ± 0%~(all equal)
RuntimeTransaction/transfer_tokens-23.50k ± 0%3.50k ± 0%~(all equal)
RuntimeTransaction/load_and_save_empty_string_on_signers_address-2602 ± 0%602 ± 0%~(all equal)
RuntimeTransaction/load_and_save_long_string_on_signers_address-2602 ± 0%602 ± 0%~(all equal)
RuntimeTransaction/create_new_account-2202 ± 0%202 ± 0%~(all equal)
RuntimeTransaction/call_empty_contract_function-2402 ± 0%402 ± 0%~(all equal)
RuntimeTransaction/emit_event-2602 ± 0%602 ± 0%~(all equal)
RuntimeTransaction/borrow_array_from_storage-22.60k ± 0%2.60k ± 0%~(all equal)
RuntimeTransaction/copy_array_from_storage-22.60k ± 0%2.60k ± 0%~(all equal)
 
interactionsdelta
pkg:github.com/onflow/flow-go/fvm goos:linux goarch:amd64
RuntimeTransaction/reference_tx-244.4k ± 0%44.4k ± 0%~(all equal)
RuntimeTransaction/convert_int_to_string-244.4k ± 0%44.4k ± 0%~(all equal)
RuntimeTransaction/convert_int_to_string_and_concatenate_it-244.4k ± 0%44.4k ± 0%~(all equal)
RuntimeTransaction/get_signer_address-244.4k ± 0%44.4k ± 0%~(all equal)
RuntimeTransaction/get_public_account-244.4k ± 0%44.4k ± 0%~(all equal)
RuntimeTransaction/get_account_and_get_balance-216.8M ± 0%16.8M ± 0%~(all equal)
RuntimeTransaction/get_account_and_get_available_balance-25.28M ± 0%5.28M ± 0%~(all equal)
RuntimeTransaction/get_account_and_get_storage_used-248.0k ± 0%48.0k ± 0%~(all equal)
RuntimeTransaction/get_account_and_get_storage_capacity-25.27M ± 0%5.27M ± 0%~(all equal)
RuntimeTransaction/get_signer_vault-244.7k ± 0%44.7k ± 0%~(all equal)
RuntimeTransaction/get_signer_receiver-245.0k ± 0%45.0k ± 0%~(all equal)
RuntimeTransaction/transfer_tokens-245.8k ± 0%45.8k ± 0%~(all equal)
RuntimeTransaction/load_and_save_empty_string_on_signers_address-244.8k ± 0%44.8k ± 0%~(all equal)
RuntimeTransaction/load_and_save_long_string_on_signers_address-249.7k ± 0%49.7k ± 0%~(all equal)
RuntimeTransaction/create_new_account-28.53M ± 0%8.53M ± 0%~(all equal)
RuntimeTransaction/call_empty_contract_function-244.6k ± 0%44.6k ± 0%~(all equal)
RuntimeTransaction/emit_event-244.6k ± 0%44.6k ± 0%~(all equal)
RuntimeTransaction/borrow_array_from_storage-249.8k ± 0%49.8k ± 0%~(all equal)
RuntimeTransaction/copy_array_from_storage-249.8k ± 0%49.8k ± 0%~(all equal)
 
alloc/opdelta
pkg:github.com/onflow/flow-go/fvm goos:linux goarch:amd64
RuntimeTransaction/get_signer_address-237.1MB ± 5%38.4MB ± 1%+3.54%(p=0.006 n=10+8)
RuntimeTransaction/reference_tx-236.5MB ± 5%37.0MB ± 6%~(p=0.529 n=10+10)
RuntimeTransaction/convert_int_to_string-238.2MB ± 6%38.6MB ± 8%~(p=0.436 n=10+10)
RuntimeTransaction/convert_int_to_string_and_concatenate_it-239.5MB ± 5%38.8MB ± 4%~(p=0.243 n=10+9)
RuntimeTransaction/get_public_account-239.0MB ± 7%39.5MB ± 6%~(p=0.684 n=10+10)
RuntimeTransaction/get_account_and_get_balance-2129MB ± 3%130MB ± 0%~(p=0.071 n=9+7)
RuntimeTransaction/get_account_and_get_available_balance-2112MB ± 3%111MB ± 4%~(p=0.912 n=10+10)
RuntimeTransaction/get_account_and_get_storage_used-239.1MB ± 8%38.5MB ± 5%~(p=0.739 n=10+10)
RuntimeTransaction/get_account_and_get_storage_capacity-2105MB ± 3%108MB ± 0%~(p=0.056 n=10+6)
RuntimeTransaction/get_signer_vault-239.9MB ± 8%40.0MB ± 6%~(p=0.971 n=10+10)
RuntimeTransaction/get_signer_receiver-240.6MB ± 5%40.9MB ± 5%~(p=0.393 n=10+10)
RuntimeTransaction/transfer_tokens-288.2MB ± 3%87.2MB ± 2%~(p=0.182 n=10+9)
RuntimeTransaction/load_and_save_empty_string_on_signers_address-238.9MB ± 8%38.3MB ± 7%~(p=0.481 n=10+10)
RuntimeTransaction/load_and_save_long_string_on_signers_address-257.4MB ± 3%57.2MB ± 3%~(p=0.971 n=10+10)
RuntimeTransaction/create_new_account-2201MB ± 1%199MB ± 0%~(p=0.101 n=10+8)
RuntimeTransaction/call_empty_contract_function-238.3MB ± 6%38.1MB ± 9%~(p=0.853 n=10+10)
RuntimeTransaction/emit_event-243.5MB ± 5%43.2MB ± 8%~(p=0.631 n=10+10)
RuntimeTransaction/borrow_array_from_storage-268.7MB ± 2%67.7MB ± 2%~(p=0.068 n=8+10)
RuntimeTransaction/copy_array_from_storage-283.4MB ± 3%82.8MB ± 7%~(p=0.684 n=10+10)
RuntimeNFTBatchTransfer-258.4MB ± 4%57.5MB ± 4%~(p=0.218 n=10+10)
pkg:github.com/onflow/flow-go/engine/execution/computation goos:linux goarch:amd64
ComputeBlock/16/cols/128/txes-21.32GB ± 0%1.32GB ± 1%~(p=0.193 n=7+10)
 
allocs/opdelta
pkg:github.com/onflow/flow-go/fvm goos:linux goarch:amd64
RuntimeTransaction/reference_tx-280.3k ± 0%80.3k ± 0%~(p=0.403 n=10+10)
RuntimeTransaction/convert_int_to_string-294.5k ± 0%94.5k ± 0%~(p=0.507 n=10+9)
RuntimeTransaction/get_signer_address-285.5k ± 0%85.5k ± 0%~(p=0.305 n=10+10)
RuntimeTransaction/get_public_account-2109k ± 0%109k ± 0%~(p=0.838 n=10+10)
RuntimeTransaction/get_account_and_get_balance-21.55M ± 0%1.55M ± 0%~(p=0.109 n=10+10)
RuntimeTransaction/get_account_and_get_available_balance-21.43M ± 0%1.43M ± 0%~(p=0.591 n=10+10)
RuntimeTransaction/get_account_and_get_storage_used-2130k ± 0%130k ± 0%~(p=0.100 n=10+10)
RuntimeTransaction/get_account_and_get_storage_capacity-21.27M ± 0%1.27M ± 0%~(p=0.573 n=10+8)
RuntimeTransaction/get_signer_vault-2132k ± 0%132k ± 0%~(p=0.811 n=10+10)
RuntimeTransaction/get_signer_receiver-2213k ± 0%213k ± 0%~(p=0.697 n=10+10)
RuntimeTransaction/transfer_tokens-2962k ± 0%962k ± 0%~(p=0.926 n=10+10)
RuntimeTransaction/load_and_save_empty_string_on_signers_address-2131k ± 0%131k ± 0%~(p=0.827 n=10+9)
RuntimeTransaction/load_and_save_long_string_on_signers_address-2233k ± 0%233k ± 0%~(p=0.285 n=9+10)
RuntimeTransaction/create_new_account-22.72M ± 0%2.72M ± 0%~(p=0.617 n=10+10)
RuntimeTransaction/call_empty_contract_function-297.2k ± 0%97.2k ± 0%~(p=0.763 n=10+9)
RuntimeTransaction/emit_event-2142k ± 0%142k ± 0%~(p=0.839 n=10+10)
RuntimeTransaction/borrow_array_from_storage-2370k ± 0%369k ± 0%~(p=0.225 n=10+10)
RuntimeTransaction/copy_array_from_storage-2326k ± 0%326k ± 0%~(p=0.644 n=10+10)
RuntimeNFTBatchTransfer-2294k ± 0%295k ± 0%~(p=0.075 n=10+10)
pkg:github.com/onflow/flow-go/engine/execution/computation goos:linux goarch:amd64
ComputeBlock/16/cols/128/txes-221.0M ± 0%21.0M ± 0%~(p=0.436 n=10+10)
pkg:github.com/onflow/flow-go/fvm goos:linux goarch:amd64
RuntimeTransaction/convert_int_to_string_and_concatenate_it-2109k ± 0%109k ± 0%−0.01%(p=0.019 n=10+10)
 
us/txdelta
pkg:github.com/onflow/flow-go/engine/execution/computation goos:linux goarch:amd64
ComputeBlock/16/cols/128/txes-22.30k ± 2%2.30k ± 2%~(p=0.564 n=10+10)
 

github-actions[bot] avatar Aug 09 '22 20:08 github-actions[bot]

I'm not familiar with this part of the code base. I have a few ignorant questions:

  1. why do you need to implement the queue using herocache? Implementing a queue using cache data structure (map + link list) seems expensive. Why can't this just be a regular go channel (or a slice guarded by mutex)?
  2. why do you need both chdpRequestQueue and chdpRequestChannel? They basically do the same thing?

pattyshack avatar Aug 25 '22 21:08 pattyshack

Patrick has a good point. IIUC this PR has two parts:

  • limited concurrency
  • limited buffering

Both of these can be accomplished by a buffered channel with a pool of worker-goroutines for processing.

SaveTheRbtz avatar Aug 26 '22 05:08 SaveTheRbtz

Patrick has a good point. IIUC this PR has two parts:

  • limited concurrency
  • limited buffering

Both of these can be accomplished by a buffered channel with a pool of worker-goroutines for processing.

A buffered channel does not de-duplicate the repeated chunk data pack requests coming from the same requester for the same chunk. In production, a requester keeps repeating its request till it gets the result. If the requesters are just buffered in a channel, and no deduplication is involved, duplicate requests may be buffered and occupy the entire capacity of the channel while fresh and new requests are discarded. Moreover, it is enough for one copy of a request to buffer as long as it has not yet been processed, and buffering duplicate requests also means a redundant workload for ENs.

HeroCache serves as the point of deduplication by optimizing memory and CPU cost:

  • Compared to other alternatives, e.g., a map, HeroCache has the minimum footprint on the heap and optimizes GC time.
  • HeroCache also computes the unique identifier corresponding to a pair of requester id and chunk id only once, and caches that internally, so any read or write query does not involve re-computing the identifier, which involves an encoding to byte array and taking the hash of the byte representation.

yhassanzadeh13 avatar Aug 26 '22 16:08 yhassanzadeh13

IC. Queue implies the same number of items entering/leaving the collection, hence I assumed the items are unique. Perhaps a different name is more appropriate.

This code doesn't seem to do a very good job at de-duplication. While a worker is processing a chunk, an equivalent chunk can reenter the queue.

You may be better off using a real/normal queue for buffering, then maintain a separate data structure shared by the workers which keeps track of an item's id for a fixed duration time range, starting from the initial processes time, and dedup based on that.

pattyshack avatar Aug 26 '22 23:08 pattyshack

This code doesn't seem to do a very good job at de-duplication. While a worker is processing a chunk, an equivalent chunk can reenter the queue.

You may be better off using a real/normal queue for buffering, then maintain a separate data structure shared by the workers which keeps track of an item's id for a fixed duration time range, starting from the initial processes time, and dedup based on that.

https://github.com/dapperlabs/flow-go/issues/6373

yhassanzadeh13 avatar Aug 29 '22 17:08 yhassanzadeh13

Codecov Report

Merging #2951 (618a83b) into master (9784c27) will increase coverage by 0.10%. The diff coverage is 81.00%.

@@            Coverage Diff             @@
##           master    #2951      +/-   ##
==========================================
+ Coverage   54.37%   54.48%   +0.10%     
==========================================
  Files         728      729       +1     
  Lines       67431    67576     +145     
==========================================
+ Hits        36667    36820     +153     
+ Misses      27706    27691      -15     
- Partials     3058     3065       +7     
Flag Coverage Δ
unittests 54.48% <81.00%> (+0.10%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
cmd/execution_builder.go 0.00% <0.00%> (ø)
cmd/execution_config.go 0.00% <0.00%> (ø)
engine/execution/state/state.go 21.53% <0.00%> (ø)
module/mempool/queue/chunkDataPack.go 79.36% <79.36%> (ø)
engine/execution/provider/engine.go 72.00% <84.31%> (+26.78%) :arrow_up:
module/mempool/herocache/backdata/cache.go 91.79% <100.00%> (+0.16%) :arrow_up:
module/mempool/herocache/backdata/heropool/pool.go 89.69% <100.00%> (+1.58%) :arrow_up:
module/mempool/epochs/transactions.go 90.32% <0.00%> (-9.68%) :arrow_down:
...s/hotstuff/votecollector/staking_vote_processor.go 83.87% <0.00%> (-3.23%) :arrow_down:
consensus/hotstuff/eventloop/event_loop.go 73.46% <0.00%> (-1.37%) :arrow_down:
... and 2 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

codecov-commenter avatar Sep 02 '22 05:09 codecov-commenter