program execution time may exceed block time limit
Problem
Currently we use gas costs and instruction weights to limit program execution time. Weights are calculated automatically using the substrate benchmarking framework. But this method cannot guarantee that a program won't exceed the allowed execution time in all cases.
As an example, you can try this wasm code: exceed_time.wasm.zip
Reproduction context:
Profile: production
Native runtime: gear-node-1660 (gear-node-1.tx1.au1)
💻 Operating system: linux
💻 CPU architecture: x86_64
💻 Target environment: gnu
💻 CPU: Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz
💻 CPU cores: 14
💻 Memory: 64202MB
💻 Kernel: 5.4.0-122-generic
💻 Linux distribution: Ubuntu 18.04.6 LTS
💻 Virtual machine: no
Node run cmd:
./target/production/gear-node --dev --execution=wasm
So, if you try to execute the wasm above, you will see the following:
10:23:36.001 DEBUG tokio-runtime-worker runtime::gear: ⚙️ Initialization of block #7
10:23:36.025 DEBUG tokio-runtime-worker runtime::gear: ⚙️ Queue and tasks processing of block #7 with weight='303159757333'
10:23:36.026 DEBUG tokio-runtime-worker gear_core_processor::processor: Charged for memory pages. Size: WasmPageNumber(512)
10:23:36.026 DEBUG tokio-runtime-worker gear_core_processor::executor: Executing program 0x5b06327a320bdbb5c9e039c35001a2a416e2e6a125c51137dea51805911ebda1
10:23:36.026 DEBUG tokio-runtime-worker gear_core_processor::executor: Executing dispatch IncomingDispatch { kind: Init, message: IncomingMessage { id: 0x366a601f2
10:23:36.667 INFO tokio-runtime-worker aura: ⌛️ Discarding proposal for slot 1661768616; block production took too long
10:23:36.827 DEBUG tokio-runtime-worker gear_core_processor::executor: Termination reason: Success
10:23:36.899 DEBUG tokio-runtime-worker runtime::gear: ⚙️ Weight '249986676015' burned in block #7
10:23:36.899 DEBUG tokio-runtime-worker runtime::gear: ⚙️ Finalization of block #7
10:23:37.001 INFO tokio-runtime-worker sc_basic_authorship::basic_authorship: 🙌 Starting consensus session on top of parent 0x6adad559aa863162c11cbbf21d1c498dedf
10:23:37.001 DEBUG tokio-runtime-worker runtime::gear: ⚙️ Initialization of block #7
10:23:37.020 INFO tokio-runtime-worker sc_basic_authorship::basic_authorship: 🎁 Prepared block for proposing at 7 (1018 ms) [hash: 0x9adc1a89099c5bf7b6dbd6286d92
10:23:37.028 DEBUG tokio-runtime-worker runtime::gear: ⚙️ Queue and tasks processing of block #7 with weight='303159757333'
10:23:37.029 DEBUG tokio-runtime-worker gear_core_processor::processor: Charged for memory pages. Size: WasmPageNumber(512)
10:23:37.029 DEBUG tokio-runtime-worker gear_core_processor::executor: Executing program 0x5b06327a320bdbb5c9e039c35001a2a416e2e6a125c51137dea51805911ebda1
10:23:37.029 DEBUG tokio-runtime-worker gear_core_processor::executor: Executing dispatch IncomingDispatch { kind: Init, message: IncomingMessage { id: 0x366a601f2
10:23:37.840 DEBUG tokio-runtime-worker gear_core_processor::executor: Termination reason: Success
10:23:37.905 DEBUG tokio-runtime-worker runtime::gear: ⚙️ Weight '249986676015' burned in block #7
10:23:37.905 DEBUG tokio-runtime-worker runtime::gear: ⚙️ Finalization of block #7
10:23:38.013 INFO tokio-runtime-worker sc_basic_authorship::basic_authorship: 🎁 Prepared block for proposing at 7 (1010 ms) [hash: 0xae2603d3c1218a976113e663e3e1
10:23:38.018 INFO tokio-runtime-worker aura: 🔖 Pre-sealed block for proposal at 7. Hash now 0x2b81ba807af31953d7f4dff3b7d1c3017235d927d12d522c8e7d50c65465808b, p
10:23:38.251 INFO tokio-runtime-worker substrate: ✨ Imported #7 (0x2b81…808b)
As you can see, initialization of block #7 starts twice because of this:
aura: ⌛️ Discarding proposal for slot 1661768616; block production took too long
This message comes from this part of the substrate code:
https://github.com/gear-tech/substrate/blob/abc134523a276ba867ec4ebd10dc16069401a9ed/client/consensus/slots/src/lib.rs#L295-L302
Currently this problem means that the block is imported only after 2.25 seconds (it should currently be 1 second on master). If you try the same not with --dev but on the staging chain, the result will be the same, except that a reorg will be triggered:
INFO tokio-runtime-worker sc_informant: ♻️ Reorg on #145,0x0d6e…16f3 to #146,0xab5c…d3a1, common ancestor #144,0x3c4a…16a2
Let's iterate over the problems which cause the execution time to be exceeded. A separate issue can be made for each of them.
Load/store instruction weights (the most important)
https://github.com/gear-tech/gear/blob/654a0b8551fa398142c96bc4050aa2567e686b43/pallets/gear/src/benchmarking/mod.rs#L1737-L1747
Looking at the benchmarks for loads, we can see that we only measure loads that access the first two wasm pages. Two wasm pages fit inside the processor's L1 and L2 caches, so each load executes without an L2 cache miss, which means its execution time is not much bigger than that of a simple add. In the wasm example above, I made accesses to different parts of a 512-page wasm memory, so L2 cache misses occur for almost every load operation; as a result, a load in fact takes much more time than its weight tells us.
Another problem is that in the benchmark all loads are without an offset. But in wasmer, loads with offsets generate additional checks in the code (a check that base addr + offset is computed without overflow), which leads to additional branches in the compiled code and also increases execution time.
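To make this concrete, here is a minimal sketch, assuming the wat crate for assembling the text format (module shapes are illustrative, not the actual benchmark code). The first module is benchmark-like: a load at a constant address in the first page. The second is closer to the attached example: a load with a large static offset near the end of a 512-page memory, which forces the extra overflow check and defeats the caches when such addresses are strided over.

```rust
// Sketch only: contrasts the benchmarked access pattern with a cache-hostile one.
fn main() {
    // Benchmark-style load: constant address inside the first wasm page and
    // no static offset, so the target bytes stay hot in L1/L2 and the
    // measured cost is close to a plain add.
    let benchmark_like = r#"
        (module
          (memory 512)
          (func (export "hot") (result i64)
            (i64.load (i32.const 0))))
    "#;

    // Load with a static offset near the end of a 512-page (32 MiB) memory:
    // wasmer emits an extra base + offset overflow check, and striding over
    // such addresses misses the L2 cache on almost every access.
    let cache_hostile = r#"
        (module
          (memory 512)
          (func (export "cold") (param $base i32) (result i64)
            (i64.load offset=0x1fffff8 (local.get $base))))
    "#;

    for src in [benchmark_like, cache_hostile] {
        let bytes = wat::parse_str(src).expect("valid wat");
        println!("assembled {} bytes of wasm", bytes.len());
    }
}
```

A representative benchmark would have to cover both dimensions: offsets that trigger the extra check, and addresses spread over far more memory than fits in L2.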
i64.const instruction weight
The current weight is 30, which is much smaller than, for example, add: 180. This is also caused by unsuitable benchmarks:
https://github.com/gear-tech/gear/blob/654a0b8551fa398142c96bc4050aa2567e686b43/pallets/gear/src/benchmarking/mod.rs#L1726-L1729
In fact, wasmer singlepass has an optimisation which generates a real const operation only in the places where the constant is used, but because of the drop in the benchmarks these operations won't be generated at all. Also, in the benchmarks the const instruction has a small literal, whereas constants with large literals are compiled into several host-native operations, which also increases their execution time.
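Here is a sketch of the difference (again assuming the wat crate; module shapes are illustrative). In the first module the constant is dropped, so singlepass never materializes it; in the second a full-width 64-bit literal actually feeds an add and has to be loaded into a register first.

```rust
// Sketch only: dropped constant vs. a large constant that is really used.
fn main() {
    // Benchmark-style: the constant is immediately dropped, so wasmer
    // singlepass emits no real code for it, and the measured weight (30)
    // stays far below add (180).
    let dropped = r#"
        (module
          (func (export "bench")
            (drop (i64.const 1))))
    "#;

    // Realistic: a full-width literal that is actually consumed; the
    // compiler must materialize it in native code before the add runs.
    let used = r#"
        (module
          (func (export "real") (param $x i64) (result i64)
            (i64.add (local.get $x) (i64.const 0x7fff_ffff_ffff_ffff))))
    "#;

    for src in [dropped, used] {
        wat::parse_str(src).expect("valid wat");
    }
}
```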
wasm stack size (related issue #1364)
wasmer uses register allocation in cases where the stack is not too big. But when the stack grows larger than several values, it starts to use the real in-memory stack to store operation results. Because the stack size for most current benchmarks is 1-3 values, we never hit the case where wasmer starts to use the memory stack, so all instruction weights are lighter than they could be.
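A sketch of a body the benchmarks never produce (assuming the wat crate; the shape is illustrative): it keeps 32 values live on the wasm stack at once, far beyond any realistic register budget, which should push wasmer into its memory-stack path.

```rust
// Sketch only: generate a function with 32 live stack values.
fn main() {
    let mut body = String::new();
    // Push 32 constants so 32 values are live on the wasm stack at once...
    for i in 0..32 {
        body.push_str(&format!("i64.const {i} "));
    }
    // ...then fold them down with 31 adds, leaving a single result.
    for _ in 0..31 {
        body.push_str("i64.add ");
    }
    let module = format!(r#"(module (func (export "deep") (result i64) {body}))"#);
    wat::parse_str(&module).expect("valid wat");
}
```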
wasmer compilation time
Currently we take balance for code size in the upload_program extrinsic call, but we also have to charge gas for code size during message processing, because bigger code leads to longer compilation time. Even if we have the compiled code in a cache, reading this cache from the drive takes time proportional to the code size.
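A rough sketch of the proposed charge; all names and per-byte rates here are hypothetical, not the gear API, and the real values would have to come from benchmarks:

```rust
/// Hypothetical per-byte rates (placeholders, not benchmarked values).
const COMPILE_GAS_PER_BYTE: u64 = 10;
const CACHE_READ_GAS_PER_BYTE: u64 = 1;

/// Illustrative gas counter, not gear's actual GasCounter type.
struct GasCounter {
    left: u64,
}

impl GasCounter {
    fn charge(&mut self, amount: u64) -> Result<(), &'static str> {
        self.left = self.left.checked_sub(amount).ok_or("gas limit exceeded")?;
        Ok(())
    }
}

/// Charge for code size on every message execution: either for a fresh
/// compilation, or for reading the compiled artifact from the cache.
/// Both costs scale with code_len, just with different rates.
fn charge_for_code(
    counter: &mut GasCounter,
    code_len: u64,
    cached: bool,
) -> Result<(), &'static str> {
    let per_byte = if cached {
        CACHE_READ_GAS_PER_BYTE
    } else {
        COMPILE_GAS_PER_BYTE
    };
    counter.charge(code_len.saturating_mul(per_byte))
}

fn main() {
    let mut counter = GasCounter { left: 1_000_000 };
    // First execution on this node: no cached artifact, pay the compile rate.
    charge_for_code(&mut counter, 50_000, false).unwrap();
    assert_eq!(counter.left, 500_000);
}
```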
charge gas for pages upload
Currently we charge gas for page loads: we charge gas for all pages if it's the first execution of the program in the block, and charge nothing if it's the second or a later execution.
https://github.com/gear-tech/gear/blob/654a0b8551fa398142c96bc4050aa2567e686b43/core-processor/src/executor.rs#L87-L98
In fact, during a second execution a lot of pages can be write-accessed, and all these pages must be updated in storage. So the algorithm must be more flexible: we should charge gas for pages which have been loaded and also for pages which will be uploaded to storage. We can take the information about these pages from lazy-pages at the end of execution. Before execution we can greedily charge gas for loading and uploading all pages, and then return the unused gas after execution, as in the sketch below.
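A rough sketch of this charge-then-refund flow; all names and per-page costs are hypothetical, and the accessed/dirty sets stand in for what lazy-pages would report:

```rust
use std::collections::BTreeSet;

/// Hypothetical per-page costs (placeholders, not benchmarked values).
const LOAD_GAS_PER_PAGE: u64 = 100;   // read one page from storage
const UPLOAD_GAS_PER_PAGE: u64 = 300; // write one page back to storage

/// Before execution: greedily charge as if every page will be both loaded
/// and uploaded.
fn greedy_charge(gas_left: &mut u64, total_pages: u64) -> Result<(), &'static str> {
    let worst_case = total_pages * (LOAD_GAS_PER_PAGE + UPLOAD_GAS_PER_PAGE);
    *gas_left = gas_left.checked_sub(worst_case).ok_or("gas limit exceeded")?;
    Ok(())
}

/// After execution: refund for pages that were never read (no load
/// happened) and pages that were never written (no upload needed).
fn refund(
    gas_left: &mut u64,
    total_pages: u64,
    accessed: &BTreeSet<u32>, // pages lazy-pages reports as read
    dirty: &BTreeSet<u32>,    // pages lazy-pages reports as written
) {
    let untouched = total_pages - accessed.len() as u64;
    let clean = total_pages - dirty.len() as u64;
    *gas_left += untouched * LOAD_GAS_PER_PAGE + clean * UPLOAD_GAS_PER_PAGE;
}

fn main() {
    let mut gas = 1_000_000u64;
    greedy_charge(&mut gas, 512).unwrap(); // worst case: 512 * 400 gas
    // Suppose lazy-pages reports 3 pages read, one of which was written.
    let accessed = BTreeSet::from([0u32, 7, 511]);
    let dirty = BTreeSet::from([7u32]);
    refund(&mut gas, 512, &accessed, &dirty);
    // Net charge: 3 loads + 1 upload = 3 * 100 + 1 * 300 = 600 gas.
    assert_eq!(gas, 1_000_000 - 600);
}
```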
relates to #1172
Moved to separate issues:
Load/store instruction weights (the most important)
https://github.com/gear-tech/gear/issues/1728
i64.const instruction weight
https://github.com/gear-tech/gear/issues/1729
wasm stack size
https://github.com/gear-tech/gear/issues/1648 and https://github.com/gear-tech/gear/issues/1364
wasmer compilation time
fixed in https://github.com/gear-tech/gear/commit/e87ab9e4d2709ce497acf186288b4e689fa587a2
charge gas for pages upload
https://github.com/gear-tech/gear/issues/1731