program execution time may exceed block time limit
Problem
Currently we use gas costs and instruction weights to limit program execution time. Weights are calculated automatically using the substrate benchmarking framework. But this method cannot guarantee that a program won't exceed the allowed execution time in all cases.
As an example, you can try this wasm code: exceed_time.wasm.zip
Reproduction context:
Profile: production
Native runtime: gear-node-1660 (gear-node-1.tx1.au1)
💻 Operating system: linux
💻 CPU architecture: x86_64
💻 Target environment: gnu
💻 CPU: Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz
💻 CPU cores: 14
💻 Memory: 64202MB
💻 Kernel: 5.4.0-122-generic
💻 Linux distribution: Ubuntu 18.04.6 LTS
💻 Virtual machine: no
Node run cmd:
./target/production/gear-node --dev --execution=wasm
So, if you try to execute the wasm above, you will see the following:
10:23:36.001 DEBUG tokio-runtime-worker runtime::gear: ⚙️ Initialization of block #7
10:23:36.025 DEBUG tokio-runtime-worker runtime::gear: ⚙️ Queue and tasks processing of block #7 with weight='303159757333'
10:23:36.026 DEBUG tokio-runtime-worker gear_core_processor::processor: Charged for memory pages. Size: WasmPageNumber(512)
10:23:36.026 DEBUG tokio-runtime-worker gear_core_processor::executor: Executing program 0x5b06327a320bdbb5c9e039c35001a2a416e2e6a125c51137dea51805911ebda1
10:23:36.026 DEBUG tokio-runtime-worker gear_core_processor::executor: Executing dispatch IncomingDispatch { kind: Init, message: IncomingMessage { id: 0x366a601f2
10:23:36.667 INFO tokio-runtime-worker aura: ⌛️ Discarding proposal for slot 1661768616; block production took too long
10:23:36.827 DEBUG tokio-runtime-worker gear_core_processor::executor: Termination reason: Success
10:23:36.899 DEBUG tokio-runtime-worker runtime::gear: ⚙️ Weight '249986676015' burned in block #7
10:23:36.899 DEBUG tokio-runtime-worker runtime::gear: ⚙️ Finalization of block #7
10:23:37.001 INFO tokio-runtime-worker sc_basic_authorship::basic_authorship: 🙌 Starting consensus session on top of parent 0x6adad559aa863162c11cbbf21d1c498dedf
10:23:37.001 DEBUG tokio-runtime-worker runtime::gear: ⚙️ Initialization of block #7
10:23:37.020 INFO tokio-runtime-worker sc_basic_authorship::basic_authorship: 🎁 Prepared block for proposing at 7 (1018 ms) [hash: 0x9adc1a89099c5bf7b6dbd6286d92
10:23:37.028 DEBUG tokio-runtime-worker runtime::gear: ⚙️ Queue and tasks processing of block #7 with weight='303159757333'
10:23:37.029 DEBUG tokio-runtime-worker gear_core_processor::processor: Charged for memory pages. Size: WasmPageNumber(512)
10:23:37.029 DEBUG tokio-runtime-worker gear_core_processor::executor: Executing program 0x5b06327a320bdbb5c9e039c35001a2a416e2e6a125c51137dea51805911ebda1
10:23:37.029 DEBUG tokio-runtime-worker gear_core_processor::executor: Executing dispatch IncomingDispatch { kind: Init, message: IncomingMessage { id: 0x366a601f2
10:23:37.840 DEBUG tokio-runtime-worker gear_core_processor::executor: Termination reason: Success
10:23:37.905 DEBUG tokio-runtime-worker runtime::gear: ⚙️ Weight '249986676015' burned in block #7
10:23:37.905 DEBUG tokio-runtime-worker runtime::gear: ⚙️ Finalization of block #7
10:23:38.013 INFO tokio-runtime-worker sc_basic_authorship::basic_authorship: 🎁 Prepared block for proposing at 7 (1010 ms) [hash: 0xae2603d3c1218a976113e663e3e1
10:23:38.018 INFO tokio-runtime-worker aura: 🔖 Pre-sealed block for proposal at 7. Hash now 0x2b81ba807af31953d7f4dff3b7d1c3017235d927d12d522c8e7d50c65465808b, p
10:23:38.251 INFO tokio-runtime-worker substrate: ✨ Imported #7 (0x2b81…808b)
As you can see, initialization of block #7 starts twice because of this:
aura: ⌛️ Discarding proposal for slot 1661768616; block production took too long
This message comes from this part of the substrate code:
https://github.com/gear-tech/substrate/blob/abc134523a276ba867ec4ebd10dc16069401a9ed/client/consensus/slots/src/lib.rs#L295-L302
Currently this problem means that the block is imported only after 2.25 seconds (it should currently be 1 second on master). If you try the same not with --dev but on the staging chain, the result will be the same, except that a reorg will be triggered:
INFO tokio-runtime-worker sc_informant: ♻️ Reorg on #145,0x0d6e…16f3 to #146,0xab5c…d3a1, common ancestor #144,0x3c4a…16a2
Let's iterate over the problems which cause the execution time to be exceeded. A separate issue can be made for each of them.
Load/store instruction weights (the most important)
https://github.com/gear-tech/gear/blob/654a0b8551fa398142c96bc4050aa2567e686b43/pallets/gear/src/benchmarking/mod.rs#L1737-L1747
Looking at the benchmarks for loads, we can see that we only measure loads that access the first two wasm pages. Two wasm pages fit inside the processor's L1 and L2 caches, so each load executes without an L2 cache miss, which means its execution time is not much bigger than that of a simple add. In the wasm example above, I made accesses to different parts of a 512-page wasm memory, so L2 cache misses occur for almost every load operation; as a result, a load in fact takes much more time than its weight tells us.
Another problem is that in the benchmark all loads are without an offset. But in wasmer, loads with offsets generate additional checks in the code (a check that base addr + offset is computed without overflow), which leads to additional branches in the compiled code and also increases execution time.
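To make this concrete, here is a minimal sketch, assuming the wat crate for assembling the text format (module shapes are illustrative, not the actual benchmark code). The first module is benchmark-like: a load at a constant address in the first page. The second is closer to the attached example: a load with a large static offset near the end of a 512-page memory, which forces the extra overflow check and defeats the caches when such addresses are strided over.

```rust
// Sketch only: contrasts the benchmarked access pattern with a cache-hostile one.
fn main() {
    // Benchmark-style load: constant address inside the first wasm page and
    // no static offset, so the target bytes stay hot in L1/L2 and the
    // measured cost is close to a plain add.
    let benchmark_like = r#"
        (module
          (memory 512)
          (func (export "hot") (result i64)
            (i64.load (i32.const 0))))
    "#;

    // Load with a static offset near the end of a 512-page (32 MiB) memory:
    // wasmer emits an extra base + offset overflow check, and striding over
    // such addresses misses the L2 cache on almost every access.
    let cache_hostile = r#"
        (module
          (memory 512)
          (func (export "cold") (param $base i32) (result i64)
            (i64.load offset=0x1fffff8 (local.get $base))))
    "#;

    for src in [benchmark_like, cache_hostile] {
        let bytes = wat::parse_str(src).expect("valid wat");
        println!("assembled {} bytes of wasm", bytes.len());
    }
}
```

A representative benchmark would have to cover both dimensions: offsets that trigger the extra check, and addresses spread over far more memory than fits in L2.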
i64.const instruction weight
The current weight is 30, which is much smaller than, for example, add: 180. This is also caused by unsuitable benchmarks:
https://github.com/gear-tech/gear/blob/654a0b8551fa398142c96bc4050aa2567e686b43/pallets/gear/src/benchmarking/mod.rs#L1726-L1729
In fact, wasmer singlepass has an optimisation which generates a real const operation only in the places where the constant is used, but because of the drop in the benchmarks these operations won't be generated at all. Also, in the benchmarks the const instruction has a small literal, whereas constants with large literals are compiled into several host-native operations, which also increases their execution time.
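Here is a sketch of the difference (again assuming the wat crate; module shapes are illustrative). In the first module the constant is dropped, so singlepass never materializes it; in the second a full-width 64-bit literal actually feeds an add and has to be loaded into a register first.

```rust
// Sketch only: dropped constant vs. a large constant that is really used.
fn main() {
    // Benchmark-style: the constant is immediately dropped, so wasmer
    // singlepass emits no real code for it, and the measured weight (30)
    // stays far below add (180).
    let dropped = r#"
        (module
          (func (export "bench")
            (drop (i64.const 1))))
    "#;

    // Realistic: a full-width literal that is actually consumed; the
    // compiler must materialize it in native code before the add runs.
    let used = r#"
        (module
          (func (export "real") (param $x i64) (result i64)
            (i64.add (local.get $x) (i64.const 0x7fff_ffff_ffff_ffff))))
    "#;

    for src in [dropped, used] {
        wat::parse_str(src).expect("valid wat");
    }
}
```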
wasm stack size (related issue #1364)
wasmer uses register allocation in cases where the stack is not too big. But when the stack grows larger than several values, it starts to use the real in-memory stack to store operation results. Because the stack size for most current benchmarks is 1-3 values, we never hit the case where wasmer starts to use the memory stack, so all instruction weights are lighter than they could be.
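A sketch of a body the benchmarks never produce (assuming the wat crate; the shape is illustrative): it keeps 32 values live on the wasm stack at once, far beyond any realistic register budget, which should push wasmer into its memory-stack path.

```rust
// Sketch only: generate a function with 32 live stack values.
fn main() {
    let mut body = String::new();
    // Push 32 constants so 32 values are live on the wasm stack at once...
    for i in 0..32 {
        body.push_str(&format!("i64.const {i} "));
    }
    // ...then fold them down with 31 adds, leaving a single result.
    for _ in 0..31 {
        body.push_str("i64.add ");
    }
    let module = format!(r#"(module (func (export "deep") (result i64) {body}))"#);
    wat::parse_str(&module).expect("valid wat");
}
```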
wasmer compilation time
Currently we take balance for code size in the upload_program extrinsic call, but we also have to charge gas for code size during message processing, because bigger code leads to longer compilation time. Even if we have the compiled code in a cache, reading this cache from the drive takes time proportional to the code size.
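A rough sketch of the proposed charge; all names and per-byte rates here are hypothetical, not the gear API, and the real values would have to come from benchmarks:

```rust
/// Hypothetical per-byte rates (placeholders, not benchmarked values).
const COMPILE_GAS_PER_BYTE: u64 = 10;
const CACHE_READ_GAS_PER_BYTE: u64 = 1;

/// Illustrative gas counter, not gear's actual GasCounter type.
struct GasCounter {
    left: u64,
}

impl GasCounter {
    fn charge(&mut self, amount: u64) -> Result<(), &'static str> {
        self.left = self.left.checked_sub(amount).ok_or("gas limit exceeded")?;
        Ok(())
    }
}

/// Charge for code size on every message execution: either for a fresh
/// compilation, or for reading the compiled artifact from the cache.
/// Both costs scale with code_len, just with different rates.
fn charge_for_code(
    counter: &mut GasCounter,
    code_len: u64,
    cached: bool,
) -> Result<(), &'static str> {
    let per_byte = if cached {
        CACHE_READ_GAS_PER_BYTE
    } else {
        COMPILE_GAS_PER_BYTE
    };
    counter.charge(code_len.saturating_mul(per_byte))
}

fn main() {
    let mut counter = GasCounter { left: 1_000_000 };
    // First execution on this node: no cached artifact, pay the compile rate.
    charge_for_code(&mut counter, 50_000, false).unwrap();
    assert_eq!(counter.left, 500_000);
}
```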
charge gas for pages upload
Currently we charge gas for page loads: we charge gas for all pages if it's the first execution of the program in the block, and charge nothing if it's the second or a later execution.
https://github.com/gear-tech/gear/blob/654a0b8551fa398142c96bc4050aa2567e686b43/core-processor/src/executor.rs#L87-L98
In fact, during a second execution a lot of pages can be write-accessed, and all these pages must be updated in storage. So the algorithm must be more flexible: we should charge gas for pages which have been loaded and also for pages which will be uploaded to storage. We can take the information about these pages from lazy-pages at the end of execution. Before execution we can greedily charge gas for loading and uploading all pages, and then return the unused gas after execution, as in the sketch below.
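A rough sketch of this charge-then-refund flow; all names and per-page costs are hypothetical, and the accessed/dirty sets stand in for what lazy-pages would report:

```rust
use std::collections::BTreeSet;

/// Hypothetical per-page costs (placeholders, not benchmarked values).
const LOAD_GAS_PER_PAGE: u64 = 100;   // read one page from storage
const UPLOAD_GAS_PER_PAGE: u64 = 300; // write one page back to storage

/// Before execution: greedily charge as if every page will be both loaded
/// and uploaded.
fn greedy_charge(gas_left: &mut u64, total_pages: u64) -> Result<(), &'static str> {
    let worst_case = total_pages * (LOAD_GAS_PER_PAGE + UPLOAD_GAS_PER_PAGE);
    *gas_left = gas_left.checked_sub(worst_case).ok_or("gas limit exceeded")?;
    Ok(())
}

/// After execution: refund for pages that were never read (no load
/// happened) and pages that were never written (no upload needed).
fn refund(
    gas_left: &mut u64,
    total_pages: u64,
    accessed: &BTreeSet<u32>, // pages lazy-pages reports as read
    dirty: &BTreeSet<u32>,    // pages lazy-pages reports as written
) {
    let untouched = total_pages - accessed.len() as u64;
    let clean = total_pages - dirty.len() as u64;
    *gas_left += untouched * LOAD_GAS_PER_PAGE + clean * UPLOAD_GAS_PER_PAGE;
}

fn main() {
    let mut gas = 1_000_000u64;
    greedy_charge(&mut gas, 512).unwrap(); // worst case: 512 * 400 gas
    // Suppose lazy-pages reports 3 pages read, one of which was written.
    let accessed = BTreeSet::from([0u32, 7, 511]);
    let dirty = BTreeSet::from([7u32]);
    refund(&mut gas, 512, &accessed, &dirty);
    // Net charge: 3 loads + 1 upload = 3 * 100 + 1 * 300 = 600 gas.
    assert_eq!(gas, 1_000_000 - 600);
}
```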
relates to #1172
Moved to separate issues:
Load/store instruction weights (the most important)
https://github.com/gear-tech/gear/issues/1728
i64.const instruction weight
https://github.com/gear-tech/gear/issues/1729
wasm stack size
https://github.com/gear-tech/gear/issues/1648 and https://github.com/gear-tech/gear/issues/1364
wasmer compilation time
fixed in https://github.com/gear-tech/gear/commit/e87ab9e4d2709ce497acf186288b4e689fa587a2
charge gas for pages upload
https://github.com/gear-tech/gear/issues/1731