ckb-vm
Enhance performance by changing trace cache strategy
In ckb-vm, the trace cache computes a slot like this:

```rust
pub fn calculate_slot(addr: u64) -> usize {
    (addr as usize >> 5) & (8192 - 1)
}
```
Below, we refer to the constant 5 as the shift amount.
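To make the effect of the shift amount concrete, here is a small standalone sketch (the 8192-slot size matches the snippet above; the addresses are made up for illustration). With shift = 5, four consecutive 4-byte RISC-V instructions all land in the same slot; with shift = 2, each gets its own:

```rust
const TRACE_SIZE: usize = 8192; // slot count of the trace cache

fn calculate_slot(addr: u64, shift: u32) -> usize {
    (addr as usize >> shift) & (TRACE_SIZE - 1)
}

fn main() {
    // Four consecutive 4-byte instructions starting at an arbitrary address.
    let addrs = [0x1000u64, 0x1004, 0x1008, 0x100c];
    // shift = 5: 32-byte granularity, so all four share one slot.
    let slots5: Vec<usize> = addrs.iter().map(|&a| calculate_slot(a, 5)).collect();
    // shift = 2: 4-byte granularity, so each address owns a slot.
    let slots2: Vec<usize> = addrs.iter().map(|&a| calculate_slot(a, 2)).collect();
    println!("shift=5: {:?}", slots5); // all four values are equal
    println!("shift=2: {:?}", slots2); // four distinct values
}
```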
The cache miss rate is very high for computation-heavy code. Here are the statistics for bn128-example (https://github.com/cryptape/rvv-prototype/tree/ae405442b477a972e022016c2d791733c788cca7):

```shell
cd bn128-example
make bench
```
For different shift amounts, we got the following data:

| shift amount | Use RVV  | Use IMC  |
| ------------ | -------- | -------- |
| 2            | 0m3.444s | 0m5.172s |
| 3            | 0m4.095s | 0m5.081s |
| 4            | 0m4.958s | 0m5.798s |
| 5            | 0m6.084s | 0m7.195s |
There are too many small trace fragments (fewer than 8 instructions) in both the RVV and IMC cases, which makes the cache miss rate very high. When the shift amount is reduced to 2, every single instruction owns its own slot, and cache misses are dramatically reduced.
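The thrashing between nearby small fragments can be illustrated with a toy direct-mapped cache model (this is a sketch, not ckb-vm's actual implementation; the addresses, iteration count, and slot count are made up). Two small fragments whose start addresses fall in the same 32-byte window evict each other on every execution when shift = 5:

```rust
const TRACE_SIZE: usize = 8192;

fn slot(addr: u64, shift: u32) -> usize {
    (addr as usize >> shift) & (TRACE_SIZE - 1)
}

// Replay a loop over the given fragment start addresses against a
// direct-mapped cache and count how many lookups miss.
fn misses(starts: &[u64], iterations: usize, shift: u32) -> usize {
    let mut cache = vec![u64::MAX; TRACE_SIZE]; // u64::MAX marks an empty slot
    let mut miss = 0;
    for _ in 0..iterations {
        for &addr in starts {
            let s = slot(addr, shift);
            if cache[s] != addr {
                miss += 1; // fragment not cached: it must be re-decoded
                cache[s] = addr;
            }
        }
    }
    miss
}

fn main() {
    // Two 4-instruction fragments only 16 bytes apart, run in a tight loop.
    let starts = [0x2000u64, 0x2010];
    // shift = 5: the two starts map to the same slot, so every lookup misses.
    println!("shift=5: {} misses", misses(&starts, 1000, 5));
    // shift = 2: distinct slots, only the two cold misses remain.
    println!("shift=2: {} misses", misses(&starts, 1000, 2));
}
```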
I found that it also works with secp256k1:
```
shift = 5
interpret secp256k1_bench via assembly
time: [6.4833 ms 6.5052 ms 6.5356 ms]

shift = 4
interpret secp256k1_bench via assembly mop
time: [6.0247 ms 6.0508 ms 6.0830 ms]

shift = 3
interpret secp256k1_bench via assembly
time: [4.9483 ms 4.9622 ms 4.9799 ms]
```
In the end, we adopted shift = 2; see https://github.com/nervosnetwork/ckb-vm/commit/d6be30f410913e88d8a5890f636b0c4378931a52