Enhance performance by changing trace cache strategy

Open • XuJiandong opened this issue 3 years ago • 1 comment

In ckb-vm, the trace cache maps an address to a slot like this:

pub fn calculate_slot(addr: u64) -> usize {
    // Drop the low 5 bits, then mask into 8192 slots (a power of two).
    (addr as usize >> 5) & (8192 - 1)
}

Below, the constant 5 is called the shift amount.
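
To see what the shift amount does, here is a minimal sketch that takes the shift as a parameter. The name `calculate_slot_with_shift` and the example addresses are made up for illustration, and the cache is assumed to have 8192 slots (a power of two) so that the mask works; this is not the actual ckb-vm code.

```rust
/// Assumed number of slots in the trace cache (a power of two),
/// so that `TRACE_SIZE - 1` can be used as a bit mask.
const TRACE_SIZE: usize = 8192;

/// Hypothetical variant of `calculate_slot` with a configurable shift amount.
fn calculate_slot_with_shift(addr: u64, shift: u32) -> usize {
    ((addr >> shift) as usize) & (TRACE_SIZE - 1)
}

fn main() {
    // Eight consecutive 4-byte instruction addresses starting at 0x10000.
    let addrs: Vec<u64> = (0..8).map(|i| 0x10000 + 4 * i).collect();
    for shift in [5u32, 2] {
        let slots: Vec<usize> = addrs
            .iter()
            .map(|&a| calculate_slot_with_shift(a, shift))
            .collect();
        println!("shift = {}: {:?}", shift, slots);
    }
}
```

With a shift amount of 5, all eight addresses fall into the same slot, because they share one 32-byte block. With a shift amount of 2, each of them gets a different slot.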

The cache miss rate is very high for computation-heavy code. Here are the statistics for bn128-example: https://github.com/cryptape/rvv-prototype/tree/ae405442b477a972e022016c2d791733c788cca7

cd bn128-example
make bench

For different shift amounts, we got the following data:

when shift amount is 2:
Use RVV:    0m3.444s
Use IMC:    0m5.172s

when shift amount is 3:
Use RVV:    0m4.095s
Use IMC:    0m5.081s

when shift amount is 4:
Use RVV:    0m4.958s
Use IMC:    0m5.798s

when shift amount is 5:
Use RVV:    0m6.084s
Use IMC:    0m7.195s
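
To put these numbers in relative terms, here is a small sketch that computes the speedup of each shift amount against the current value of 5. The timings are copied from the measurements above; the helper `speedup` is only for illustration.

```rust
/// Speedup of `after` relative to `before` (both in seconds).
fn speedup(before: f64, after: f64) -> f64 {
    before / after
}

fn main() {
    // (shift amount, RVV seconds, IMC seconds), copied from the data above.
    let runs = [
        (2u32, 3.444, 5.172),
        (3, 4.095, 5.081),
        (4, 4.958, 5.798),
        (5, 6.084, 7.195),
    ];
    // shift = 5 is the baseline.
    let (_, base_rvv, base_imc) = runs[3];
    for (shift, rvv, imc) in runs {
        println!(
            "shift = {}: RVV {:.2}x, IMC {:.2}x",
            shift,
            speedup(base_rvv, rvv),
            speedup(base_imc, imc)
        );
    }
}
```

Going from shift amount 5 to 2 is about 1.77x faster for RVV and about 1.39x faster for IMC.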

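There are too many small trace fragments (< 8 instructions) in both the RVV and IMC cases. With a shift amount of 5, fragments that start in the same 32-byte block compete for one slot, which makes the cache miss rate very high. When the shift amount is reduced to 2, every 4-byte-aligned address maps to its own slot, so effectively every single instruction owns one slot, and cache misses are dramatically reduced.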
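
The effect can be reproduced with a toy model. The sketch below assumes a direct-mapped cache of 8192 slots where each slot remembers the start address of the trace it holds, and replays a made-up hot loop of four 2-instruction fragments. The addresses, the names `slot` and `count_misses`, and the cache model are all assumptions for illustration, not the real ckb-vm implementation.

```rust
/// Assumed number of slots in the trace cache.
const TRACE_SIZE: usize = 8192;

fn slot(addr: u64, shift: u32) -> usize {
    ((addr >> shift) as usize) & (TRACE_SIZE - 1)
}

/// Replays the trace start addresses in `starts` for `rounds` rounds through
/// a direct-mapped cache and returns the number of misses.
fn count_misses(starts: &[u64], rounds: usize, shift: u32) -> usize {
    // Each slot remembers the start address of the trace it currently holds.
    let mut cache: Vec<Option<u64>> = vec![None; TRACE_SIZE];
    let mut misses = 0;
    for _ in 0..rounds {
        for &addr in starts {
            let s = slot(addr, shift);
            if cache[s] != Some(addr) {
                misses += 1;
                cache[s] = Some(addr); // evict whatever was in the slot
            }
        }
    }
    misses
}

fn main() {
    // Hypothetical hot loop: four 2-instruction fragments that all start
    // inside the same 32-byte block.
    let starts = [0x10000u64, 0x10008, 0x10010, 0x10018];
    for shift in [5u32, 2] {
        println!(
            "shift = {}: {} misses",
            shift,
            count_misses(&starts, 1000, shift)
        );
    }
}
```

In this toy loop, a shift amount of 5 puts all four fragments in one slot, so every lookup evicts the previous fragment and all 4000 lookups miss. With a shift amount of 2, each fragment has its own slot and only the first 4 lookups miss.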

XuJiandong • Mar 18 '22 07:03

I found that it also works with secp256k1:

shift = 5

interpret secp256k1_bench via assembly                                                                            
                        time:   [6.4833 ms 6.5052 ms 6.5356 ms]

shift = 4

interpret secp256k1_bench via assembly mop                                                                            
                        time:   [6.0247 ms 6.0508 ms 6.0830 ms]

shift = 3

interpret secp256k1_bench via assembly                                                                             
                        time:   [4.9483 ms 4.9622 ms 4.9799 ms]

mohanson • Mar 18 '22 07:03