drag races between geth, aleth and evmone

Open gcolvin opened this issue 5 years ago • 3 comments

I've been cleaning up my old https://github.com/gcolvin/evm-drag-race repo and have the first set of results ready: performance measurements for the arithmetic operations, constrained so that operands are 64, 128, or 256 bits long. From some simple analysis of these measurements it's clear that evmone is a large improvement over its predecessors, that geth has improved over the last year and a half, and that some opcodes are still mispriced.

The tests (except exp) do a million loops over blocks of about three hundred opcodes, mostly consisting of dup2 <op> pairs for each tested op, with occasional breaks to reset the stack. They are sensitive to the quality of the bigint libraries, and with very long basic blocks they benefit from pulling the gas calculations to block boundaries.
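The block layout described above can be pictured with a small generator. This is only a sketch, not the code in the evm-drag-race repo: the name `build_block`, the 150 pairs, the reset interval, and the `POP`/`PUSH8` reset sequence are all assumptions made for illustration.

```python
def build_block(op, pairs=150, reset_every=50, operand="0x0123456789abcdef"):
    """Return a list of EVM mnemonics for one test block (hypothetical layout).

    Assumes two operands are already on the stack when the block starts.
    """
    block = []
    for i in range(1, pairs + 1):
        # DUP2 copies the second stack item to the top; a binary op then
        # pops two items and pushes one, so the stack depth is unchanged.
        block += ["DUP2", op]
        if i % reset_every == 0:
            # Assumed reset: discard the accumulated result and push a
            # fresh operand so values stay within the intended bit width.
            block += ["POP", f"PUSH8 {operand}"]
    return block


block = build_block("ADD")
print(len(block))  # 150 DUP2/ADD pairs plus 3 two-opcode resets
```

With these assumed defaults a block comes to roughly three hundred opcodes, nearly all of them `DUP2 <op>` pairs, which is why the bigint library dominates the timing.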

Here are the raw numbers for gas and time in seconds, parsed from the VM output.

| (sec/test) | gas | geth | aleth | evmone |
|---|---|---|---|---|
| nop | 361000061 | 3.021555105 | 2.129643 | 0.746905 |
| pop | 745000061 | 4.457840038 | 2.311975 | 0.632604 |
| add64 | 873000061 | 8.044817514 | 2.911097 | 0.921461 |
| add128 | 873000061 | 8.623307368 | 3.592689 | 0.984146 |
| add256 | 873000061 | 8.300209998 | 5.320576 | 0.931194 |
| sub64 | 873000061 | 7.849262534 | 3.009673 | 1.171843 |
| sub128 | 873000061 | 8.815820885 | 3.855291 | 1.171966 |
| sub256 | 873000061 | 8.591228095 | 5.137147 | 1.171641 |
| mul64 | 1129000061 | 8.209472574 | 2.612447 | 1.501970 |
| mul128 | 1129000061 | 8.38368307 | 2.818472 | 1.500458 |
| mul256 | 1129000061 | 20.599212858 | 7.086396 | 1.502394 |
| div64 | 1129000061 | 9.658561249 | 5.953036 | 6.957576 |
| div128 | 1129000061 | 12.370122234 | 10.857481 | 6.652817 |
| div256 | 1129000061 | 36.699296987 | 18.487051 | 7.299538 |
| exp | 1281870061 | 130.965289511 | 45.846095 | 8.183632 |

Correcting for interpreter overhead proves to be fraught with peril, so below I report the total time to execute a single operation, interpreter overhead included, as reported by the VMs themselves. The nop test uses blocks of jumpdest jumpdest, and the pop test uses blocks of dup2 pop. Being very little but overhead, both can help in estimating interpreter overhead, though understanding the interpreter code helps too.

Also for comparison, mul64c.c does the same calculation with blocks of x *= y. Unoptimized (gcc -O0) it runs at 0.27 ns/op, and fully optimized (gcc -O3) at 0.0016 ns/op. That sets a rough bound on how fast a VM could be.

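| (ns/op) | geth | aleth | evmone | C (gcc -O0) | C opt (gcc -O3) |
|---|---|---|---|---|---|
| nop | 9.33 | 6.57 | 2.31 | | |
| pop | 13.76 | 7.14 | 1.95 | | |
| add64 | 24.83 | 8.98 | 2.84 | | |
| add128 | 26.62 | 11.09 | 3.04 | | |
| add256 | 25.62 | 16.42 | 2.87 | | |
| sub64 | 24.23 | 9.29 | 3.62 | | |
| sub128 | 27.21 | 11.90 | 3.62 | | |
| sub256 | 26.52 | 15.86 | 3.62 | | |
| mul64 | 25.34 | 8.06 | 4.64 | 0.27 | 0.0016 |
| mul128 | 25.88 | 8.70 | 4.63 | | |
| mul256 | 63.58 | 21.87 | 4.64 | | |
| div64 | 29.81 | 18.37 | 21.47 | | |
| div128 | 38.18 | 33.51 | 20.53 | | |
| div256 | 113.27 | 57.06 | 22.53 | | |
| exp | 13473.80 | 4716.68 | 841.94 | | |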
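The per-operation figures can be reproduced from the raw timings. The divisor is not stated in the text; the values are consistent with about 324 million executed opcodes per test (9.72 million for exp), which is an inference from the two tables rather than something the issue says. A minimal sketch under that assumption, with `ns_per_op` as an illustrative name:

```python
# Raw (geth, aleth, evmone) timings in seconds, copied from the first table.
RAW_SECONDS = {
    "nop": (3.021555105, 2.129643, 0.746905),
    "mul64": (8.209472574, 2.612447, 1.501970),
    "exp": (130.965289511, 45.846095, 8.183632),
}

# Assumed opcode counts, inferred by dividing the raw times by the
# published ns/op values; they are not given in the issue itself.
ASSUMED_OPS = {"nop": 324_000_000, "mul64": 324_000_000, "exp": 9_720_000}


def ns_per_op(seconds, ops):
    """Convert a total run time in seconds to nanoseconds per operation."""
    return seconds * 1e9 / ops


for test, times in RAW_SECONDS.items():
    row = [round(ns_per_op(t, ASSUMED_OPS[test]), 2) for t in times]
    print(test, row)
```

Because the divisor counts every executed opcode, these numbers include the dup2 half of each pair and the loop overhead, not just the tested operation.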

Finally, here is the time for each arithmetic operation normalized to nanoseconds per unit of gas. If every opcode were perfectly priced, all the values in a given VM's column would be the same. This isn't the case, especially for division and exponentiation.

| (ns/gas) | gas | geth | aleth | evmone |
|---|---|---|---|---|
| add64 | 128000000 | 28.02 | 4.68 | 2.26 |
| add128 | 128000000 | 32.54 | 10.01 | 2.75 |
| add256 | 128000000 | 30.02 | 23.50 | 2.33 |
| sub64 | 128000000 | 26.50 | 5.45 | 4.21 |
| sub128 | 128000000 | 34.05 | 12.06 | 4.21 |
| sub256 | 128000000 | 32.29 | 22.07 | 4.21 |
| mul64 | 384000000 | 9.77 | 0.78 | 2.26 |
| mul128 | 384000000 | 10.22 | 1.32 | 2.26 |
| mul256 | 384000000 | 42.03 | 12.43 | 2.27 |
| div64 | 384000000 | 13.54 | 9.48 | 16.47 |
| div128 | 384000000 | 20.60 | 22.25 | 15.68 |
| div256 | 384000000 | 83.96 | 42.12 | 17.36 |
| exp | 536870000 | 235.64 | 81.09 | 14.06 |
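The ns/gas figures are consistent with subtracting the pop test's time and gas from each test before dividing, so that only the cost of the tested opcode over the dup2 pop baseline remains. That reading is inferred from the tables (for example, 873000061 - 745000061 = 128000000) and is not stated in the text. A minimal sketch, with `ns_per_gas` and `spread` as illustrative names:

```python
# (gas, geth_s, aleth_s, evmone_s) copied from the raw table.
RAW = {
    "pop":    (745000061, 4.457840038, 2.311975, 0.632604),
    "add64":  (873000061, 8.044817514, 2.911097, 0.921461),
    "mul64":  (1129000061, 8.209472574, 2.612447, 1.501970),
    "div256": (1129000061, 36.699296987, 18.487051, 7.299538),
}
VMS = ("geth", "aleth", "evmone")


def ns_per_gas(test, baseline="pop"):
    """Nanoseconds per unit of gas for each VM, relative to the baseline test."""
    gas, *times = RAW[test]
    base_gas, *base_times = RAW[baseline]
    delta_gas = gas - base_gas
    return {
        vm: round((t - b) * 1e9 / delta_gas, 2)
        for vm, t, b in zip(VMS, times, base_times)
    }


def spread(tests, vm):
    """Ratio of the largest to the smallest ns/gas value for one VM."""
    values = [ns_per_gas(t)[vm] for t in tests]
    return max(values) / min(values)


print(ns_per_gas("add64"))
print(ns_per_gas("div256"))
print(round(spread(["add64", "mul64", "div256"], "evmone"), 1))
```

A perfectly priced gas schedule would give a spread of 1 for each VM; the further the ratio is from 1, the more some opcode is over- or under-charged relative to the others.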

I still have several small Solidity programs to test. The programs wouldn't compile with version 5, and I broke some of them getting them to compile.

gcolvin, Jun 11 '19 08:06

@chfast @cdetrio @holiman @axic You might find these interesting.

gcolvin, Jun 11 '19 08:06

I do notice that the numbers in nanoseconds are unreasonably low, though their relative order looks right. I suspect I am over-correcting for interpreter overhead. Also, the formula I'm using is numerically unstable, and I haven't worked out a better one yet.

gcolvin, Jun 12 '19 02:06

I've edited the results to stop trying to correct for overhead, am happy with the formulas, and have tested the C version of mul64.

gcolvin, Jun 16 '19 07:06