drag races between geth, aleth and evmone
I've been cleaning up my old https://github.com/gcolvin/evm-drag-race repo, and have the first set of results ready: performance measurements for the arithmetic operations, constrained so that operands are 64, 128, or 256 bits long. From some simple analysis of these measurements it's clear that evmone is a large improvement over its predecessors, that geth has improved over the last year and a half, and that some opcodes are still mispriced.
The tests (except exp) do a million loops over blocks of about three hundred opcodes, mostly consisting of `dup2 <op>` pairs for each tested op, with occasional breaks to reset the stack. They are sensitive to the quality of the bigint libraries, and with very long basic blocks they benefit from pulling the gas calculations to block boundaries.
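For illustration, the body of an add block might look roughly like this (a hypothetical, heavily abbreviated sketch, not the actual code from the repo):

```
jumpdest        ; block entry
dup2
add             ; the op under test, here add
dup2
add
dup2
add             ; ...and so on, for roughly 150 dup2/add pairs per block
pop
pop             ; occasional break to reset the stack depth
```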
Here are the raw numbers for gas and time in seconds, parsed from the VM output.
(sec/test) | gas | geth | aleth | evmone |
---|---|---|---|---|
nop | 361000061 | 3.021555105 | 2.129643 | 0.746905 |
pop | 745000061 | 4.457840038 | 2.311975 | 0.632604 |
add64 | 873000061 | 8.044817514 | 2.911097 | 0.921461 |
add128 | 873000061 | 8.623307368 | 3.592689 | 0.984146 |
add256 | 873000061 | 8.300209998 | 5.320576 | 0.931194 |
sub64 | 873000061 | 7.849262534 | 3.009673 | 1.171843 |
sub128 | 873000061 | 8.815820885 | 3.855291 | 1.171966 |
sub256 | 873000061 | 8.591228095 | 5.137147 | 1.171641 |
mul64 | 1129000061 | 8.209472574 | 2.612447 | 1.501970 |
mul128 | 1129000061 | 8.38368307 | 2.818472 | 1.500458 |
mul256 | 1129000061 | 20.599212858 | 7.086396 | 1.502394 |
div64 | 1129000061 | 9.658561249 | 5.953036 | 6.957576 |
div128 | 1129000061 | 12.370122234 | 10.857481 | 6.652817 |
div256 | 1129000061 | 36.699296987 | 18.487051 | 7.299538 |
exp | 1281870061 | 130.965289511 | 45.846095 | 8.183632 |
Attempting to correct for interpreter overhead proves to be fraught with peril, so below I report the total time to execute a single operation, including the interpreter overhead, as reported by the VMs themselves. The nop test uses blocks of `jumpdest jumpdest`, and the pop test uses blocks of `dup2 pop`. Being very little but overhead, they can be helpful in estimating interpreter overhead, but understanding the interpreter code helps too.
Also for comparison, mul64c.c does the same calculation with blocks of `x *= y`. Unoptimized (gcc -O0) it runs at 0.27 ns/op, and fully optimized (gcc -O3) at 0.0016 ns/op. That sets a rough bound on how fast a VM could possibly be.
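A minimal sketch of what such a comparison program might look like, assuming it simply makes a million passes over a block of repeated 64-bit multiplies (the actual mul64c.c in the repo may differ):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical stand-in for mul64c.c: a million passes over a block
       of repeated 64-bit multiplies, mirroring the dup2/mul EVM blocks. */
    uint64_t x = 0x243F6A8885A308D3ULL;   /* arbitrary nonzero operands */
    uint64_t y = 0x13198A2E03707345ULL;
    for (long i = 0; i < 1000000; ++i) {
        x *= y; x *= y; x *= y; x *= y;
        x *= y; x *= y; x *= y; x *= y;
        /* ...the real blocks would repeat this on the order of 150 times */
    }
    printf("%llu\n", (unsigned long long)x);  /* keep the result live */
    return 0;
}
```

Note that at -O3 a compiler is free to fold most of such a loop away, which would help explain how small the optimized figure is.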
(ns/OP) | geth | aleth | evmone | C — C opt |
---|---|---|---|---|
nop | 9.33 | 6.57 | 2.31 | — |
pop | 13.76 | 7.14 | 1.95 | — |
add64 | 24.83 | 8.98 | 2.84 | — |
add128 | 26.62 | 11.09 | 3.04 | — |
add256 | 25.62 | 16.42 | 2.87 | — |
sub64 | 24.23 | 9.29 | 3.62 | — |
sub128 | 27.21 | 11.90 | 3.62 | — |
sub256 | 26.52 | 15.86 | 3.62 | — |
mul64 | 25.34 | 8.06 | 4.64 | 0.27 — 0.0016 |
mul128 | 25.88 | 8.70 | 4.63 | — |
mul256 | 63.58 | 21.87 | 4.64 | — |
div64 | 29.81 | 18.37 | 21.47 | — |
div128 | 38.18 | 33.51 | 20.53 | — |
div256 | 113.27 | 57.06 | 22.53 | — |
exp | 13473.80 | 4716.68 | 841.94 | — |
Finally we have the time for each arithmetic operation normalized to nanoseconds per unit of gas. If every opcode were perfectly priced, the values in each VM's column would be the same across all of the tests. That isn't the case, especially for division and exponentiation.
(ns/gas) | gas | geth | aleth | evmone |
---|---|---|---|---|
add64 | 128000000 | 28.02 | 4.68 | 2.26 |
add128 | 128000000 | 32.54 | 10.01 | 2.75 |
add256 | 128000000 | 30.02 | 23.50 | 2.33 |
sub64 | 128000000 | 26.50 | 5.45 | 4.21 |
sub128 | 128000000 | 34.05 | 12.06 | 4.21 |
sub256 | 128000000 | 32.29 | 22.07 | 4.21 |
mul64 | 384000000 | 9.77 | 0.78 | 2.26 |
mul128 | 384000000 | 10.22 | 1.32 | 2.26 |
mul256 | 384000000 | 42.03 | 12.43 | 2.27 |
div64 | 384000000 | 13.54 | 9.48 | 16.47 |
div128 | 384000000 | 20.60 | 22.25 | 15.68 |
div256 | 384000000 | 83.96 | 42.12 | 17.36 |
exp | 536870000 | 235.64 | 81.09 | 14.06 |
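Note that the gas column in this table is each test's gas net of the pop test (e.g. 873000061 - 745000061 = 128000000 for add), and the times appear to be treated the same way; that assumption reproduces the table, as in this small sketch:

```c
#include <stdio.h>

/* ns per unit of gas, measured against the pop-test baseline
   (an assumption that is consistent with the tables above) */
static double ns_per_gas(double t_sec, double gas,
                         double t_pop_sec, double gas_pop)
{
    return (t_sec - t_pop_sec) * 1e9 / (gas - gas_pop);
}

int main(void)
{
    /* geth add64: (8.0448 s - 4.4578 s) over (873000061 - 745000061) gas */
    printf("%.2f ns/gas\n",
           ns_per_gas(8.044817514, 873000061.0, 4.457840038, 745000061.0));
    /* prints about 28.02 ns/gas, matching the geth add64 entry above */
    return 0;
}
```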
I still have several small Solidity programs to test. They wouldn't compile with version 5 of the compiler, and I broke some of them getting them to compile.
@chfast @cdetrio @holiman @axic You might find these interesting.
I do notice that the numbers in nanoseconds are unreasonably low, though their relative order looks right. I suspect I am over-correcting for interpreter overhead. Also, the formula I'm using is numerically unstable, and I haven't worked out a better one yet.
I've since edited the results to stop trying to correct for overhead, am happy with the formulas, and have tested the C version of mul64.