
VM: Breaking Release Performance Analysis

Open holgerd77 opened this issue 3 years ago • 13 comments

Along with the breaking releases work we have made substantial structural changes to the libraries, especially the VM/EVM; particularly worth mentioning is the BigInt transition.

It would be nice to get a better picture of the performance impact of these changes. This could come from a benchmark comparison between branches, but a real-world performance comparison would also be highly interesting, e.g. running the client on a selected set of blocks of a chain on both the old and the new versions.

It would be great if this led to some cool numbers, some stats, and eventually also a graph or diagram which we can use for release promotion on Twitter at some point! 😄

Side note on some performance experiments (which might be done along the work on this):

  1. The switch to native JS hashes by using the Noble hash functions might have some counteracting performance effects. It would be interesting to see whether this is negligible or not. There was also the suggestion at some point to allow, at least for the Trie library (as a constructor option), switching out the hash function. Maybe this is an occasion to test it and see what the performance impacts are (likely in a first round with the Trie benchmark suite).

  2. It is now possible to switch out the DB engine of the Trie with the new DB abstraction. It would be highly interesting to use the DB solution suggested by @faustbrian and see how this goes.

(both 1. and 2. would benefit from some PR references)
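To illustrate the idea in point 1 (a constructor option to swap out the hash function), here is a minimal, hypothetical sketch. The class and option names are illustrative only, not the actual @ethereumjs/trie API, and SHA-256 via node:crypto stands in for keccak256 so the example runs with Node built-ins alone:

```typescript
import { createHash } from 'crypto';

type HashFn = (data: Buffer) => Buffer;

// Stand-in default hash (the real library would default to noble keccak256)
const sha256: HashFn = (data) => createHash('sha256').update(data).digest();

class KeyValueStore {
  private hash: HashFn;
  private db = new Map<string, Buffer>();

  constructor(opts: { hash?: HashFn } = {}) {
    // The hash function is injected per instance, so callers can plug in
    // a native-backed implementation without touching library internals.
    this.hash = opts.hash ?? sha256;
  }

  put(value: Buffer): Buffer {
    const key = this.hash(value);
    this.db.set(key.toString('hex'), value);
    return key;
  }

  get(key: Buffer): Buffer | undefined {
    return this.db.get(key.toString('hex'));
  }
}
```

A caller wanting a native hash would then pass e.g. `new KeyValueStore({ hash: nativeKeccak256 })`, keeping the pure-JS default for browser builds.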

holgerd77 avatar Jul 13 '22 15:07 holgerd77

  1. I have a branch somewhere that swapped out the hashing functions to use https://github.com/bcoin-org/bcrypto which was a lot faster than noble because it uses native C functions. Will check next week if I can find it and if not recreate it and post some results here.
  2. Do you mean LMDB? https://github.com/kriszyp/lmdb-js
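As a hedged sketch of how the DB abstraction could make the engine swappable: the trie only needs a small async key-value interface, and any backend (an in-memory Map, LevelDB, or an lmdb-js wrapper) can implement it. The interface shape below is illustrative, not the exact @ethereumjs/trie types:

```typescript
// Minimal async key-value interface a trie backend might satisfy
// (illustrative; the real library's DB interface may differ).
interface DB {
  get(key: Buffer): Promise<Buffer | undefined>;
  put(key: Buffer, value: Buffer): Promise<void>;
  del(key: Buffer): Promise<void>;
}

// In-memory reference implementation; an lmdb-js adapter would implement
// the same shape on top of lmdb's open()/get()/put() calls.
class MapDB implements DB {
  private store = new Map<string, Buffer>();
  async get(key: Buffer) {
    return this.store.get(key.toString('hex'));
  }
  async put(key: Buffer, value: Buffer) {
    this.store.set(key.toString('hex'), value);
  }
  async del(key: Buffer) {
    this.store.delete(key.toString('hex'));
  }
}
```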

faustbrian avatar Jul 14 '22 02:07 faustbrian

  1. I have a branch somewhere that swapped out the hashing functions to use https://github.com/bcoin-org/bcrypto which was a lot faster than noble because it uses native C functions. Will check next week if I can find it and if not recreate it and post some results here.

👍

2. Do you mean LMDB? https://github.com/kriszyp/lmdb-js

Yes.

holgerd77 avatar Jul 14 '22 09:07 holgerd77

  1. I have a branch somewhere that swapped out the hashing functions to use https://github.com/bcoin-org/bcrypto which was a lot faster than noble because it uses native C functions. Will check next week if I can find it and if not recreate it and post some results here.

I can vouch for this. I went the other way in a PR I did for discv5, and secp256k1 ops are much slower with noble. This isn't a 1-to-1 comparison since we are using them differently, but here are some rough numbers (using benchmark) to give you a sense of the performance differences on hash and secp256k1 sign/verify ops:

https://github.com/ChainSafe/discv5/pull/197#issuecomment-1178154867

acolytec3 avatar Jul 14 '22 09:07 acolytec3

A good benchmark suite would be great indeed. Some additional context:

  • The VM got 50% faster after the BigInt transition, the last time I benchmarked it in April
  • For a benchmark comparison of noble-hashes vs the node.js built-ins, see the hashes README; keccak256 is not present at all in the node.js built-ins
  • noble's keccak256 could be made 3.5x faster with a simple trick that is currently not applied, for readability/auditability reasons
  • Many C modules don't work in browsers and are node.js-only solutions. I see no point in using those when you can just use the built-in node.js stuff instead
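For anyone wanting to reproduce such comparisons quickly, here is a minimal micro-benchmark sketch. It is illustrative only: SHA-256 via node:crypto stands in for a natively backed hash, and a pure-JS FNV-1a loop stands in for a JS implementation; the numbers it prints are not the noble benchmarks referenced above:

```typescript
import { createHash } from 'crypto';

// Time `fn` over `iters` iterations and print an ops/sec estimate.
function bench(label: string, fn: () => void, iters = 10_000): number {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iters; i++) fn();
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${label}: ${((iters / ms) * 1000).toFixed(0)} ops/sec`);
  return ms;
}

// Trivial pure-JS hash (FNV-1a) as a stand-in for a JS implementation.
function fnv1a(data: Buffer): number {
  let h = 0x811c9dc5;
  for (const b of data) {
    h ^= b;
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

const data = Buffer.alloc(32, 0xab);
bench('node:crypto sha256 (native-backed stand-in)', () =>
  createHash('sha256').update(data).digest()
);
bench('pure-JS fnv1a (stand-in)', () => fnv1a(data));
```

A dedicated harness like benchmark.js adds statistical rigor (warmup, sampling, margin of error), which is what the sampled-runs output in this thread comes from.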

paulmillr avatar Jul 15 '22 23:07 paulmillr

Is the VM intended to run in browsers (client-side), or is it primarily targeting server-side use? Depending on the requirements and target audiences for the VM, it may not really matter that package X doesn't work in the browser, and compromising on performance for the sake of that doesn't seem reasonable. The requirements would also affect benchmarking: it makes no sense to benchmark a package that doesn't support browsers if browser support is an absolute must-have, or you would have to polyfill X with Y when bundling for a browser.

Also, there's clearly a point in using something like bcrypto over the built-in node.js stuff, or it wouldn't exist in the first place: no matter how hard you try, you won't ever get close to the performance of modules that use C/Rust bindings instead of pure JavaScript executed by the V8 engine. I guess noble using Rust bindings instead of implementing everything in pure JavaScript isn't an option because of readability/auditability?

faustbrian avatar Jul 17 '22 04:07 faustbrian

@faustbrian

Also, there's clearly a point in using something like bcrypto over built-in node.js stuff

Built-in node.js modules are often implemented in low-level languages, not in JS; they are already "bindings" to OpenSSL and other C software. Also, the cost of bindings is always huge, since a lot of wrappers are required to make them work in V8.

paulmillr avatar Jul 17 '22 09:07 paulmillr

I'm aware of how node.js and its built-in modules work, but the built-in crypto module, for example, is significantly slower than something like bcrypto, so it doesn't make sense to use the built-in crypto implementation if you are working on performance-critical applications that actually hit the ceiling of what the built-in crypto module can deliver. The V8 overhead is usually negligible because of how big the performance difference is between the official node.js implementations and custom ones like bcrypto, or now even ones where the bindings are developed in Rust or leverage WASM, so the overhead is only getting more negligible as time goes on.

@holgerd77 is browser support for the VM a requirement, or is it just a nice-to-have? I'm not entirely sure in which browser contexts the VM is used right now. If browser support is an absolute must-have and you want to avoid polyfills, then your best bet is probably to stick with noble, even if you take a 2-3x performance hit, and go down the route of allowing the hash function to be swapped out.

I've been looking at that for the last week, but it's rather tedious to do because of how spread out the use of the hashing functions is, so I'm still playing around to see how that could be achieved without introducing breaking changes.

faustbrian avatar Jul 17 '22 10:07 faustbrian

The VM definitely has to work in the browser. This is only one notable example, but the Remix browser-based IDE uses the VM, and we need to be able to support use cases like that going forward.

acolytec3 avatar Jul 17 '22 10:07 acolytec3

One other note: we just merged a PR that allows a substitute hash method to be provided in the constructor of the trie library. I'm assuming this gets us a lot of the way toward addressing the performance concerns here, since you can just switch out the default hash algorithm (noble keccak256) for your preferred native C one when running in a node.js/deno/whatever context.

acolytec3 avatar Jul 17 '22 10:07 acolytec3

Trie Benchmarks

Ran both suites 3 times.

System

CPU: Apple M1 Max
GPU: Apple M1 Max
Memory: 4283MiB / 65536MiB

bcrypto

1k-3-32-ran x 198,641 ops/sec ±18.56% (68 runs sampled)
1k-5-32-ran x 134,830 ops/sec ±28.29% (68 runs sampled)
1k-9-32-ran x 125,385 ops/sec ±2.20% (78 runs sampled)
1k-1k-32-ran x 112,838 ops/sec ±1.82% (77 runs sampled)
1k-1k-32-mir x 98,924 ops/sec ±51.09% (57 runs sampled)
Checkpointing: 100 iterations x 2,121 ops/sec ±32.31% (37 runs sampled)
Checkpointing: 500 iterations x 522 ops/sec ±29.31% (52 runs sampled)
Checkpointing: 1000 iterations x 273.01 ops/sec ±46.96% (23 runs sampled)
Checkpointing: 5000 iterations x 120.12 ops/sec ±39.15% (28 runs sampled)

noble

1k-3-32-ran x 57,574 ops/sec ±1.94% (96 runs sampled)
1k-5-32-ran x 56,060 ops/sec ±0.77% (96 runs sampled)
1k-9-32-ran x 55,014 ops/sec ±0.83% (95 runs sampled)
1k-1k-32-ran x 48,257 ops/sec ±14.88% (85 runs sampled)
1k-1k-32-mir x 70,403 ops/sec ±0.78% (89 runs sampled)
Checkpointing: 100 iterations x 1,356 ops/sec ±27.86% (69 runs sampled)
Checkpointing: 500 iterations x 404 ops/sec ±32.10% (58 runs sampled)
Checkpointing: 1000 iterations x 186 ops/sec ±39.50% (53 runs sampled)
Checkpointing: 5000 iterations x 42.57 ops/sec ±18.46% (40 runs sampled)

faustbrian avatar Jul 17 '22 11:07 faustbrian

@faustbrian how to run those? What should I do?

paulmillr avatar Jul 17 '22 11:07 paulmillr

I won't have a chance to push anything until tomorrow, but you can just use the code below and change the imports of the keccak256 function (in the trie package) to point to whatever file you put the code into, instead of importing from the ethereum crypto package. Then npm run build and npm run benchmarks.

import { Keccak256 } from 'bcrypto';

// bcrypto's Keccak256.digest is backed by native C code;
// concatenate first if the input arrives as an array of buffers
function keccak256(data) {
    return Keccak256.digest(Array.isArray(data) ? Buffer.concat(data) : data);
}

faustbrian avatar Jul 17 '22 11:07 faustbrian

M1 / 16GB RAM / node 18.6.

noble-hashes

1k-3-32-ran x 64,354 ops/sec ±1.01% (87 runs sampled)
1k-5-32-ran x 59,463 ops/sec ±1.86% (91 runs sampled)
1k-9-32-ran x 59,174 ops/sec ±0.68% (92 runs sampled)
1k-1k-32-ran x 55,642 ops/sec ±0.77% (86 runs sampled)
1k-1k-32-mir x 53,126 ops/sec ±1.20% (87 runs sampled)
Checkpointing: 100 iterations x 3,498 ops/sec ±1.82% (81 runs sampled)
Checkpointing: 500 iterations x 753 ops/sec ±1.74% (79 runs sampled)
Checkpointing: 1000 iterations x 382 ops/sec ±1.76% (76 runs sampled)
Checkpointing: 5000 iterations x 76.19 ops/sec ±0.21% (80 runs sampled)

noble-hashes optimized

1k-3-32-ran x 81,752 ops/sec ±2.47% (81 runs sampled)
1k-5-32-ran x 75,447 ops/sec ±0.73% (94 runs sampled)
1k-9-32-ran x 72,909 ops/sec ±0.45% (96 runs sampled)
1k-1k-32-ran x 67,257 ops/sec ±1.43% (83 runs sampled)
1k-1k-32-mir x 58,602 ops/sec ±4.65% (82 runs sampled)
Checkpointing: 100 iterations x 3,427 ops/sec ±1.93% (80 runs sampled)
Checkpointing: 500 iterations x 730 ops/sec ±1.81% (80 runs sampled)
Checkpointing: 1000 iterations x 363 ops/sec ±2.05% (74 runs sampled)
Checkpointing: 5000 iterations x 72.78 ops/sec ±0.73% (76 runs sampled)

bcrypto

The setup was:

import { Keccak256 } from 'bcrypto';
function keccak256(data: any) {
  return Keccak256.digest(Array.isArray(data) ? Buffer.concat(data) : data);
}
const trie = new Trie({ db, hash: keccak256 })

1k-3-32-ran x 89,425 ops/sec ±2.86% (76 runs sampled)
1k-5-32-ran x 82,695 ops/sec ±0.65% (92 runs sampled)
1k-9-32-ran x 79,322 ops/sec ±0.86% (89 runs sampled)
1k-1k-32-ran x 74,775 ops/sec ±1.09% (87 runs sampled)
1k-1k-32-mir x 63,954 ops/sec ±3.70% (77 runs sampled)
Checkpointing: 100 iterations x 3,367 ops/sec ±1.84% (78 runs sampled)
Checkpointing: 500 iterations x 717 ops/sec ±2.03% (77 runs sampled)
Checkpointing: 1000 iterations x 357 ops/sec ±2.21% (73 runs sampled)
Checkpointing: 5000 iterations x 76.52 ops/sec ±0.13% (64 runs sampled)

paulmillr avatar Jul 17 '22 13:07 paulmillr

Outdated, will close.

holgerd77 avatar Feb 20 '24 11:02 holgerd77