Performance optimization low-hanging fruit
I'm creating this issue to list multiple easy refactors that could be implemented to improve the performance of the VM. Some of them come from EthWorks' involvement. Others are mine.
- [ ] `Common#param` should be an O(1) operation. It's called an incredible amount of times. Analysis in https://github.com/ethereumjs/ethereumjs-monorepo/issues/775#issuecomment-860545856
- [x] The VM's `Memory` class should be implemented with a Buffer
- [x] `jumpIsValid` is a linear operation. It can be constant. Update: after talking with Jochem and reading evmone's code, I'm not sure if this is possible nor beneficial.
- [x] There's a huge number of unnecessary calls to `toString()`. Sometimes it is used instead of `Buffer#equals`. Sometimes it's called to initialize a `BN` from a `Buffer` instead of just passing the `Buffer`. There are obvious instances of this in these functions:
  - `Block#validateTransactionsTrie`
  - `Block#validateUnclesHash`
  - `Account#isContract`
  - `Account#setCode`
  - `Account#isEmpty`
  - `FakeTransaction#hash`
  - `Transaction#toCreationAddress`
  - `EEI#getExternalBalance`
  - `PUSH`'s handler: creates a string to create a `BN` instead of just creating the `BN`
  - `VM#runBlock`: multiple comparisons of Buffers by string
  - `VM#runCall`
  - `VM#runTx`: calls `toString` on a Buffer instead of just checking its size
  - `DefaultStateManager#accountIsEmpty`
- [x] The opcodes `handlers` map is indexed by string; it should be indexed by number.
- [x] The opcodes `handlers` map and the `opcodes` `OpcodeList` one should probably be `Map`s. The semantics of `Map` are way simpler than accessing an object with `object[key]`. Done in #852 (see the sketch after this list)
- [x] `Transaction#getDataFee` is called very frequently and is super slow:
  - [x] It should be cached
  - [x] It calls `Common#param` (which is really slow) within a loop; this could be done outside of it.
  - [x] It uses `BN`s in a loop while it could use normal numbers and only transform the result to a `BN`. `BN` operations allocate new objects.
- [x] `Transaction#hash` should be cached
- [x] `BlockHeader#hash` should be cached
- [ ] The VM's `getOpcodesForHF` can be cached. It's called a lot and iterates over a 256-element map.
- [ ] The VM `ERROR` enum should not have string values, as those are compared very frequently.
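To make the opcode-dispatch and `ERROR` items above concrete, here is a minimal sketch of number-keyed dispatch combined with a numeric error enum. The types, handlers, and names below are invented for illustration and are not the actual VM code:

```ts
// Hypothetical minimal types, only to show the shape of the change.
type RunState = { stack: bigint[] }
type OpcodeHandler = (runState: RunState) => void

// Keyed by the numeric opcode instead of its string name, and stored in a
// Map so every dispatch is a single numeric lookup.
const handlers = new Map<number, OpcodeHandler>([
  [0x01, (rs) => { const b = rs.stack.pop()!; const a = rs.stack.pop()!; rs.stack.push(a + b) }], // ADD
  [0x50, (rs) => { rs.stack.pop() }], // POP
])

// A numeric error enum: comparisons become integer comparisons, not string ones.
enum VmError {
  OUT_OF_GAS,
  STACK_UNDERFLOW,
  INVALID_OPCODE,
}

function step(opcode: number, runState: RunState): void {
  const handler = handlers.get(opcode)
  if (handler === undefined) {
    throw new Error(`VM error ${VmError.INVALID_OPCODE}`)
  }
  handler(runState)
}
```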
> `Common#param` should be an O(1) operation. It's called an incredible amount of times.
My intuition is that Common configs should be written in JS data structures, not JSON. You can then more easily arrange them to allow easy access. Generally I think Common is a bit over-abstracted and it would be better to break out the configs related to each package and move them into the package directly. Whatever config has to be used by multiple packages (if any) can remain in Common.
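As a rough sketch of that direction, assuming configs defined directly in TypeScript (the parameter names, values, and `param` helper below are illustrative, not the actual Common layout):

```ts
// Hypothetical shape: params grouped by topic, defined directly in TypeScript
// instead of JSON, so they can be precomputed into flat lookup structures.
const gasConfig = {
  minGasLimit: 5000,
  gasLimitBoundDivisor: 1024,
} as const

const gasPrices = {
  base: 2,
  exp: 10,
} as const

// Flatten once at module load into a single Map so that a lookup like
// param('gasConfig', 'minGasLimit') is a single Map.get call.
const paramLookup = new Map<string, number>()
for (const [topic, params] of Object.entries({ gasConfig, gasPrices })) {
  for (const [name, value] of Object.entries(params)) {
    paramLookup.set(`${topic}/${name}`, value)
  }
}

function param(topic: string, name: string): number {
  const value = paramLookup.get(`${topic}/${name}`)
  if (value === undefined) throw new Error(`Missing parameter: ${topic}/${name}`)
  return value
}

// e.g. param('gasConfig', 'minGasLimit') === 5000
```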
> There's a huge number of unnecessary calls to `toString()`
This is very prevalent as you mention. In the VM one valid use-case is using them as keys for Maps and Sets, etc. Do you think we should adopt alternatives that support Buffers directly?
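For context: a plain `Map` or `Set` compares object keys by reference, so two equal `Buffer`s would be treated as different keys; the usual workaround is to convert to a string key exactly once behind a thin wrapper. A sketch (this `BufferMap` is hypothetical, not an existing utility in the repo):

```ts
// A minimal Buffer-keyed map: converts each key to a hex string exactly once
// at the API boundary, so call sites never have to sprinkle toString('hex').
class BufferMap<V> {
  private readonly map = new Map<string, V>()

  set(key: Buffer, value: V): this {
    this.map.set(key.toString('hex'), value)
    return this
  }

  get(key: Buffer): V | undefined {
    return this.map.get(key.toString('hex'))
  }

  has(key: Buffer): boolean {
    return this.map.has(key.toString('hex'))
  }

  get size(): number {
    return this.map.size
  }
}

// Usage sketch (the address value here is made up):
// const touched = new BufferMap<boolean>()
// touched.set(Buffer.from('00'.repeat(20), 'hex'), true)
```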
A major performance drain when running mainnet blocks seems to be emitting the step event. I've even seen the time for running a block get reduced by 50% after disabling this event in the VM.
As a temporary solution we can add an option to the VM to disable emitting events for users who don't need them. We should also investigate the async-eventemitter library to see if there is room for optimization there. I'm not aware of the main reason for using this library over the stock Node.js EventEmitter.
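A rough sketch of the opt-out idea, using the stock Node.js `EventEmitter` for simplicity (the `emitStepEvents` option and the `runStep` shape are hypothetical, not the current VM API):

```ts
import { EventEmitter } from 'events'

// Hypothetical opt-out: skip building and emitting step events entirely
// when the user opted out or nobody is listening.
interface InterpreterOpts {
  emitStepEvents?: boolean
}

class Interpreter extends EventEmitter {
  constructor(private readonly opts: InterpreterOpts = {}) {
    super()
  }

  runStep(pc: number, opcode: number): void {
    // Only pay the cost of assembling the (fairly large) step event object
    // if it will actually be consumed.
    if (this.opts.emitStepEvents !== false && this.listenerCount('step') > 0) {
      this.emit('step', { pc, opcode /* , gasLeft, stack, memory, ... */ })
    }
    // ... execute the opcode ...
  }
}
```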
Reminder for myself from @alcuadrado's comment here:
> I think we can also take advantage of the immutable interface to more aggressively cache some common operations. When profiling the VM I've seen things like hashing and validating txs take a considerable part of the execution time. We can have a constant public interface, but keep some mutable/lazily-initialized private properties to cache those.
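A minimal sketch of that pattern, assuming the object is immutable after construction and using `keccak256` from `ethereumjs-util` (the class itself is illustrative, not the actual Transaction implementation):

```ts
import { keccak256 } from 'ethereumjs-util'

// Constant public interface with a lazily-initialized private cache,
// so the hash is computed at most once per object.
class HashedThing {
  private _hash?: Buffer

  constructor(private readonly serialized: Buffer) {}

  hash(): Buffer {
    if (this._hash === undefined) {
      this._hash = keccak256(this.serialized)
    }
    return this._hash
  }
}

// const h = new HashedThing(tx.serialize())
// h.hash(); h.hash() // second call is a cache hit
```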
Update: someone at some point should have another look at the initial list here and maybe (directly in the comment) transform to a check list (and check the things already done), otherwise we won't get hold of this.
Some of these have been resolved thanks to this issue, so I will convert it to a checklist so we can have a clear picture of the remaining items. I can also tag on a per-item basis whether it would be a bug fix or a breaking change, so we can prioritize what can be done now versus later.
I think `Common#param` is an important one that is called a lot. @holgerd77 do you have an intuition whether we would be able to make it an O(1) operation? If it needs code restructuring, perhaps we can implement it as a new method and mark the old one as deprecated.
One idea would be to add a simple dict (or similar) cache along Common usage, add values on first access, and erase the cache on HF or chain switches (which I would very much assume occur seldomly).
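A sketch of what such a cache could look like (the `ParamCache` wrapper and the `slowLookup` callback are placeholders for the existing lookup, not real Common methods):

```ts
// Hypothetical wrapper around the existing (slow) lookup: cache results per
// (topic, name) key and throw the cache away whenever the HF or chain changes.
class ParamCache {
  private cache: { [key: string]: number } = {}

  constructor(private readonly slowLookup: (topic: string, name: string) => number) {}

  param(topic: string, name: string): number {
    const key = `${topic}/${name}`
    let value = this.cache[key]
    if (value === undefined) {
      value = this.slowLookup(topic, name)
      this.cache[key] = value
    }
    return value
  }

  // Called from the (rarely used) hardfork/chain switch paths.
  invalidate(): void {
    this.cache = {}
  }
}
```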
Did some simple tests on this, might be worth it. Script I used:

```ts
import Common from './src'

let c = new Common({ chain: 'mainnet' })

// Dummy object lookup to compare against Common#param
let dummyCache = {
  'value1': 15,
  'value2': 15,
  'value3': 15,
  'value4': 15,
  'value5': 15,
  'value6': 15,
  'value7': 15,
  'value8': 15,
  'value9': 15,
  'value10': 15,
  'value11': 15,
  'value12': 15,
  'minGasLimit': 500,
}

for (const iterations of [1000, 10000, 100000, 1000000, 10000000]) {
  let newA, newB
  console.log(`Testing with ${iterations} iterations.`)

  // Current Common#param lookup
  console.time('T')
  for (let i = 0; i < iterations; i++) {
    newA = c.param('gasConfig', 'minGasLimit')
  }
  console.timeEnd('T')

  // Plain object property access for comparison
  console.time('T')
  for (let i = 0; i < iterations; i++) {
    newB = dummyCache['minGasLimit']
  }
  console.timeEnd('T')
}
```
Results (the first `T` in each block is `Common#param`, the second is the plain object lookup):

```
Testing with 1000 iterations.
T: 5.489ms
T: 0.053ms
Testing with 10000 iterations.
T: 8.378ms
T: 0.421ms
Testing with 100000 iterations.
T: 52.756ms
T: 0.116ms
Testing with 1000000 iterations.
T: 285.341ms
T: 0.824ms
Testing with 10000000 iterations.
T: 2.788s
T: 7.871ms
```
Did another small test to put things in perspective and added a global param counter to Common which does:

```ts
paramCounter += 1
if (paramCounter % 100 == 0) {
  console.log(paramCounter)
}
```
For most single state tests this goes into the hundreds, for some into the thousands, and for selected ones into the tens of thousands. The whole state test suite comes to ~8,579,700 calls. So this would be roughly a 2s win for the tests (under the assumption that the cache works well, i.e. most reads would be served from the cache once it is filled).
On the benchmark run (so for the 12 or so blocks, with a single iteration) this comes to ~1,130,300 calls, so roughly 200-300ms.
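For reference, these estimates follow from the timings above, under the assumption that nearly every read would be a cache hit with negligible cost:

```ts
// Per-call cost of Common#param, from the 10,000,000-iteration run above.
const perCallMs = 2788 / 10_000_000 // ≈ 0.00028 ms per call

// Call counts observed with the paramCounter experiment.
const stateTestCalls = 8_579_700
const benchmarkCalls = 1_130_300

console.log(`state tests: ~${Math.round(perCallMs * stateTestCalls)} ms`) // ≈ 2392 ms, i.e. ~2s
console.log(`benchmark:   ~${Math.round(perCallMs * benchmarkCalls)} ms`) // ≈ 315 ms
```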
Hmm, no clear indication for me at first sight that this is worth the effort of introducing a cache or something similar. But it's also not completely out of the question, I guess? 🤔
> Update: someone at some point should have another look at the initial list here and maybe (directly in the comment) transform to a check list (and check the things already done), otherwise we won't get hold of this.
I have converted the opening post to a check list.
Nice! 😄
Linking #865