
core/vm: add micro-optimization

MariusVanDerWijden opened this pull request 1 year ago • 11 comments

This PR adds Uint256() to the Address type, to create a uint256 from an address. Since we know that the address type is 20 bytes, we can use SetBytes20 instead of SetBytes, which saves us one switch statement.
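For illustration, a minimal sketch of such an accessor, assuming the value is returned by value (the exact signature in the PR may differ):

package main

import (
	"fmt"

	"github.com/holiman/uint256"
)

// Address mirrors go-ethereum's common.Address: a fixed 20-byte array.
type Address [20]byte

// Uint256 converts the address into a uint256.Int. Because the length is
// statically known to be 20 bytes, SetBytes20 can skip the length switch
// inside the generic SetBytes.
func (a Address) Uint256() uint256.Int {
	var z uint256.Int
	z.SetBytes20(a[:])
	return z
}

func main() {
	addr := Address{0xde, 0xad, 0xbe, 0xef}
	v := addr.Uint256()
	fmt.Println(v.Hex())
}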

Benchmarks:

Master:
BenchmarkOpAddress-24    31233150    38.22 ns/op    183 B/op    0 allocs/op
PR:
BenchmarkOpAddress-24    33228558    35.39 ns/op    172 B/op    0 allocs/op

MariusVanDerWijden avatar Aug 14 '24 11:08 MariusVanDerWijden

I think it's not worth adding it like this. There might be a way to avoid the allocation though: since the stack is just a []uint256.Int, and may have spare capacity left over from earlier extensions, we could perform a check like this:

if cap(stack) > len(stack) {
    stack = stack[:len(stack)+1]
} else {
    stack = append(stack, uint256.Int{})
}
last := &stack[len(stack)-1]
last.SetBytes20(address[:])

Maybe this optimization also generalizes to other stack push operations.

fjl avatar Aug 14 '24 17:08 fjl

The code above could probably be simplified to just use append instead of checking the available cap explicitly. However, using append will zero the Int in the slice, which we don't need because we overwrite it ourselves anyway.
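For reference, a minimal sketch of that append-only variant, assuming the stack is a bare []uint256.Int as in the snippet above (the helper name is made up):

package main

import (
	"fmt"

	"github.com/holiman/uint256"
)

// pushAddress shows the append-only variant: append zero-initializes the new
// element, which SetBytes20 then fully overwrites anyway.
func pushAddress(stack []uint256.Int, address [20]byte) []uint256.Int {
	stack = append(stack, uint256.Int{})
	stack[len(stack)-1].SetBytes20(address[:])
	return stack
}

func main() {
	stack := make([]uint256.Int, 0, 16)
	stack = pushAddress(stack, [20]byte{0xde, 0xad, 0xbe, 0xef})
	fmt.Println(stack[0].Hex())
}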

fjl avatar Aug 14 '24 21:08 fjl

I think it's a pretty solid idea. We always initialize the stack with a pre-allocated uint256 slice with cap 16. Since the EVM restricts the stack depth to 1024 (should we perhaps pre-allocate a larger uint256 slice for the stack?), I think this optimization could be applied to most of the push operations.

	New: func() interface{} {
		return &Stack{data: make([]uint256.Int, 0, 16)}
	},

Btw, the slice expansion might be a performance bottleneck to some extent. The maximum size of the stack could theoretically be 1024, which will involve 7 slice expansions in total.

Maybe we could increase the pre-allocated size to 256.
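A quick way to check the actual number of expansions on a given Go version (the growth policy is runtime-dependent, so the count may differ from the 7 mentioned above):

package main

import (
	"fmt"

	"github.com/holiman/uint256"
)

func main() {
	// Start from the pool's pre-allocated capacity of 16 and push up to the
	// EVM stack limit of 1024, counting how often the backing array grows.
	stack := make([]uint256.Int, 0, 16)
	grows := 0
	for i := 0; i < 1024; i++ {
		before := cap(stack)
		stack = append(stack, uint256.Int{})
		if cap(stack) != before {
			grows++
		}
	}
	fmt.Println("expansions:", grows)
}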

rjl493456442 avatar Aug 15 '24 02:08 rjl493456442

The pre-allocated stack size was previously reduced from the default of 1024 to the current 16 in https://github.com/ethereum/go-ethereum/pull/21222. Maybe a pre-allocated size of 256 is a reasonable compromise between memory usage and performance.

jwasinger avatar Aug 15 '24 03:08 jwasinger

The 1024 was for RStack, which has been removed (although I have no idea what RStack is). The default size for the stack has always been 16.

rjl493456442 avatar Aug 15 '24 03:08 rjl493456442

The pre-allocated stack size was previously reduced from the default of 1024 to the current 16

I think not. Using a pre-allocated 1024 in a somewhat deep call chain which only ever uses a few elements is a waste of time. There's another thing we can do, which I have higher hopes for, theoretically.

Use one contiguous stack. Then, at call-sites, keep a reference (this scope's "stack zero" is at index 28). So all scopes use the same stack, but they never pop past their "stack zero", and they are not allowed to push more than 1024 elements. This scheme makes it so that all execution can operate on pre-allocated elements, but if a call scope doesn't use any elements, it does not "hog" pre-allocated items.
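For illustration, a minimal sketch of that scheme with hypothetical type and method names (this is not the code in the branch linked below):

package main

import (
	"fmt"

	"github.com/holiman/uint256"
)

const maxStackDepth = 1024 // EVM limit per call scope

// SharedStack is one contiguous backing slice shared by all call scopes.
type SharedStack struct {
	data []uint256.Int
}

// Frame is a per-call view: it never pops below base (its "stack zero")
// and never grows past base+maxStackDepth.
type Frame struct {
	s    *SharedStack
	base int
}

func (s *SharedStack) NewFrame() *Frame {
	return &Frame{s: s, base: len(s.data)}
}

func (f *Frame) Push(v *uint256.Int) {
	if len(f.s.data)-f.base >= maxStackDepth {
		panic("stack overflow") // real code would return an EVM error instead
	}
	f.s.data = append(f.s.data, *v)
}

func (f *Frame) Pop() uint256.Int {
	if len(f.s.data) == f.base {
		panic("stack underflow") // a scope must not pop past its stack zero
	}
	v := f.s.data[len(f.s.data)-1]
	f.s.data = f.s.data[:len(f.s.data)-1]
	return v
}

func main() {
	shared := &SharedStack{data: make([]uint256.Int, 0, maxStackDepth)}

	outer := shared.NewFrame() // outer call: stack zero at index 0
	outer.Push(uint256.NewInt(1))

	inner := shared.NewFrame() // subcall: its stack zero is index 1
	inner.Push(uint256.NewInt(2))
	fmt.Println(inner.Pop().Uint64()) // 2; the outer scope's 1 is untouched

	fmt.Println(outer.Pop().Uint64()) // 1
}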

I once implemented this, but for some reason it didn't seem to pay off enough to motivate its merging. The branch is here: https://github.com/ethereum/go-ethereum/compare/master...holiman:go-ethereum:contiguous_stack (it also contains some other stack stuff, and is very old, from 2017, predating the switch to uint256).

holiman avatar Aug 15 '24 06:08 holiman

although I have no idea what RStack is

It was the return stack, which we added for the old EIP-2315.

holiman avatar Aug 15 '24 06:08 holiman

I would say let's try to get @holiman's contiguous_stack branch up to date. It's a very promising approach.

fjl avatar Aug 15 '24 16:08 fjl

Probably better to pick it up from scratch. Back then, I had to add typed accessor methods like popAddress, since the stack held pointers and I couldn't let those escape.

It should be simpler now. I'll give it a spin tomorrow.
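For context, a rough sketch of what such a typed accessor might have looked like with the old pointer-based *big.Int stack; the names and layout here are hypothetical:

package main

import (
	"fmt"
	"math/big"
)

// Address mirrors common.Address: a fixed 20-byte array.
type Address [20]byte

// Stack sketches the old pointer-based stack.
type Stack struct {
	data []*big.Int
}

// popAddress pops the top word and returns it as an Address without letting
// the *big.Int escape to the caller.
func (st *Stack) popAddress() Address {
	v := st.data[len(st.data)-1]
	st.data = st.data[:len(st.data)-1]
	var a Address
	b := v.Bytes() // big-endian, without leading zeros
	if len(b) > 20 {
		b = b[len(b)-20:] // an address is the low 20 bytes of the word
	}
	copy(a[20-len(b):], b)
	return a
}

func main() {
	st := &Stack{data: []*big.Int{big.NewInt(0xdeadbeef)}}
	fmt.Printf("%x\n", st.popAddress())
}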

namiloh avatar Aug 15 '24 18:08 namiloh

Hmm, I always thought we pre-allocate a 1024-element stack and use it contiguously across all subcalls. That seems like a no-brainer thing we should do. Then we can make all the opcodes zero-alloc afterwards.

karalabe avatar Aug 28 '24 03:08 karalabe

That seems like a no-brainer thing we should do. Then we can make all the opcodes zero-alloc afterwards.

Yeah, well, we don't :shrug:. Here's a sketch of how we might go about it, though: https://github.com/ethereum/go-ethereum/pull/30362

holiman avatar Aug 28 '24 06:08 holiman

Yeah, I think we should move forward with Martin's idea of never de-allocating the items; closing this for now.

MariusVanDerWijden avatar Sep 04 '24 12:09 MariusVanDerWijden