go-ethereum
trie: reduce the memory allocation in trie hashing
This pull request optimizes trie hashing by reducing memory allocation overhead. Specifically:
- Define a fullNodeEncoder pool to reuse encoders and avoid repeated memory allocations.
- Simplify the encoding logic for shortNode and fullNode by removing the Go interface indirection.
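The pooling approach described above can be sketched as follows. This is a minimal, self-contained illustration, not the PR's actual code: the `fullNodeEncoder` name comes from the PR description, but its fields, the `encodeFullNode` helper, and the pool variable name here are hypothetical stand-ins for the real trie-package types.

```go
package main

import (
	"fmt"
	"sync"
)

// fullNodeEncoder is a hypothetical stand-in for the pooled encoder type
// named in the PR; a full node in the trie has 16 children plus a value slot.
type fullNodeEncoder struct {
	Children [][]byte
}

// reset clears the child references so stale data does not leak between uses.
func (e *fullNodeEncoder) reset() {
	for i := range e.Children {
		e.Children[i] = nil
	}
}

// fnEncoderPool hands out reusable encoders, so hashing a full node does not
// have to allocate a fresh encoder every time.
var fnEncoderPool = sync.Pool{
	New: func() any {
		return &fullNodeEncoder{Children: make([][]byte, 17)}
	},
}

// encodeFullNode borrows an encoder from the pool, uses it, and returns it.
// The real code would RLP-encode the children; here we just sum their sizes.
func encodeFullNode(children [][]byte) int {
	enc := fnEncoderPool.Get().(*fullNodeEncoder)
	defer func() {
		enc.reset()
		fnEncoderPool.Put(enc)
	}()
	copy(enc.Children, children)
	total := 0
	for _, c := range enc.Children {
		total += len(c)
	}
	return total
}

func main() {
	// Two child references of 3 and 2 bytes; prints 5.
	fmt.Println(encodeFullNode([][]byte{[]byte("abc"), []byte("de")}))
}
```

The key detail is the deferred `reset` before `Put`: returning an encoder that still references child slices would keep them alive and reintroduce the very memory pressure the pool is meant to remove.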
Benchmark results
- Memory allocation is reduced significantly.
- CPU iowait is consistently higher, for reasons not yet understood.
- Overall performance is slightly slower.
Memory profile
[[ PR ]]
(pprof) top10
Showing nodes accounting for 8512.51GB, 46.05% of 18486.12GB total
Dropped 2316 nodes (cum <= 92.43GB)
Showing top 10 nodes out of 246
flat flat% sum% cum cum%
2131.23GB 11.53% 11.53% 4212.13GB 22.79% github.com/ethereum/go-ethereum/trie.decodeFull
2081.41GB 11.26% 22.79% 2081.43GB 11.26% github.com/ethereum/go-ethereum/trie.decodeRef
1227.24GB 6.64% 29.43% 1227.24GB 6.64% github.com/ethereum/go-ethereum/rlp.(*encBuffer).makeBytes
1016.72GB 5.50% 34.93% 1016.72GB 5.50% github.com/ethereum/go-ethereum/trie.(*tracer).onRead (inline)
385GB 2.08% 37.01% 418.67GB 2.26% github.com/ethereum/go-ethereum/core/state.newObject
366.55GB 1.98% 38.99% 676.40GB 3.66% github.com/ethereum/go-ethereum/core/state.(*stateObject).GetCommittedState
342.26GB 1.85% 40.84% 425.03GB 2.30% github.com/ethereum/go-ethereum/core/state.(*stateObject).finalise
340.53GB 1.84% 42.69% 3821.35GB 20.67% github.com/ethereum/go-ethereum/trie.(*Trie).insert
340.36GB 1.84% 44.53% 340.36GB 1.84% github.com/ethereum/go-ethereum/core/vm.codeBitmap
281.20GB 1.52% 46.05% 281.20GB 1.52% github.com/ethereum/go-ethereum/trie.keybytesToHex (inline)
[[ Master ]]
(pprof) alloc_space
(pprof) top10
Showing nodes accounting for 13090.96GB, 54.01% of 24239.01GB total
Dropped 2401 nodes (cum <= 121.20GB)
Showing top 10 nodes out of 221
flat flat% sum% cum cum%
4398.55GB 18.15% 18.15% 5110.55GB 21.08% github.com/ethereum/go-ethereum/trie.(*hasher).hashFullNodeChildren
2248.19GB 9.28% 27.42% 4442.15GB 18.33% github.com/ethereum/go-ethereum/trie.decodeFull
2194.50GB 9.05% 36.48% 2194.51GB 9.05% github.com/ethereum/go-ethereum/trie.decodeRef
1301.95GB 5.37% 41.85% 1301.95GB 5.37% github.com/ethereum/go-ethereum/rlp.(*encBuffer).makeBytes
1073.52GB 4.43% 46.28% 1073.52GB 4.43% github.com/ethereum/go-ethereum/trie.(*tracer).onRead (inline)
405.73GB 1.67% 47.95% 441.01GB 1.82% github.com/ethereum/go-ethereum/core/state.newObject
384.59GB 1.59% 49.54% 709.87GB 2.93% github.com/ethereum/go-ethereum/core/state.(*stateObject).GetCommittedState
365.17GB 1.51% 51.04% 365.17GB 1.51% github.com/ethereum/go-ethereum/core/vm.codeBitmap
364.32GB 1.50% 52.55% 452.82GB 1.87% github.com/ethereum/go-ethereum/core/state.(*stateObject).finalise
354.43GB 1.46% 54.01% 4021.69GB 16.59% github.com/ethereum/go-ethereum/trie.(*Trie).insert
(pprof)
After running a bit longer, it turns out the iowait is not related to this change.
The PR is still slightly slower, though.
My 2c: optimizing to reduce GC churn at the cost of increased runtime is not really an optimization. We have just moved the cost of the GC churn somewhere else (the sync.Pool, in this case).
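The tradeoff being debated here can be made concrete with Go's own benchmark harness: pooling eliminates per-operation heap allocations, but each `Get`/`Put` pays a small synchronization cost instead. The sketch below is generic (plain byte buffers, not the PR's encoders), and all names in it are illustrative.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// sink forces the freshly allocated buffer to escape to the heap, so the
// non-pooled benchmark really allocates on every iteration.
var sink *[]byte

// bufPool reuses byte buffers, analogous to the encoder pool in the PR.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 4096)
		return &b
	},
}

func main() {
	// Allocate a fresh buffer per iteration: the GC has to reclaim each one.
	fresh := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			buf := make([]byte, 4096)
			sink = &buf
		}
	})
	// Borrow and return a pooled buffer: near-zero allocations per op,
	// but every iteration pays the pool's Get/Put synchronization.
	pooled := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			p := bufPool.Get().(*[]byte)
			sink = p
			bufPool.Put(p)
		}
	})
	fmt.Printf("fresh: %d allocs/op, pooled: %d allocs/op\n",
		fresh.AllocsPerOp(), pooled.AllocsPerOp())
}
```

Allocations per op drop, but wall-clock time does not necessarily follow, which is consistent with the benchmark summary above: less GC churn, yet slightly slower overall.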
https://grafana.ethquokkaops.io/d/Jpk-Be5Wa/dual-geth-gary?orgId=2&from=now-6h&to=now&timezone=browser&var-exp=bench07&var-master=bench08&var-percentile=50
The PR is consistently faster than master.