go-ethereum
trie: reduce the memory allocation in trie hashing
This pull request optimizes trie hashing by reducing memory allocation overhead. Specifically:
- Define a fullNodeEncoder pool to reuse encoders and avoid repeated memory allocations.
- Simplify the encoding logic for shortNode and fullNode by removing the Go interface indirection.
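The pooling approach described above can be sketched as follows. This is a minimal, self-contained illustration, not the PR's actual code: the `fullNodeEncoder` name comes from the PR description, but its fields, the `encodeFullNode` helper, and the pool variable name here are hypothetical stand-ins for the real trie-package types.

```go
package main

import (
	"fmt"
	"sync"
)

// fullNodeEncoder is a hypothetical stand-in for the pooled encoder type
// named in the PR; a full node in the trie has 16 children plus a value slot.
type fullNodeEncoder struct {
	Children [][]byte
}

// reset clears the child references so stale data does not leak between uses.
func (e *fullNodeEncoder) reset() {
	for i := range e.Children {
		e.Children[i] = nil
	}
}

// fnEncoderPool hands out reusable encoders, so hashing a full node does not
// have to allocate a fresh encoder every time.
var fnEncoderPool = sync.Pool{
	New: func() any {
		return &fullNodeEncoder{Children: make([][]byte, 17)}
	},
}

// encodeFullNode borrows an encoder from the pool, uses it, and returns it.
// The real code would RLP-encode the children; here we just sum their sizes.
func encodeFullNode(children [][]byte) int {
	enc := fnEncoderPool.Get().(*fullNodeEncoder)
	defer func() {
		enc.reset()
		fnEncoderPool.Put(enc)
	}()
	copy(enc.Children, children)
	total := 0
	for _, c := range enc.Children {
		total += len(c)
	}
	return total
}

func main() {
	// Two child references of 3 and 2 bytes; prints 5.
	fmt.Println(encodeFullNode([][]byte{[]byte("abc"), []byte("de")}))
}
```

The key detail is the deferred `reset` before `Put`: returning an encoder that still references child slices would keep them alive and reintroduce the very memory pressure the pool is meant to remove.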
Benchmark results
- Memory allocation is reduced significantly.
- CPU iowait is consistently higher, for reasons not yet understood.
- Overall performance is slightly slower.
Memory profile
[[ PR ]]
(pprof) top10
Showing nodes accounting for 8512.51GB, 46.05% of 18486.12GB total
Dropped 2316 nodes (cum <= 92.43GB)
Showing top 10 nodes out of 246
flat flat% sum% cum cum%
2131.23GB 11.53% 11.53% 4212.13GB 22.79% github.com/ethereum/go-ethereum/trie.decodeFull
2081.41GB 11.26% 22.79% 2081.43GB 11.26% github.com/ethereum/go-ethereum/trie.decodeRef
1227.24GB 6.64% 29.43% 1227.24GB 6.64% github.com/ethereum/go-ethereum/rlp.(*encBuffer).makeBytes
1016.72GB 5.50% 34.93% 1016.72GB 5.50% github.com/ethereum/go-ethereum/trie.(*tracer).onRead (inline)
385GB 2.08% 37.01% 418.67GB 2.26% github.com/ethereum/go-ethereum/core/state.newObject
366.55GB 1.98% 38.99% 676.40GB 3.66% github.com/ethereum/go-ethereum/core/state.(*stateObject).GetCommittedState
342.26GB 1.85% 40.84% 425.03GB 2.30% github.com/ethereum/go-ethereum/core/state.(*stateObject).finalise
340.53GB 1.84% 42.69% 3821.35GB 20.67% github.com/ethereum/go-ethereum/trie.(*Trie).insert
340.36GB 1.84% 44.53% 340.36GB 1.84% github.com/ethereum/go-ethereum/core/vm.codeBitmap
281.20GB 1.52% 46.05% 281.20GB 1.52% github.com/ethereum/go-ethereum/trie.keybytesToHex (inline)
[[ Master ]]
(pprof) alloc_space
(pprof) top10
Showing nodes accounting for 13090.96GB, 54.01% of 24239.01GB total
Dropped 2401 nodes (cum <= 121.20GB)
Showing top 10 nodes out of 221
flat flat% sum% cum cum%
4398.55GB 18.15% 18.15% 5110.55GB 21.08% github.com/ethereum/go-ethereum/trie.(*hasher).hashFullNodeChildren
2248.19GB 9.28% 27.42% 4442.15GB 18.33% github.com/ethereum/go-ethereum/trie.decodeFull
2194.50GB 9.05% 36.48% 2194.51GB 9.05% github.com/ethereum/go-ethereum/trie.decodeRef
1301.95GB 5.37% 41.85% 1301.95GB 5.37% github.com/ethereum/go-ethereum/rlp.(*encBuffer).makeBytes
1073.52GB 4.43% 46.28% 1073.52GB 4.43% github.com/ethereum/go-ethereum/trie.(*tracer).onRead (inline)
405.73GB 1.67% 47.95% 441.01GB 1.82% github.com/ethereum/go-ethereum/core/state.newObject
384.59GB 1.59% 49.54% 709.87GB 2.93% github.com/ethereum/go-ethereum/core/state.(*stateObject).GetCommittedState
365.17GB 1.51% 51.04% 365.17GB 1.51% github.com/ethereum/go-ethereum/core/vm.codeBitmap
364.32GB 1.50% 52.55% 452.82GB 1.87% github.com/ethereum/go-ethereum/core/state.(*stateObject).finalise
354.43GB 1.46% 54.01% 4021.69GB 16.59% github.com/ethereum/go-ethereum/trie.(*Trie).insert
(pprof)
After running a bit longer, it turns out the iowait is not related to this change.
The PR is still slightly slower, though.
My 2c: optimizing to reduce GC churn at the cost of increased runtime is not really an optimization. We have just moved the cost of the GC churn somewhere else (the sync.Pool, in this case).
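The tradeoff being debated here can be made concrete with Go's own benchmark harness: pooling eliminates per-operation heap allocations, but each `Get`/`Put` pays a small synchronization cost instead. The sketch below is generic (plain byte buffers, not the PR's encoders), and all names in it are illustrative.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// sink forces the freshly allocated buffer to escape to the heap, so the
// non-pooled benchmark really allocates on every iteration.
var sink *[]byte

// bufPool reuses byte buffers, analogous to the encoder pool in the PR.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 4096)
		return &b
	},
}

func main() {
	// Allocate a fresh buffer per iteration: the GC has to reclaim each one.
	fresh := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			buf := make([]byte, 4096)
			sink = &buf
		}
	})
	// Borrow and return a pooled buffer: near-zero allocations per op,
	// but every iteration pays the pool's Get/Put synchronization.
	pooled := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			p := bufPool.Get().(*[]byte)
			sink = p
			bufPool.Put(p)
		}
	})
	fmt.Printf("fresh: %d allocs/op, pooled: %d allocs/op\n",
		fresh.AllocsPerOp(), pooled.AllocsPerOp())
}
```

Allocations per op drop, but wall-clock time does not necessarily follow, which is consistent with the benchmark summary above: less GC churn, yet slightly slower overall.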
https://grafana.ethquokkaops.io/d/Jpk-Be5Wa/dual-geth-gary?orgId=2&from=now-6h&to=now&timezone=browser&var-exp=bench07&var-master=bench08&var-percentile=50
The PR is consistently faster than master.