
trie: reduce the memory allocation in trie hashing

Open rjl493456442 opened this issue 6 months ago • 1 comments

This pull request optimizes trie hashing by reducing memory allocation overhead. Specifically:

  • define a fullNodeEncoder pool to reuse encoders and avoid memory allocations.
  • simplify the encoding logic for shortNode and fullNode by getting rid of the Go interfaces.
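
The pooling idea in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: the struct layout, the names `encoderPool`, `reset`, and `encodeFullNode`, and the length-prefixed output format are all assumptions made for the example (real trie nodes are RLP-encoded).

```go
package main

import (
	"fmt"
	"sync"
)

// fullNodeEncoder is a hypothetical scratch object: one slot per child of a
// full (branch) node, i.e. 16 nibble branches plus the value slot.
type fullNodeEncoder struct {
	children [17][]byte
}

// reset drops all slot references so a pooled encoder never leaks stale data.
func (e *fullNodeEncoder) reset() {
	for i := range e.children {
		e.children[i] = nil
	}
}

// encoderPool recycles encoders; New runs only when the pool has none to offer.
var encoderPool = sync.Pool{
	New: func() any { return new(fullNodeEncoder) },
}

// encodeFullNode borrows an encoder, fills it, and serializes the slots.
// The output format is a placeholder (one length byte, then the payload),
// NOT real RLP, and it assumes every child is shorter than 256 bytes.
func encodeFullNode(children *[17][]byte) []byte {
	enc := encoderPool.Get().(*fullNodeEncoder)
	defer func() {
		enc.reset()
		encoderPool.Put(enc)
	}()
	enc.children = *children // copies 17 slice headers, not the payloads

	out := make([]byte, 0, 17)
	for i := range enc.children {
		c := enc.children[i]
		out = append(out, byte(len(c)))
		out = append(out, c...)
	}
	return out
}

func main() {
	var children [17][]byte
	children[0] = []byte{0xaa, 0xbb}
	children[16] = []byte{0x01}
	fmt.Printf("%x\n", encodeFullNode(&children))
}
```

Because the function works on a concrete struct rather than an interface value, the encoder itself is reused across calls; only the output buffer is allocated here.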

Benchmark results

  • Memory allocation has been reduced significantly
  • CPU IOWait is consistently higher, for an unknown reason
  • Overall performance is slightly worse
[Screenshot 2025-05-26 09 38 14] [Screenshot 2025-05-26 09 39 13]

Memory profile

[[ PR ]]
(pprof) top10
Showing nodes accounting for 8512.51GB, 46.05% of 18486.12GB total
Dropped 2316 nodes (cum <= 92.43GB)
Showing top 10 nodes out of 246
      flat  flat%   sum%        cum   cum%
 2131.23GB 11.53% 11.53%  4212.13GB 22.79%  github.com/ethereum/go-ethereum/trie.decodeFull
 2081.41GB 11.26% 22.79%  2081.43GB 11.26%  github.com/ethereum/go-ethereum/trie.decodeRef
 1227.24GB  6.64% 29.43%  1227.24GB  6.64%  github.com/ethereum/go-ethereum/rlp.(*encBuffer).makeBytes
 1016.72GB  5.50% 34.93%  1016.72GB  5.50%  github.com/ethereum/go-ethereum/trie.(*tracer).onRead (inline)
     385GB  2.08% 37.01%   418.67GB  2.26%  github.com/ethereum/go-ethereum/core/state.newObject
  366.55GB  1.98% 38.99%   676.40GB  3.66%  github.com/ethereum/go-ethereum/core/state.(*stateObject).GetCommittedState
  342.26GB  1.85% 40.84%   425.03GB  2.30%  github.com/ethereum/go-ethereum/core/state.(*stateObject).finalise
  340.53GB  1.84% 42.69%  3821.35GB 20.67%  github.com/ethereum/go-ethereum/trie.(*Trie).insert
  340.36GB  1.84% 44.53%   340.36GB  1.84%  github.com/ethereum/go-ethereum/core/vm.codeBitmap
  281.20GB  1.52% 46.05%   281.20GB  1.52%  github.com/ethereum/go-ethereum/trie.keybytesToHex (inline)
[[ Master ]]

(pprof) alloc_space
(pprof) top10
Showing nodes accounting for 13090.96GB, 54.01% of 24239.01GB total
Dropped 2401 nodes (cum <= 121.20GB)
Showing top 10 nodes out of 221
      flat  flat%   sum%        cum   cum%
 4398.55GB 18.15% 18.15%  5110.55GB 21.08%  github.com/ethereum/go-ethereum/trie.(*hasher).hashFullNodeChildren
 2248.19GB  9.28% 27.42%  4442.15GB 18.33%  github.com/ethereum/go-ethereum/trie.decodeFull
 2194.50GB  9.05% 36.48%  2194.51GB  9.05%  github.com/ethereum/go-ethereum/trie.decodeRef
 1301.95GB  5.37% 41.85%  1301.95GB  5.37%  github.com/ethereum/go-ethereum/rlp.(*encBuffer).makeBytes
 1073.52GB  4.43% 46.28%  1073.52GB  4.43%  github.com/ethereum/go-ethereum/trie.(*tracer).onRead (inline)
  405.73GB  1.67% 47.95%   441.01GB  1.82%  github.com/ethereum/go-ethereum/core/state.newObject
  384.59GB  1.59% 49.54%   709.87GB  2.93%  github.com/ethereum/go-ethereum/core/state.(*stateObject).GetCommittedState
  365.17GB  1.51% 51.04%   365.17GB  1.51%  github.com/ethereum/go-ethereum/core/vm.codeBitmap
  364.32GB  1.50% 52.55%   452.82GB  1.87%  github.com/ethereum/go-ethereum/core/state.(*stateObject).finalise
  354.43GB  1.46% 54.01%  4021.69GB 16.59%  github.com/ethereum/go-ethereum/trie.(*Trie).insert
(pprof)
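
To put a number on the difference between the two profiles, the totals from the two "Showing nodes accounting for" lines can be compared directly. The sketch below assumes both profiles are `alloc_space` views of comparable runs (the PR output does not show the `alloc_space` command explicitly); `reductionPct` is a helper introduced only for this example.

```go
package main

import "fmt"

// reductionPct returns how much smaller after is than before, in percent.
func reductionPct(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	// Totals copied from the pprof output above (GB of cumulative allocations).
	const masterTotalGB = 24239.01
	const prTotalGB = 18486.12

	fmt.Printf("total allocation reduction: %.1f%%\n",
		reductionPct(masterTotalGB, prTotalGB)) // prints 23.7%
}
```

Most of that gap lines up with `trie.(*hasher).hashFullNodeChildren`, which accounts for 4398.55GB flat on master and does not appear in the PR's top 10.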

rjl493456442 • May 26 '25 01:05

After running the benchmark a bit longer, it turns out the higher IOWait is unrelated to this change.

The PR is still slightly slower, though.

[Screenshot 2025-05-27 10 21 29] [Screenshot 2025-05-27 10 21 46]

rjl493456442 • May 27 '25 02:05

My 2c is, optimizing to reduce GC churn at the cost of increased runtime is not really an optimization. We just moved the cost of GC churn to somewhere else (the sync pool in this case).

omerfirmak • Jul 22 '25 12:07

https://grafana.ethquokkaops.io/d/Jpk-Be5Wa/dual-geth-gary?orgId=2&from=now-6h&to=now&timezone=browser&var-exp=bench07&var-master=bench08&var-percentile=50

The PR is consistently faster than master.

rjl493456442 • Aug 01 '25 02:08

[Screenshot 2025-08-01 10 13 13] [Screenshot 2025-08-01 10 13 26] [Screenshot 2025-08-01 10 13 46]

rjl493456442 • Aug 01 '25 02:08