erigon
OOM while gathering headers (Polygon Mainnet / BorHeimdall)
System information
Erigon Version: 2.56.1-9e63c927
OS & Version: Ubuntu 22.04.3 LTS
Erigon Command (with flags/config):
GOMEMLIMIT=16GiB GOGC=50 ./build/bin/erigon \
--chain bor-mainnet \
--datadir /chaindata/polygon/ \
--bor.heimdall https://heimdall-api.polygon.technology \
--port 30303 \
--http \
--http.addr "0.0.0.0" \
--http.port 8545 \
--http.api eth,debug,net,trace,web3,erigon \
--http.vhosts "*" \
--http.corsdomain "*" \
--ws \
--torrent.port 42069 \
--txpool.pricelimit 30000000000 \
--bootnodes='enode://b8f1cc9c5d4403703fbf377116469667d2b1823c0daf16b7250aa576bacf399e42c3930ccfcb02c5df6879565a2b8931335565f0e8d3f8e72385ecf4a4bf160a@3.36.224.80:30303,enode://8729e0c825f3d9cad382555f3e46d>
--torrent.upload.rate="1024mb" \
--torrent.download.rate="1024mb" \
--torrent.conns.perfile=4 \
--batchSize "1GB" \
--etl.bufferSize "1GB" \
--bodies.cache 21474836480 \
--db.size.limit="12TB" \
--db.pagesize="16KB" \
--db.read.concurrency=1000 \
--rpc.batch.concurrency=1000 \
--downloader.verify \
--pprof
Chain/Network:
Polygon Mainnet
Expected behaviour
The process should not be terminated by the OOM killer. It should respect the limits and settings given by GOMEMLIMIT, GOMAXPROCS, and GOGC.
Actual behaviour
Running this on a server with 32 GB of RAM, the process reserves 30.2 GB within a few minutes. Soon after, it gets killed and exits with exit code 137.
Feb 23 10:45:59 polygon-1 systemd[1]: erigon-polygon.service: Main process exited, code=exited, status=137/n/a
Feb 23 10:45:59 polygon-1 systemd[1]: erigon-polygon.service: Failed with result 'exit-code'.
Feb 23 10:45:59 polygon-1 systemd[1]: erigon-polygon.service: Consumed 1h 33min 50.627s CPU time.
Feb 23 11:10:39 polygon-1 systemd[1]: erigon-polygon.service: A process of this unit has been killed by the OOM killer.
Feb 23 11:10:41 polygon-1 systemd[1]: erigon-polygon.service: Failed with result 'oom-kill'.
Feb 23 11:10:41 polygon-1 systemd[1]: erigon-polygon.service: Consumed 1h 15min 8.853s CPU time.
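For context on the 137 status: by shell convention an exit status above 128 usually means 128 plus the number of the signal that ended the process, and 137 - 128 = 9 is SIGKILL, which is what the OOM killer sends. A minimal sketch of that arithmetic follows; the helper name `signal_for_status` is made up for illustration and is not part of Erigon or systemd.

```shell
# Hypothetical helper: name the signal behind a shell-style exit status.
# Statuses above 128 are conventionally 128 + signal number.
signal_for_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # kill -l <number> prints the signal name without the SIG prefix
        kill -l "$((status - 128))"
    else
        echo "not a signal exit"
    fi
}

# Usage:
#   signal_for_status 137
```

A result of KILL only says the process received SIGKILL; the `oom-kill` result in the journal is what ties it to the OOM killer.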
Steps to reproduce the behaviour
- Start the process
- Wait 15 to 30 minutes
- Check the systemd logs for an OOM-caused restart
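The last step can be sketched as a small filter over the journal. This assumes the unit is named `erigon-polygon.service`, as in the logs in this report, and that the journal message contains the text "killed by the OOM killer"; the function name `count_oom_kills` is made up for illustration.

```shell
# Hypothetical helper: count OOM-killer events in journal text read from stdin.
count_oom_kills() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the function's status clean.
    grep -c 'killed by the OOM killer' || true
}

# Usage (reads the unit's journal for the last hour):
#   journalctl -u erigon-polygon.service --since "1 hour ago" | count_oom_kills
```

A count above zero after the 15 to 30 minute wait would indicate the problem has been reproduced.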
Backtrace / Logs
Feb 23 11:05:18 polygon-1 erigon-polygon[223650]: [INFO] [02-23|11:05:18.861] [p2p] GoodPeers eth68=1
Feb 23 11:05:20 polygon-1 erigon-polygon[223650]: [INFO] [02-23|11:05:20.070] [txpool] stat pending=12 baseFee=0 queued=1938 alloc=22.6GB sys=24.6GB
Feb 23 11:05:48 polygon-1 erigon-polygon[223650]: [INFO] [02-23|11:05:48.225] [3/15 BorHeimdall] Gathering headers for validator proposer prorities (backwards) blockNum=29089987
Feb 23 11:06:18 polygon-1 erigon-polygon[223650]: [INFO] [02-23|11:06:18.226] [3/15 BorHeimdall] Gathering headers for validator proposer prorities (backwards) blockNum=27886352
Feb 23 11:06:48 polygon-1 erigon-polygon[223650]: [INFO] [02-23|11:06:48.226] [3/15 BorHeimdall] Gathering headers for validator proposer prorities (backwards) blockNum=26724741
Feb 23 11:07:18 polygon-1 erigon-polygon[223650]: [INFO] [02-23|11:07:18.238] [3/15 BorHeimdall] Gathering headers for validator proposer prorities (backwards) blockNum=25548755
Feb 23 11:07:48 polygon-1 erigon-polygon[223650]: [INFO] [02-23|11:07:48.477] [3/15 BorHeimdall] Gathering headers for validator proposer prorities (backwards) blockNum=24420396
Feb 23 11:08:18 polygon-1 erigon-polygon[223650]: [INFO] [02-23|11:08:18.622] [3/15 BorHeimdall] Gathering headers for validator proposer prorities (backwards) blockNum=23229773
Feb 23 11:08:18 polygon-1 erigon-polygon[223650]: [INFO] [02-23|11:08:18.860] [p2p] GoodPeers eth68=1
Feb 23 11:08:20 polygon-1 erigon-polygon[223650]: [INFO] [02-23|11:08:20.571] [txpool] stat pending=18 baseFee=0 queued=2826 alloc=28.9GB sys=31.6GB
Feb 23 11:08:48 polygon-1 erigon-polygon[223650]: [INFO] [02-23|11:08:48.771] [3/15 BorHeimdall] Gathering headers for validator proposer prorities (backwards) blockNum=22249133
Feb 23 11:09:18 polygon-1 erigon-polygon[223650]: [INFO] [02-23|11:09:18.240] [3/15 BorHeimdall] Gathering headers for validator proposer prorities (backwards) blockNum=22176295
Feb 23 11:09:48 polygon-1 erigon-polygon[223650]: [INFO] [02-23|11:09:48.515] [3/15 BorHeimdall] Gathering headers for validator proposer prorities (backwards) blockNum=22151759
Feb 23 11:10:18 polygon-1 erigon-polygon[223650]: [INFO] [02-23|11:10:18.352] [3/15 BorHeimdall] Gathering headers for validator proposer prorities (backwards) blockNum=22150299
Feb 23 11:10:37 polygon-1 erigon-polygon[223650]: [WARN] [02-23|11:10:37.819] [bor.heimdall] an error while fetching path=/milestone/lastNoAck queryParams= attempt=1 err="Get \"https://heimdall-api.polygon.technology/milestone/lastNoAck\": context deadline exceeded"
Feb 23 11:10:39 polygon-1 systemd[1]: erigon-polygon.service: A process of this unit has been killed by the OOM killer.
Feb 23 11:10:41 polygon-1 systemd[1]: erigon-polygon.service: Failed with result 'oom-kill'.
Feb 23 11:10:41 polygon-1 systemd[1]: erigon-polygon.service: Consumed 1h 15min 8.853s CPU time.
Feb 23 11:10:41 polygon-1 systemd[1]: erigon-polygon.service: Scheduled restart job, restart counter is at 104.
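To make the memory growth in the log above easier to follow, the `alloc` and `sys` values from the `[txpool] stat` lines can be pulled out on their own. This is a rough sketch that assumes the journal line layout shown in this report (timestamp in the third field, `alloc=` and `sys=` as separate fields); the function name `txpool_mem` is made up for illustration.

```shell
# Hypothetical helper: print "time alloc sys" for each [txpool] stat line on stdin.
txpool_mem() {
    awk '/\[txpool\] stat/ {
        alloc = ""; sys = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^alloc=/) alloc = substr($i, 7)   # strip "alloc="
            if ($i ~ /^sys=/)   sys   = substr($i, 5)   # strip "sys="
        }
        print $3, alloc, sys                            # $3 is the journal HH:MM:SS
    }'
}

# Usage:
#   journalctl -u erigon-polygon.service --since today | txpool_mem
```

On the two stat lines in this report it would show alloc going from 22.6GB to 28.9GB in about three minutes, well above the 16GiB GOMEMLIMIT.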
go tool pprof -inuse_space -png http://127.0.0.1:6060/debug/pprof/heap > mem5.png
Gj
Fixed by: https://github.com/ledgerwatch/erigon/pull/10027