
Erigon OOM Killed - (currently trying 2.53.4)

Open mdominoni opened this issue 1 year ago • 7 comments

System information erigon version 2.53.4

OS & Version: Linux / Ubuntu on AWS with 64 GB RAM

Commit hash: tag - v2.53.4

Erigon Service:

[Unit]
Description=Erigon Execution Layer Client service (Mainet)
Wants=network-online.target
After=network-online.target

[Service]
Environment="GOGC=50 GOMEMLIMIT=24GiB GOMAXPROCS=2"
MemoryLimit=24G
OOMScoreAdjust=-100
Type=simple
User=root
Restart=allways
RestartSec=5
KillSignal=SIGINT
TimeoutStopSec=300
ExecStart=/opt/erigon/build/bin/erigon
--datadir /opt/data/erigon
--chain mainnet
--port "30303"
--metrics
--pprof
--authrpc.jwtsecret "/opt/secrets/jwt.hex"
--http
--ws
--http.vhosts=""
--http.corsdomain=""
--http.addr="0.0.0.0"
--http.port "8545"
--http.api "eth,erigon,personal,db,admin,web3,net,trace,rpc,debug,txpool"
--txpool.api.addr "0.0.0.0:9094"
--private.api.addr "0.0.0.0:9090"
--batchSize=1G

[Install]
WantedBy=multi-user.target

Consensus Layer: Lighthouse v4.5.0-441fc16

Consensus Service:

[Unit]
Description=Lighthouse Consensus Layer Client BN (Mainet)
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=root
Restart=allways
RestartSec=5
KillSignal=SIGINT
TimeoutStopSec=300
ExecStart=/usr/local/bin/lighthouse bn
--network mainnet
--datadir "/opt/data/lighthouse"
--execution-endpoint http://localhost:8551
--execution-jwt "/opt/secrets/jwt.hex"
--checkpoint-sync-url https://mainnet.checkpoint.sigp.io
--disable-deposit-contract-sync
--reconstruct-historic-states
--metrics

[Install]
WantedBy=multi-user.target

Chain/Network: mainnet

Expected behaviour Node syncs properly after the version upgrade

Actual behaviour After a couple of hours in sync, Erigon gets killed by the OOM killer

Steps to reproduce the behaviour Full sync on v2.51.0, then upgrade to v2.53.4

Backtrace N/A

Executed go tool pprof -inuse_space -png http://127.0.0.1:6060/debug/pprof/heap > mem.png

[image: mem]

mdominoni • Nov 27 '23 12:11

This mem.png shows that everything is fine: it is using the expected 3 GB.

AskAlexSharov • Nov 27 '23 13:11

OK, but the OOM kills are still happening. Is there anything else I can do to stop this from happening all the time?

[image: Screenshot from 2023-11-27 14-30-14]

dmesg shows:

[210146.815414] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=eth1.service,mems_allowed=0,oom_memcg=/system.slice/eth1.service,task_memcg=/system.slice/eth1.service,task=erigon,pid=7926,uid=0
[210146.815570] Memory cgroup out of memory: Killed process 7926 (erigon) total-vm:5312414528kB, anon-rss:20685544kB, file-rss:2650224kB, shmem-rss:0kB, UID:0 pgtables:4081092kB oom_score_adj:-100
[210148.956419] oom_reaper: reaped process 7926 (erigon), now anon-rss:0kB, file-rss:1958520kB, shmem-rss:0kB

mdominoni • Nov 27 '23 17:11
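To compare the dmesg figures with the 24G MemoryLimit in the unit file, the kB fields of the kill message can be converted to GiB. The following is only a sketch: the function names are made up for illustration, and it assumes the kernel's "kB" means 1024 bytes.

```python
import re

# Second dmesg entry from the report above (timestamp dropped).
OOM_LINE = (
    "Memory cgroup out of memory: Killed process 7926 (erigon) "
    "total-vm:5312414528kB, anon-rss:20685544kB, file-rss:2650224kB, "
    "shmem-rss:0kB, UID:0 pgtables:4081092kB oom_score_adj:-100"
)

# Assumption: the kernel's "kB" is 1024 bytes, so 1 GiB = 1024 * 1024 kB.
KB_PER_GIB = 1024 * 1024


def parse_oom_fields(line):
    """Return every 'name:<number>kB' field in the line as {name: kB}."""
    return {name: int(kb) for name, kb in re.findall(r"([\w-]+):(\d+)kB", line)}


def to_gib(kb):
    """Convert a kB count to GiB."""
    return kb / KB_PER_GIB


fields = parse_oom_fields(OOM_LINE)
for name, kb in fields.items():
    print(f"{name:>10}: {to_gib(kb):10.2f} GiB")

# Resident memory of the process at kill time: anonymous + file-backed + shmem.
resident_kb = fields["anon-rss"] + fields["file-rss"] + fields["shmem-rss"]
print(f"{'rss total':>10}: {to_gib(resident_kb):10.2f} GiB")
```

With the values from this report, anon-rss plus file-rss comes to roughly 22.25 GiB, which sits close to the 24G limit and well above the roughly 3 GB seen in the heap profile.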

And what does alloc show in the logs before the kill?

AskAlexSharov • Nov 28 '23 05:11

Try to capture a profile when alloc > 5 GB.

AskAlexSharov • Nov 28 '23 05:11

[txpool] stat pending=9964 baseFee=0 queued=5125 alloc=3.1GB sys=7.5GB

[image: mem]

mdominoni • Nov 28 '23 13:11
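The suggestion to profile once alloc exceeds 5 GB could be automated along these lines. This is a rough sketch, not an Erigon tool: the function names and output file name are hypothetical, the log format is assumed to match the "[txpool] stat" line quoted above (alloc=<number><unit>), and the heap URL is the one from the pprof command earlier in this issue.

```python
import re
import urllib.request

# Endpoint from the pprof command in the report; threshold from the
# "alloc > 5 GB" suggestion. Both are illustrative defaults.
HEAP_URL = "http://127.0.0.1:6060/debug/pprof/heap"
THRESHOLD_GB = 5.0

# Assumption: log values carry one of these unit suffixes.
_UNITS_TO_GB = {"KB": 1 / 1024**2, "MB": 1 / 1024, "GB": 1.0, "TB": 1024.0}


def parse_alloc_gb(line):
    """Extract the alloc= value from a log line and return it in GB, or None."""
    match = re.search(r"\balloc=(\d+(?:\.\d+)?)(KB|MB|GB|TB)\b", line)
    if match is None:
        return None
    value, unit = match.groups()
    return float(value) * _UNITS_TO_GB[unit]


def save_heap_profile(path, url=HEAP_URL):
    """Download the raw heap profile so it can be opened later with go tool pprof."""
    with urllib.request.urlopen(url, timeout=30) as response, open(path, "wb") as out:
        out.write(response.read())


def watch(lines, threshold_gb=THRESHOLD_GB, capture=save_heap_profile):
    """Capture one profile the first time alloc crosses the threshold.

    Returns the alloc value (in GB) that triggered the capture, or None.
    """
    for line in lines:
        alloc = parse_alloc_gb(line)
        if alloc is not None and alloc > threshold_gb:
            capture("heap-over-threshold.pb.gz")
            return alloc
    return None
```

For example, calling watch(sys.stdin) with the service's log piped in would save a single profile at the first line above the threshold; the saved file can then be opened with go tool pprof.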

Unfortunately, this picture looks healthy.

AskAlexSharov • Nov 28 '23 16:11

Just to clarify, is it normal that 64 GB is not enough to run Erigon?

luarx • Mar 07 '24 14:03