"exited with code 137" at bodies stage

Open wmitsuda opened this issue 2 years ago • 8 comments

v2022.07.02: I'm trying to sync Goerli, but it keeps being killed at roughly the same stage/block. It seems to be running out of memory, even though I gave 16GB to the entire Docker process.

I'll try increasing the memory, but 16GB shouldn't be too little for the bodies stage on a testnet, should it? Maybe a leak?

one-liner-archive-erigon-1     | [INFO] [07-10|12:56:04.569] [4/16 Bodies] Wrote block bodies         block_num=7023561 delivery/sec=14.4MB wasted/sec=495.4KB alloc=10.9GB sys=11.4GB
one-liner-archive-erigon-1     | [INFO] [07-10|12:56:24.558] [4/16 Bodies] Wrote block bodies         block_num=7038411 delivery/sec=10.2MB wasted/sec=545.4KB alloc=12.0GB sys=12.5GB
one-liner-archive-erigon-1 exited with code 137
one-liner-archive-erigon-1     | [INFO] [07-10|13:16:54.738] [txpool] stat                            block=7202381 pending=0 baseFee=0 queued=0 alloc=10.8GB sys=11.2GB
one-liner-archive-erigon-1     | [INFO] [07-10|13:16:57.361] [4/16 Bodies] Wrote block bodies         block_num=7039027 delivery/sec=11.7MB wasted/sec=2.0MB alloc=10.9GB sys=11.4GB
one-liner-archive-erigon-1 exited with code 137
one-liner-archive-erigon-1     | [INFO] [07-10|13:17:16.683] Build info                               git_branch=HEAD git_tag=v2022.07.02-otterscan-dirty git_commit=c10677e22a98f587220e42f5e3d0724dae02b0ec

wmitsuda avatar Jul 10 '22 13:07 wmitsuda
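
The `alloc=` and `sys=` fields in the log lines above are what show memory climbing before each kill. As a rough aid for tracking them over a long log, here is a minimal Go sketch that extracts both values from a single line. The helper name `parseMemGB` and the regular expression are hypothetical and not part of Erigon, and the sketch assumes the KB/MB/GB suffixes are 1024-based.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// memField matches fields such as "alloc=10.9GB" or "sys=11.4GB" in a log line.
// The pattern is an assumption based on the log excerpts in this issue.
var memField = regexp.MustCompile(`\b(alloc|sys)=([0-9]+(?:\.[0-9]+)?)(KB|MB|GB)\b`)

// unitGB maps a unit suffix to its multiplier in GB (assuming 1024-based units).
var unitGB = map[string]float64{"KB": 1.0 / (1024 * 1024), "MB": 1.0 / 1024, "GB": 1}

// parseMemGB returns the alloc and sys values of a log line, in GB.
// ok is false if either field is missing from the line.
func parseMemGB(line string) (alloc, sys float64, ok bool) {
	found := map[string]float64{}
	for _, m := range memField.FindAllStringSubmatch(line, -1) {
		v, err := strconv.ParseFloat(m[2], 64)
		if err != nil {
			continue // skip values that do not parse as numbers
		}
		found[m[1]] = v * unitGB[m[3]]
	}
	alloc, okA := found["alloc"]
	sys, okS := found["sys"]
	return alloc, sys, okA && okS
}

func main() {
	line := `[INFO] [07-10|12:56:24.558] [4/16 Bodies] Wrote block bodies block_num=7038411 delivery/sec=10.2MB wasted/sec=545.4KB alloc=12.0GB sys=12.5GB`
	if alloc, sys, ok := parseMemGB(line); ok {
		fmt.Printf("alloc=%.1fGB sys=%.1fGB\n", alloc, sys)
	}
}
```

Feeding every log line through a helper like this and plotting the result against `block_num` would make it easier to see whether memory grows steadily (suggesting a leak) or spikes around particular blocks.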

Yep, likely leak. Or bug in logic.

AskAlexSharov avatar Jul 10 '22 14:07 AskAlexSharov

I searched the logs a little more and found another crash, this time during the execution stage:

one-liner-archive-erigon-1     | [INFO] [07-10|10:19:16.670] [6/16 Execution] Executed blocks         number=2594591 blk/s=603.4 tx/s=513.4 Mgas/s=141.3 gasState=0.93 batch=796.8MB alloc=13.8GB sys=15.0GB
one-liner-archive-erigon-1 exited with code 137

wmitsuda avatar Jul 10 '22 16:07 wmitsuda

Yeah, the latest versions seem to have a leak during initial sync; we had these as well. @revittm can you try, or ask someone, to take a look?

mandrigin avatar Jul 14 '22 16:07 mandrigin

Sure, will have a look tomorrow!

revitteth avatar Jul 14 '22 16:07 revitteth

Same issue here with Goerli on devel as well as v2022.07.04

This happened when I added --prune htc flags. It fully synced when it didn't have the prune flags.

suspended avatar Aug 05 '22 04:08 suspended

@suspended how much RAM do you have? It's the OOM killer. Can you try updating Erigon?

AskAlexSharov avatar Aug 05 '22 04:08 AskAlexSharov

I had 16GB RAM with no swap. Adding 8GB of swap helped and I was able to sync successfully.

suspended avatar Aug 05 '22 17:08 suspended

Hopefully fixed by the latest OOM fix for the bodies stage. This can be confirmed with a devel build, or by waiting until the fix is in an alpha release. https://github.com/ledgerwatch/erigon/pull/5604

hexoscott avatar Oct 04 '22 10:10 hexoscott

This issue is stale because it has been open for 40 days with no activity. Remove stale label or comment, or this will be closed in 7 days.

github-actions[bot] avatar Jan 04 '23 02:01 github-actions[bot]

This issue was closed because it has been stalled for 7 days with no activity.

github-actions[bot] avatar Jan 11 '23 02:01 github-actions[bot]