the lighthouse process is killed

Open hwhe opened this issue 10 months ago • 6 comments

Description

I have 10 VMs, all running Lighthouse processes. At some point, these processes were all killed. They all show the same logs:

Apr 13 09:45:05.000 INFO Synced slot: 8848123, block: 0xdb27…2a07, epoch: 276503, finalized_epoch: 276501, finalized_root: 0x16a0…2573, exec_hash: 0x89b0…6fc8 (verified), peers: 109, service: slot_notifier
Apr 13 09:45:14.109 WARN Snapshot cache miss for blob verification, index: 0, block_root: 0x4e25…b1a2, service: beacon
Apr 13 09:45:14.115 WARN Snapshot cache miss for blob verification, index: 1, block_root: 0x4e25…b1a2, service: beacon
Apr 13 09:45:14.191 WARN Snapshot cache miss for blob verification, index: 2, block_root: 0x4e25…b1a2, service: beacon
Apr 13 09:45:14.249 WARN Snapshot cache miss for blob verification, index: 4, block_root: 0x4e25…b1a2, service: beacon
Apr 13 09:45:14.817 WARN Snapshot cache miss for blob verification, index: 5, block_root: 0x4e25…b1a2, service: beacon
Apr 13 09:45:14.827 WARN Snapshot cache miss for blob verification, index: 3, block_root: 0x4e25…b1a2, service: beacon
Apr 13 09:45:36.632 INFO Starting beacon chain method: resume, service: beacon
Apr 13 09:46:03.930 INFO Shutting down.. reason: Success("Received SIGTERM")

Version

v5.1.3

Present Behaviour

I never noticed.

Expected Behaviour

It seems that the memory usage of the process is high.

Steps to resolve

hwhe avatar Apr 15 '24 13:04 hwhe

Snapshot cache misses cause large spikes in memory, there is no workaround at the moment other than restarting the process.

This will be fixed once https://github.com/sigp/lighthouse/pull/5533 is merged.

michaelsproul avatar Apr 15 '24 13:04 michaelsproul

Is this restart a protective measure by Lighthouse itself, or did the operating system kill the process?

hwhe avatar Apr 16 '24 02:04 hwhe

The OS. On Linux, the OOM killer kills processes that use too much memory.

If you run under a supervisor (like systemd), it will auto-restart Lighthouse with some downtime. It's annoying, I know, sorry. It shouldn't happen too often.
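
To confirm that a kill really came from the OOM killer, the kernel log can be scanned for its report lines. The following is a minimal sketch, not part of Lighthouse: the function name `find_oom_kills` is hypothetical, and the pattern assumes the common "Killed process <pid> (<name>)" wording, which can differ between kernel versions.

```python
import re

# Assumed wording of the kernel's OOM report line:
#   "... Killed process 1234 (name) ..."
# Kernel versions differ in the surrounding text, so only this part is matched.
_OOM_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(kernel_log: str) -> list[tuple[int, str]]:
    """Return (pid, process_name) for every OOM-kill line found in the text."""
    return [(int(m.group(1)), m.group(2)) for m in _OOM_RE.finditer(kernel_log)]
```

The input text could come from `dmesg` or `journalctl -k`, for example by saving the output to a file and passing its contents to the function. An empty result suggests the process was stopped by something other than the kernel OOM killer.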

michaelsproul avatar Apr 16 '24 02:04 michaelsproul

When will a version be released to fix this problem? I'm concerned about that. @michaelsproul

hwhe avatar Apr 16 '24 09:04 hwhe

A couple of weeks.

michaelsproul avatar Apr 16 '24 22:04 michaelsproul

If you can give your machine more memory, that should also work as a workaround. AFAIK our machines with 32 GB of RAM are not OOMing.
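
To compare a VM against that 32 GB figure, the total memory can be read from `/proc/meminfo` on Linux. This is a rough sketch under the assumption that the file uses its usual `Field:   value kB` layout; the helper names `parse_meminfo` and `mem_total_gib` are hypothetical, and 32 GB is only the size mentioned above, not a documented Lighthouse requirement.

```python
import re


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}.

    Values are returned as reported (kB for most fields).
    """
    fields = {}
    for line in text.splitlines():
        m = re.match(r"([^:]+):\s+(\d+)", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields


def mem_total_gib(text: str) -> float:
    """Convert the MemTotal field (kB) to GiB."""
    return parse_meminfo(text)["MemTotal"] / (1024 * 1024)
```

On a Linux host this could be called as `mem_total_gib(open("/proc/meminfo").read())`.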

michaelsproul avatar Apr 16 '24 22:04 michaelsproul