lighthouse
the lighthouse process is killed
Description
I have 10 VMs, all running Lighthouse processes. At some point, all of these processes are killed. They all show the same logs:
Apr 13 09:45:05.000 INFO Synced slot: 8848123, block: 0xdb27…2a07, epoch: 276503, finalized_epoch: 276501, finalized_root: 0x16a0…2573, exec_hash: 0x89b0…6fc8 (verified), peers: 109, service: slot_notifier
Apr 13 09:45:14.109 WARN Snapshot cache miss for blob verification, index: 0, block_root: 0x4e25…b1a2, service: beacon
Apr 13 09:45:14.115 WARN Snapshot cache miss for blob verification, index: 1, block_root: 0x4e25…b1a2, service: beacon
Apr 13 09:45:14.191 WARN Snapshot cache miss for blob verification, index: 2, block_root: 0x4e25…b1a2, service: beacon
Apr 13 09:45:14.249 WARN Snapshot cache miss for blob verification, index: 4, block_root: 0x4e25…b1a2, service: beacon
Apr 13 09:45:14.817 WARN Snapshot cache miss for blob verification, index: 5, block_root: 0x4e25…b1a2, service: beacon
Apr 13 09:45:14.827 WARN Snapshot cache miss for blob verification, index: 3, block_root: 0x4e25…b1a2, service: beacon
Apr 13 09:45:36.632 INFO Starting beacon chain method: resume, service: beacon
Apr 13 09:46:03.930 INFO Shutting down.. reason: Success("Received SIGTERM")
Version
v5.1.3
Present Behaviour
I never noticed.
Expected Behaviour
It seems that the memory usage of the process is high.
Steps to resolve
Snapshot cache misses cause large spikes in memory. There is no workaround at the moment other than restarting the process.
This will be fixed once https://github.com/sigp/lighthouse/pull/5533 is merged.
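To check how often this warning shows up on a node before it goes down, the log can be scanned for the "Snapshot cache miss" lines. The following is only a rough sketch: the regular expression is based on the line shape quoted in this issue, and the function name `count_snapshot_cache_misses` is made up for illustration, not part of Lighthouse.

```python
import re
from collections import Counter

# Assumption: the warning looks like the lines quoted in this issue, e.g.
# "... WARN Snapshot cache miss for blob verification, index: 0, block_root: 0x..., service: beacon"
MISS_RE = re.compile(
    r"WARN Snapshot cache miss for blob verification.*?block_root: (\S+?),"
)


def count_snapshot_cache_misses(log_text):
    """Return a Counter mapping block_root -> number of cache-miss warnings."""
    counts = Counter()
    for line in log_text.splitlines():
        match = MISS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing the contents of a beacon node log file to `count_snapshot_cache_misses` gives a per-block tally, which can then be compared against the times the process was killed.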
Is this restart a protection mechanism of Lighthouse itself, or did the operating system kill it?
The OS. On Linux, the OOM killer kills processes that use too much memory.
If you run under a supervisor (like systemd), it will auto-restart Lighthouse with some downtime. It's annoying, I know, sorry. It shouldn't happen too often.
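To confirm that the OOM killer was responsible, the kernel log (for example the output of `dmesg` or `journalctl -k`) can be searched for kill messages. This is a minimal sketch that assumes the common "Killed process <pid> (<name>)" wording. The exact message differs between kernel versions, and the name the kernel reports for the process may not be exactly `lighthouse`, so treat both the pattern and the default name as assumptions.

```python
import re

# Assumption: the kernel reports OOM kills with text containing
# "Killed process <pid> (<name>)"; wording varies by kernel version.
OOM_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(kernel_log_text, process_name="lighthouse"):
    """Return the PIDs of processes named `process_name` that were OOM-killed."""
    pids = []
    for line in kernel_log_text.splitlines():
        match = OOM_RE.search(line)
        if match and match.group(2) == process_name:
            pids.append(int(match.group(1)))
    return pids
```

An empty result does not prove the process was not OOM-killed, since the log may have rotated or the message format may differ on your system.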
When will a version be released to fix this problem? I'm concerned about that. @michaelsproul
Couple of weeks
If you can give your machine more memory, that should also work as a workaround. AFAIK our machines with 32 GB of RAM are not OOMing.
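To see how much headroom a machine has, one option on Linux is to read `/proc/meminfo`. The sketch below parses that file's text and computes the fraction of memory still available. The helper names are made up for illustration, and it assumes a kernel that reports the `MemAvailable` field.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}.

    Most fields are reported in kB; the unit suffix is dropped here.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Field: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def available_fraction(info):
    """Fraction of total memory the kernel estimates is available."""
    # Assumption: both fields are present (MemAvailable needs a recent kernel).
    return info["MemAvailable"] / info["MemTotal"]
```

On a Linux host, pass in the text read from `/proc/meminfo`, for example `parse_meminfo(open("/proc/meminfo").read())`. If the fraction is consistently low while the node is running, a memory spike is more likely to trigger the OOM killer.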