lighthouse icon indicating copy to clipboard operation
lighthouse copied to clipboard

Beacon node stops syncing after reboot

Open GeoJesse opened this issue 4 years ago • 9 comments

Description

I followed the tutorial here for staking ethereum with the exception of using a Raspberry Pi

Hardware: Raspberry Pi Model B Samsung External SSD T5 (lighthouse and geth db's stored here) OS: Ubuntu v20.04 (LTS) x64 ARM Software: Geth 1.9.19-stable-3e064192 ARM 64 Lighthouse v0.2.6-2bc9115; BLS Library: blst

Version

Lighthouse v0.2.6-2bc9115; BLS Library: blst rustc 1.45.2 (d3fb005a3 2020-07-31)

Please provide your Lighthouse and Rust version. Are you building from master, which commit? Pulled from 'master'

Present Behaviour

Whenever there is a reboot the syncing stops and gets stuck on that last slot. The services all get restarted but the beacon service never picks up where it left off and 'est-time' in the logs is '--'.

Expected Behaviour

After rebooting I would expect the beacon service to take over where it left off.

Steps to resolve

Please describe the steps required to resolve this issue, if known. The only way to get it up and running again is to stop the beacon sewrvice, purge the beacon database, then restart. However this causes it to start all over from the beginning.

GeoJesse avatar Aug 21 '20 20:08 GeoJesse

This might be related to #1488

divagant-martian avatar Aug 21 '20 20:08 divagant-martian

I'm not sure if it's related or not but it appears that it's only an issue if you haven't already fully sync'd. My beacon node just finished syncing so I tried rebooting the server again. This time it picked up with no problem. So it doesn't look like an issue for me anymore but it could be a bug for people who are trying to sync the first time.

GeoJesse avatar Aug 22 '20 01:08 GeoJesse

After Jesse showed me this, I tested it out on my own Ubuntu Desktop machine (Lighthouse v0.2.6-ebb25b55) and was able to reproduce the issue.

Here's where I stop the beacon node while it is syncing:

Aug 21 21:24:50 jon-desktop lighthouse[6885]: Aug 21 21:24:50.000 INFO Syncing                                 est_time: 1 hr 55 mins, speed: 17.98 slots/sec, distance: 125132 slots (2 weeks 3 days), peers: 2, service: slot_notifier
Aug 21 21:25:02 jon-desktop lighthouse[6885]: Aug 21 21:25:02.000 INFO Syncing                                 est_time: 1 hr 55 mins, speed: 18.00 slots/sec, distance: 124940 slots (2 weeks 3 days), peers: 2, service: slot_notifier
Aug 21 21:25:14 jon-desktop lighthouse[6885]: Aug 21 21:25:14.000 INFO Syncing                                 est_time: 1 hr 37 mins, speed: 21.33 slots/sec, distance: 124685 slots (2 weeks 3 days), peers: 4, service: slot_notifier
Aug 21 21:25:22 jon-desktop systemd[1]: Stopping Lighthouse Beacon Node...
Aug 21 21:25:22 jon-desktop lighthouse[6885]: Aug 21 21:25:22.463 INFO Shutting down..
Aug 21 21:25:22 jon-desktop lighthouse[6885]: Aug 21 21:25:22.464 INFO HTTP service shutdown                   service: http
Aug 21 21:25:22 jon-desktop lighthouse[6885]: Aug 21 21:25:22.466 INFO Saved DHT state                         service: network
Aug 21 21:25:22 jon-desktop lighthouse[6885]: Aug 21 21:25:22.466 INFO Network service shutdown                service: network
Aug 21 21:25:24 jon-desktop systemd[1]: lighthousebeacon.service: Succeeded.
Aug 21 21:25:24 jon-desktop systemd[1]: Stopped Lighthouse Beacon Node.

Here's the log after starting again:

Aug 21 21:25:49 jon-desktop systemd[1]: Started Lighthouse Beacon Node.
Aug 21 21:25:49 jon-desktop lighthouse[6965]: Aug 21 21:25:49.894 WARN Ethereum 2.0 is pre-release. This software is experimental.
Aug 21 21:25:49 jon-desktop lighthouse[6965]: Aug 21 21:25:49.894 INFO Lighthouse started                      version: Lighthouse/v0.2.6-ebb25b55
Aug 21 21:25:49 jon-desktop lighthouse[6965]: Aug 21 21:25:49.894 INFO Configured for testnet                  name: medalla
Aug 21 21:25:49 jon-desktop lighthouse[6965]: Aug 21 21:25:49.894 INFO Data directory initialised              datadir: /eth/data/lighthouse/beacon-node
Aug 21 21:25:50 jon-desktop lighthouse[6965]: Aug 21 21:25:50.323 INFO Starting beacon chain                   method: resume, service: beacon
Aug 21 21:26:12 jon-desktop lighthouse[6965]: Aug 21 21:26:12.353 INFO Block production enabled                method: json rpc via http, endpoint: http://127.0.0.1:8545
Aug 21 21:26:12 jon-desktop lighthouse[6965]: Aug 21 21:26:12.418 INFO Beacon chain initialized                head_slot: 1631, head_block: 0x3465…01f0, head_state: 0xbae2…b4f4, service: beacon
Aug 21 21:26:12 jon-desktop lighthouse[6965]: Aug 21 21:26:12.419 INFO Timer service started                   service: node_timer
Aug 21 21:26:12 jon-desktop lighthouse[6965]: Aug 21 21:26:12.420 INFO Libp2p Service                          peer_id: 16Uiu2HAm4wqR5HtP7AB86gDqRoxjeBuF9dEMs6LALBMtQAgefXPG, service: libp2p
Aug 21 21:26:12 jon-desktop lighthouse[6965]: Aug 21 21:26:12.420 INFO ENR Initialised                         tcp: Some(9000), udp: None, ip: None, id: 0xf574..a8ea, seq: 1, enr: enr:-KO4QJsGUZ8lsu4zly8XfW17mS_i-Q0SmDOLXgoMu9jy2_irZeOUboYO54DsVphNwzU_KryQXU-Z97o5ZE7V4W_UsmoBh2F0dG5ldHOIAAAAAAAAAACEZXRoMpDnp11aAAAAAf__________gmlkgnY0iXNlY3AyNTZrMaECjXNGmq3lWJ6tpKhEvKSf8cgGOHP4tIfpxfphhzpUNKODdGNwgiMo, service: libp2p
Aug 21 21:26:12 jon-desktop lighthouse[6965]: Aug 21 21:26:12.421 INFO Listening established                   address: /ip4/0.0.0.0/tcp/9000/p2p/16Uiu2HAm4wqR5HtP7AB86gDqRoxjeBuF9dEMs6LALBMtQAgefXPG, service: libp2p
Aug 21 21:26:12 jon-desktop lighthouse[6965]: Aug 21 21:26:12.422 INFO Subscribed to topics                    topics: [BeaconBlock, BeaconAggregateAndProof, VoluntaryExit, ProposerSlashing, AttesterSlashing], service: libp2p
Aug 21 21:26:12 jon-desktop lighthouse[6965]: Aug 21 21:26:12.464 INFO HTTP API started                        port: 5052, address: 127.0.0.1, service: http
Aug 21 21:26:26 jon-desktop lighthouse[6965]: Aug 21 21:26:26.001 WARN Low peer count                          peer_count: 0, service: slot_notifier
Aug 21 21:26:26 jon-desktop lighthouse[6965]: Aug 21 21:26:26.001 INFO Searching for peers                     current_slot: 126131, head_slot: 1631, finalized_epoch: 48, finalized_root: 0x952a…39bb, peers: 0, service: slot_notifier
Aug 21 21:26:37 jon-desktop lighthouse[6965]: Aug 21 21:26:37.729 INFO Sync state updated                      new_state: Syncing Finalized Chain, old_state: Stalled, service: sync
Aug 21 21:26:38 jon-desktop lighthouse[6965]: Aug 21 21:26:38.001 INFO Syncing                                 est_time: --, distance: 124501 slots (2 weeks 3 days), peers: 4, service: slot_notifier
Aug 21 21:26:47 jon-desktop lighthouse[6965]: Aug 21 21:26:47.836 WARN Batch failed to download. Dropping chain scoring peers, batch_epoch: 48, score_adjustment: Low Tolerance Error, chain: 1582718186899813585, service: sync
Aug 21 21:26:50 jon-desktop lighthouse[6965]: Aug 21 21:26:50.001 INFO Syncing                                 est_time: --, distance: 124502 slots (2 weeks 3 days), peers: 5, service: slot_notifier
Aug 21 21:27:02 jon-desktop lighthouse[6965]: Aug 21 21:27:02.002 INFO Syncing                                 est_time: --, distance: 124503 slots (2 weeks 3 days), peers: 5, service: slot_notifier
Aug 21 21:27:14 jon-desktop lighthouse[6965]: Aug 21 21:27:14.001 INFO Syncing                                 est_time: --, distance: 124504 slots (2 weeks 3 days), peers: 5, service: slot_notifier
Aug 21 21:27:26 jon-desktop lighthouse[6965]: Aug 21 21:27:26.001 INFO Syncing                                 est_time: --, distance: 124505 slots (2 weeks 3 days), peers: 5, service: slot_notifier
Aug 21 21:27:28 jon-desktop lighthouse[6965]: Aug 21 21:27:28.376 WARN Batch failed to download. Dropping chain scoring peers, batch_epoch: 48, score_adjustment: Low Tolerance Error, chain: 16513122307734969754, service: sync

Notice that after restarting the beacon, the process will remain as "Syncing" with est_time: --

The only fix we've been able to use is to clear the beacon directory and restart. Restarting after it has reached the "Synced" state works fine.

jwoyame avatar Aug 22 '20 01:08 jwoyame

Having the same issue.

Aug 24 10:20:50.000 INFO Syncing                                 est_time: --, distance: 37488 slots (5 days 4 hrs), peers: 41, service: slot_notifier
Aug 24 10:21:02.000 INFO Syncing                                 est_time: --, distance: 37489 slots (5 days 4 hrs), peers: 43, service: slot_notifier
Aug 24 10:21:14.007 INFO Syncing                                 est_time: --, distance: 37490 slots (5 days 4 hrs), peers: 43, service: slot_notifier
Aug 24 10:21:26.000 INFO Syncing                                 est_time: --, distance: 37491 slots (5 days 4 hrs), peers: 42, service: slot_notifier
Aug 24 10:21:38.001 INFO Syncing                                 est_time: --, distance: 37492 slots (5 days 4 hrs), peers: 42, service: slot_notifier
Aug 24 10:21:50.001 INFO Syncing                                 est_time: --, distance: 37493 slots (5 days 4 hrs), peers: 43, service: slot_notifier
Aug 24 10:22:02.001 INFO Syncing                                 est_time: --, distance: 37494 slots (5 days 4 hrs), peers: 41, service: slot_notifier
Aug 24 10:22:14.000 INFO Syncing                                 est_time: --, distance: 37495 slots (5 days 4 hrs), peers: 40, service: slot_notifier
Aug 24 10:22:26.000 INFO Syncing                                 est_time: --, distance: 37496 slots (5 days 4 hrs), peers: 41, service: slot_notifier
Aug 24 10:22:38.001 INFO Syncing                                 est_time: --, distance: 37497 slots (5 days 4 hrs), peers: 40, service: slot_notifier
Aug 24 10:22:50.001 INFO Syncing                                 est_time: --, distance: 37498 slots (5 days 4 hrs), peers: 37, service: slot_notifier
Aug 24 10:23:02.001 INFO Syncing                                 est_time: --, distance: 37499 slots (5 days 4 hrs), peers: 37, service: slot_notifier

q9f avatar Aug 24 '20 08:08 q9f

Me too. My DB was 105 GB in size, so I installed v0.2.7 and ran --purge-db and now I am stuck in this cycle. Update: I ran --purge-db again and for now it is working.

heueristik avatar Aug 25 '20 07:08 heueristik

I was following directions for docker on the official docs and ran into this issue.

Based on reading the above, I was able to fix it by:

  1. Wait for geth to fully sync
  2. docker-compose down
  3. mv lighthouse-data/beacon lighthouse-data/beacon_old (just to be safe)
  4. docker-compose up -d

Now syncing successfully. Have not tried/dared to stop it.

Clete2 avatar Sep 03 '20 02:09 Clete2

There is also a new sync PR coming which should address most issues related to the medalla large period of non-finality.

AgeManning avatar Sep 09 '20 01:09 AgeManning

It seems this is still happening, lighthouse getting stuck after a restart when using genesis sync

icculp avatar Apr 23 '24 17:04 icculp

@icculp do you have any logs you could send to help diagnose this?

AgeManning avatar Apr 23 '24 23:04 AgeManning