
#1024 This small change fixes the stalling issue with the sync. In my instance, the sync has not stalled for the last 2 days and no restart of the process was required.

Open pankajagarwal opened this issue 6 years ago • 14 comments

This change has no impact on sync speed, though, which is still very slow. Also, the assumption here is that the service has 3-4K peers in the cache and that it's OK to disconnect from one problematic peer for the time being, since the same peer can be added back later by the syncmanager and the peer discovery flow.

pankajagarwal avatar Mar 09 '18 05:03 pankajagarwal

Will test this out on one of my mainnet nodes. Super excited to finally squash this bug!

Roasbeef avatar Mar 09 '18 21:03 Roasbeef

@Roasbeef how's the run progressing on your end? It's progressing well on my side, and I'm pretty sure I cannot be that lucky for so many straight days ;-)

I have a side question: are you aware of any forks that use GPUs with btcd?

pankajagarwal avatar Mar 12 '18 16:03 pankajagarwal

Can confirm it has been running well on my computer for 3 days now. Sync no longer stalls after less than 60 minutes. It is however still not done syncing, so I am not sure of the impact when running a full node.

esiqveland avatar Mar 18 '18 09:03 esiqveland

Would you mind sharing the console output and the full command you were using?

FYI, mine is still going strong and unfortunately I haven't run into the situation you have encountered. If you can share the details, it will help me fix that scenario too.

Thanks,

Pankaj

pankajagarwal avatar Mar 19 '18 10:03 pankajagarwal

FYI, the p.Disconnect() change did not seem to fix the issue, but it did allow the node to run longer (4 days vs 4 hours). The log messages after the sync fails are different now:

Apr 09 08:53:37 localhost btcd[4567]: 2018-04-09 08:53:37.702 [WRN] SYNC: No sync peer candidates available
Apr 09 08:53:38 localhost btcd[4567]: 2018-04-09 08:53:38.031 [INF] SYNC: New valid peer 199.68.197.5:8333 (outbound) (/Satoshi:0.14.2/)
Apr 09 08:53:38 localhost btcd[4567]: 2018-04-09 08:53:38.032 [INF] SRVR: Max peers reached [125] - disconnecting peer 199.68.197.5:8333 (outbound)
Apr 09 08:53:38 localhost btcd[4567]: 2018-04-09 08:53:38.032 [INF] SYNC: Syncing to block height 517362 from peer 199.68.197.5:8333
Apr 09 08:53:38 localhost btcd[4567]: 2018-04-09 08:53:38.032 [INF] SYNC: Lost peer 199.68.197.5:8333 (outbound)
Apr 09 08:53:38 localhost btcd[4567]: 2018-04-09 08:53:38.032 [WRN] SYNC: No sync peer candidates available
Apr 09 08:53:38 localhost btcd[4567]: 2018-04-09 08:53:38.362 [INF] SYNC: New valid peer 180.107.22.201:8333 (outbound) (/Satoshi:0.15.1/)
Apr 09 08:53:38 localhost btcd[4567]: 2018-04-09 08:53:38.363 [INF] SYNC: Syncing to block height 517362 from peer 180.107.22.201:8333
Apr 09 08:53:38 localhost btcd[4567]: 2018-04-09 08:53:38.363 [INF] SRVR: Max peers reached [125] - disconnecting peer 180.107.22.201:8333 (outbound)
Apr 09 08:53:38 localhost btcd[4567]: 2018-04-09 08:53:38.363 [INF] SYNC: Lost peer 180.107.22.201:8333 (outbound)
Apr 09 08:53:38 localhost btcd[4567]: 2018-04-09 08:53:38.363 [WRN] SYNC: No sync peer candidates available
Apr 09 08:53:46 localhost btcd[4567]: 2018-04-09 08:53:46.074 [INF] SYNC: New valid peer 173.212.193.35:8333 (outbound) (/Satoshi:0.16.0/)
Apr 09 08:53:46 localhost btcd[4567]: 2018-04-09 08:53:46.074 [INF] SYNC: Syncing to block height 517362 from peer 173.212.193.35:8333
Apr 09 08:53:46 localhost btcd[4567]: 2018-04-09 08:53:46.074 [INF] SRVR: Max peers reached [125] - disconnecting peer 173.212.193.35:8333 (outbound)
Apr 09 08:53:46 localhost btcd[4567]: 2018-04-09 08:53:46.075 [INF] SYNC: Lost peer 173.212.193.35:8333 (outbound)
Apr 09 08:53:46 localhost btcd[4567]: 2018-04-09 08:53:46.075 [WRN] SYNC: No sync peer candidates available

nutmix avatar Apr 09 '18 12:04 nutmix

I confirm that this change fixes the problem with stalled sync. The process no longer stops. The fact that it is super slow is a separate issue.

abitrolly avatar Apr 09 '18 16:04 abitrolly

@nutmix What block are you currently at when syncing? "No sync peer candidates available" means something else is wrong, and possibly you are banned by other peer nodes. In a normal case, your process has a cache of 13-14K peers, and the fact that you are not able to sync from any node means something is definitely wrong. Could you share the command options you are using when running the process?

FYI, I was able to fully sync my node from 0 to the latest block on a 2-core machine in 3 weeks without a restart. And after your node is fully synced, the execution flow is different from what's fixed here.

pankajagarwal avatar Apr 09 '18 16:04 pankajagarwal

Still stalling for me... I'm on master

bfolkens avatar Apr 09 '18 23:04 bfolkens

@bfolkens share your log.

abitrolly avatar Apr 10 '18 04:04 abitrolly

@pankajagarwal I have the default conf file and no command line options (just running "/home/btcd/go/bin/btcd"). I have your patch applied, but it stalls after less than one day. It's at 99.20% (513405). It's currently stalled again with the same problem. The HW is a dedicated Linode (2 CPUs, 4 GB RAM, Ubuntu 16, up to date, with FW enabled). Nothing else is running on it except btcd. It has been syncing for 3 months now, and I have to restart it 1-2 times a day.

nutmix avatar Apr 10 '18 12:04 nutmix

There might be different issues with the network connection; firewalls can affect this. Logs at max verbosity may help, and modifying the code to give more info about peer quality/status may help too.

abitrolly avatar Apr 22 '18 02:04 abitrolly

Currently trying with: btcd --txindex

update: Not working ;(

[Screenshot, 2018-08-04 18:59:59]

MasterNeuron avatar Aug 04 '18 16:08 MasterNeuron

@jcvernaleo (as per #1530)

  • High priority
  • Bug (if true, as per @Roasbeef this doesn't actually look like a bug and can be closed)

jakesylvestre avatar Mar 04 '20 14:03 jakesylvestre

@Roasbeef did you get a chance to test this out? If not I can go ahead and do it

jakesylvestre avatar Mar 17 '20 14:03 jakesylvestre