btcd
#1024 This small change fixes the stalling issue with the sync. In my instance the sync has not stalled for the last 2 days, and no restart of the process was required.
This change has no impact on sync speed, though, which is still very slow. Also, the assumption here is that the service has 3-4K peers in its cache, so it's OK to disconnect from one problematic peer for the time being; the same peer can be added back later by syncmanager and the peer discovery flow.
Will test this out on one of my mainnet nodes. Super excited to finally squash this bug!
@Roasbeef how's the run progressing on your end? It's progressing well on my side, and I'm pretty sure I cannot be that lucky for so many straight days ;-)
I have a side question: are you aware of any forks that use GPUs with btcd?
Can confirm it has been running well on my computer for 3 days now. Sync no longer stalls after less than 60 minutes. It is however still not done syncing, so I am not sure of the impact when running a full node.
Would you mind sharing the console output and the full command you were using?
FYI, mine is still going strong and unfortunately I haven't run into the situation you have encountered. If you can share the details, it will help me fix that scenario too.
Thanks,
Pankaj
FYI, the p.Disconnect() change did not seem to fix the issue, but it did allow the node to run longer (4 days vs 4 hours). The log messages after the sync fails are different now:
Apr 09 08:53:37 localhost btcd[4567]: 2018-04-09 08:53:37.702 [WRN] SYNC: No sync peer candidates available
Apr 09 08:53:38 localhost btcd[4567]: 2018-04-09 08:53:38.031 [INF] SYNC: New valid peer 199.68.197.5:8333 (outbound) (/Satoshi:0.14.2/)
Apr 09 08:53:38 localhost btcd[4567]: 2018-04-09 08:53:38.032 [INF] SRVR: Max peers reached [125] - disconnecting peer 199.68.197.5:8333 (outbound)
Apr 09 08:53:38 localhost btcd[4567]: 2018-04-09 08:53:38.032 [INF] SYNC: Syncing to block height 517362 from peer 199.68.197.5:8333
Apr 09 08:53:38 localhost btcd[4567]: 2018-04-09 08:53:38.032 [INF] SYNC: Lost peer 199.68.197.5:8333 (outbound)
Apr 09 08:53:38 localhost btcd[4567]: 2018-04-09 08:53:38.032 [WRN] SYNC: No sync peer candidates available
Apr 09 08:53:38 localhost btcd[4567]: 2018-04-09 08:53:38.362 [INF] SYNC: New valid peer 180.107.22.201:8333 (outbound) (/Satoshi:0.15.1/)
Apr 09 08:53:38 localhost btcd[4567]: 2018-04-09 08:53:38.363 [INF] SYNC: Syncing to block height 517362 from peer 180.107.22.201:8333
Apr 09 08:53:38 localhost btcd[4567]: 2018-04-09 08:53:38.363 [INF] SRVR: Max peers reached [125] - disconnecting peer 180.107.22.201:8333 (outbound)
Apr 09 08:53:38 localhost btcd[4567]: 2018-04-09 08:53:38.363 [INF] SYNC: Lost peer 180.107.22.201:8333 (outbound)
Apr 09 08:53:38 localhost btcd[4567]: 2018-04-09 08:53:38.363 [WRN] SYNC: No sync peer candidates available
Apr 09 08:53:46 localhost btcd[4567]: 2018-04-09 08:53:46.074 [INF] SYNC: New valid peer 173.212.193.35:8333 (outbound) (/Satoshi:0.16.0/)
Apr 09 08:53:46 localhost btcd[4567]: 2018-04-09 08:53:46.074 [INF] SYNC: Syncing to block height 517362 from peer 173.212.193.35:8333
Apr 09 08:53:46 localhost btcd[4567]: 2018-04-09 08:53:46.074 [INF] SRVR: Max peers reached [125] - disconnecting peer 173.212.193.35:8333 (outbound)
Apr 09 08:53:46 localhost btcd[4567]: 2018-04-09 08:53:46.075 [INF] SYNC: Lost peer 173.212.193.35:8333 (outbound)
Apr 09 08:53:46 localhost btcd[4567]: 2018-04-09 08:53:46.075 [WRN] SYNC: No sync peer candidates available
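The log above suggests that each peer picked as the sync peer is also dropped in the same instant because the peer limit was reached. A rough way to confirm that pattern over a longer log is sketched below. The regular expressions are inferred only from the message formats in the quoted lines and are an assumption, not a documented log format; `droppedSyncPeers` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Patterns inferred from the log lines quoted in this thread.
var (
	maxPeersRe = regexp.MustCompile(`Max peers reached \[\d+\] - disconnecting peer (\S+)`)
	syncFromRe = regexp.MustCompile(`Syncing to block height \d+ from peer (\S+)`)
)

// droppedSyncPeers returns, in order of first appearance, the peers that
// were both chosen as sync peer and disconnected for hitting the peer limit.
func droppedSyncPeers(log string) []string {
	dropped := map[string]bool{}
	chosen := map[string]bool{}
	seen := map[string]bool{}
	var order []string // addresses in order of first appearance
	note := func(addr string) {
		if !seen[addr] {
			seen[addr] = true
			order = append(order, addr)
		}
	}
	for _, line := range strings.Split(log, "\n") {
		if m := maxPeersRe.FindStringSubmatch(line); m != nil {
			dropped[m[1]] = true
			note(m[1])
		}
		if m := syncFromRe.FindStringSubmatch(line); m != nil {
			chosen[m[1]] = true
			note(m[1])
		}
	}
	var out []string
	for _, addr := range order {
		if dropped[addr] && chosen[addr] {
			out = append(out, addr)
		}
	}
	return out
}

func main() {
	// Sample lines copied from the log quoted above.
	sample := `2018-04-09 08:53:38.032 [INF] SRVR: Max peers reached [125] - disconnecting peer 199.68.197.5:8333 (outbound)
2018-04-09 08:53:38.032 [INF] SYNC: Syncing to block height 517362 from peer 199.68.197.5:8333
2018-04-09 08:53:38.363 [INF] SYNC: Syncing to block height 517362 from peer 180.107.22.201:8333
2018-04-09 08:53:38.363 [INF] SRVR: Max peers reached [125] - disconnecting peer 180.107.22.201:8333 (outbound)`
	fmt.Println(droppedSyncPeers(sample))
}
```

If every sync peer in a real log shows up in this list, the stall is more likely tied to the peer limit than to a single misbehaving peer.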
I confirm that this change fixes the problem with stalled sync. The process no longer stops. The fact that it is super slow is another issue.
@nutmix What block are you at currently when syncing? "No sync peer candidates available" means something else is wrong, and possibly you are banned by other peer nodes. In a normal case, your process has a cache of 13-14K peers, and the fact that you are not able to sync from any node means something is definitely wrong. Could you share the command options you are using when running the process?
FYI, I was able to fully sync my node from 0 to the latest block on a 2 core machine in 3 weeks without a restart. And after your node is fully synced up, the execution flow is different than what's fixed here.
Still stalling for me... I'm on master
@bfolkens share your log.
@pankajagarwal I have the default conf file and no command line options (just running "/home/btcd/go/bin/btcd"). I have your patch applied, but it stalls after less than one day. It's at 99.20% (513405). It's currently stalled again with the same problem. The HW is a dedicated Linode with 2 CPUs and 4GB RAM, Ubuntu 16, up to date, with FW enabled (nothing else running on it except btcd). It has been syncing for 3 months now; I have to restart it 1-2 times a day.
There might be different issues with the network connection; firewalls can affect this. Logs at max verbosity may help, and modifying the code to give more info about peer quality/status may help too.
Currently trying with: btcd --txindex
update: Not working ;(
[Screenshot: screen shot 2018-08-04 at 18 59 59]
@jcvernaleo (as per #1530)
- High priority
- Bug (if true, as per @Roasbeef this doesn't actually look like a bug and can be closed)
@Roasbeef did you get a chance to test this out? If not I can go ahead and do it