Dynamic icon indicating copy to clipboard operation
Dynamic copied to clipboard

CLI becomes unresponsive.

Open LordDarkHelmet opened this issue 5 years ago • 11 comments

The CLI becomes unresponsive on a Dynode running on a VPS. The underlying daemon is also apparently unresponsive. This seemingly happens randomly. I have not found a coloration between the unresponsiveness and any given event. However, this only "seems" to occur on a VPS running more than one Dynode.

Steps to reproduce

  1. Run multiple Dynodes on a single VPS
  2. Wait until it becomes unresponsive. (it seems to be rare)
  3. I created a custom script to capture the log in the event that the CLI becomes unresponsive.

What platform are you using (Linux, Windows, Mac)

Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-65-generic x86_64)

Description

See attached for the debug output. Note that each "Unknown error" report gives a 250 line dump of the log file. It runs every 5 min, so a lot of it repeats from error to error. The error code "143" is caused by the "timeout --preserve-status -k 45s 40s " that is used before each CLI command to detect freezing. Please note that the line that says "CLI Unknown Error 143: Shutting Down" is a typo, I was not attempting to shut down the Dynode, I should have typed in "dumping log". dynamiccd_A.error.txt

As a side note, I also have a sequence daemon running on the same VPS. It was also unresponsive when Dynode A had the issue. The act of restarting the sequence daemon caused the dynamic CLI to become responsive and the script was then able to heal the issue.

I can improve the error reporting if there is something you want to catch. Crashes have been very rare for me, but I was able to capture this when I loaded up a single instance with multiple Dynodes. (yes each has its own IPv4 address)

LordDarkHelmet avatar Oct 27 '19 23:10 LordDarkHelmet

If this is due to running multiple instances of a Dynode on one VPS, then we do not directly support that, and won't be actively working on a fix.

Duality-CDOO avatar Oct 30 '19 17:10 Duality-CDOO

Running a local server with more than one instance on it (given that each instance has been assigned a unique IPv4 address) is something that is commonly done. I suspect that it running on a VPS vs on a local server has no bearing on the reported issue. I believe that some of our community members currently providing Dynodes hosting solutions are running into this issue. The current resolution for them is to quickly restart a non-responsive unit in order to not timeout. This band-aid relives the symptom but does not fix the issue.

What additional information can I collect that will aid in finding and fixing this issue?

LordDarkHelmet avatar Oct 30 '19 19:10 LordDarkHelmet

Are you assigning each instance with its own custom ports to use?

Duality-CDOO avatar Oct 30 '19 19:10 Duality-CDOO

U have to isolate the clients, like docker , vm , ....

SvenKercher avatar Oct 30 '19 19:10 SvenKercher

Each port is static as per the dynode rules. The RPC port is unique to each instance. Here is an example config file.

# This file was generated. YYYY-MM-DD_hh:mm:ss  Version: x.xx.xx
# Do not use special characters or spaces with username/password
rpcuser=<Randomly Generated and Unique to each file>
rpcpassword=<Randomly Generated and Unique to each file>
rpcport=<Procedural Generated and Unique to each file ex 33350, 33351, ...>
port=33300

# MINING:  These are your mining variables
# Gen can be 0 or 1. 1=mining, 0=No mining
gen=0
# genproclimit sets the number of processors you want to use -1 for unbounded (all of them)
genproclimit=-1

# DYNODE: 
# externalip is the IP address of this machine. (the remote Dynode's IP address) 
externalip=<Uniique IPv4 Address for this instance>
bind=<Uniique IPv4 Address for this instance>
# dynode can be 0 or 1. 1=dynode, 0=not a dynode
dynode=1
# Use your local or control wallet, run the command "Dynode genkey" to generate a unique dynodepairingkey for each remote Dynode
dynodepairingkey=<Procedural Generated and Unique to each file>

# if you can't connect, you may need to add a node, check the peers list on an explorer and add some nodes here
# addnode=<IP Address>:<PORT>

# End of generated file

LordDarkHelmet avatar Oct 30 '19 19:10 LordDarkHelmet

Just had a similar crash on a standalone VPS running Ubuntu 18.04 LTS. The only thing running on the VPS is a single Dynode setup with my script. This issue is rare, and it defiantly happens more often on a server running multiple Dynodes, but it just happened on a standalone dedicated VPS.

Is there anything I can do to trap information that will aid in debugging this issue?

LordDarkHelmet avatar Nov 22 '19 15:11 LordDarkHelmet

I have the same problem with Debian 10 as well , the CLI sometimes get a timeout completely on a server that only running a dynode. This server is located in a datacenter with high stability. I monitoring everything i have with zabbix, and so far they have been 0 issues regarding losses in the network.

And before version 2.4 i have been running dynodes for years without any interruption, so they is a issue for sure related in the codebase.

Since now i have over 60 dynodes running and everyone have the same issue on different types of hardware. Even if it is Ubuntu or Debian the problem still remains.

So every month i need to start the wallet check the dynode status and restart them.

I get issues with interruption mostly within 60 days when the nodes get the status "Expired", i check via getinfo or dynode status and the cli has hang as usual so that explain why it's not working either.

I using this command: /usr/local/bin/dynamic-cli getinfo

You been forced to kill the deamon because it's not responding at all.

Here is the log before the node has been killed:

2019-12-17 03:04:40 Closing Dynode connection: peer=57980, addr=2.56.116.254:33300 2019-12-17 03:04:40 Closing Dynode connection: peer=57983, addr=85.149.143.38:33300 2019-12-17 03:06:05 Closing Dynode connection: peer=57991, addr=95.217.91.76:33300 2019-12-17 03:06:05 Closing Dynode connection: peer=57992, addr=178.21.79.10:33300 2019-12-17 03:07:15 Closing Dynode connection: peer=57995, addr=89.36.218.164:33300 2019-12-17 03:07:15 Closing Dynode connection: peer=57997, addr=155.138.216.62:33300 2019-12-17 03:08:17 CDynodeMan::ProcessVerifyBroadcast -- verified Dynode b48c02ebec2def4af12dccd3329d03388aeef64496a7addc5ee7294a6d366554-1 for addr 212.47.235.253:33300 2019-12-17 03:09:14 CDynodeMan::ProcessVerifyBroadcast -- verified Dynode 9dc5d42fb80ca7f544641da8b36cafaed77af02496db0838da4b9ca385436ce6-1 for addr 78.141.203.66:33300 2019-12-17 03:09:16 CDynodeMan::ProcessVerifyBroadcast -- verified Dynode 8aaa54e53c650d83837d42b1aeb1d3ac0723a991a92145325c45012cb4a73a3f-0 for addr 185.242.112.23:33300 2019-12-17 03:09:27 CDynodeMan::ProcessVerifyBroadcast -- verified Dynode e6eed15a5fa6e661315bfed03d11b4e24ea7ba76ca6e08c4aec66d4007e288cb-22 for addr 95.217.91.20:33300 2019-12-17 03:09:33 CDynodeMan::ProcessVerifyBroadcast -- verified Dynode f929ad8563967ad176463ff0dbbed198ee82b9d808d9bee308e488ff788ba740-0 for addr 178.21.79.25:33300 2019-12-17 03:09:48 Closing Dynode connection: peer=58003, addr=95.217.91.108:33300 2019-12-17 03:09:48 Closing Dynode connection: peer=58004, addr=217.61.97.183:33300 2019-12-17 03:10:51 socket no message in first 60 seconds, 0 1 from 58009 2019-12-17 03:10:59 Closing Dynode connection: peer=58010, addr=95.217.91.69:33300 2019-12-17 03:10:59 Closing Dynode connection: peer=58011, addr=95.217.91.88:33300 2019-12-17 03:10:59 Closing Dynode connection: peer=58012, addr=104.168.87.125:33300 2019-12-17 03:11:06 CActiveDynode::SendDynodePing -- Relaying ping, collateral=ecacee41d38b68871faae99a6032e980e336c3e59bd9d38dc1af8ecfed3ae1d8-1 2019-12-17 03:12:23 Closing Dynode connection: peer=58015, addr=95.217.91.89:33300 2019-12-17 03:13:06 peer=58020 using obsolete version 70900; disconnecting 2019-12-17 03:13:06 ProcessMessages(version, 101 bytes) FAILED peer=58020 2019-12-17 03:13:33 Closing Dynode connection: peer=58018, addr=95.217.91.88:33300 2019-12-17 03:13:33 Closing Dynode connection: peer=58019, addr=149.28.246.233:33300 2019-12-17 03:14:46 Closing Dynode connection: peer=58021, addr=51.75.16.12:33300 2019-12-17 03:14:46 Closing Dynode connection: peer=58026, addr=104.168.87.126:33300 2019-12-17 03:14:46 Closing Dynode connection: peer=58027, addr=178.21.79.35:33300 2019-12-17 03:16:02 Closing Dynode connection: peer=58030, addr=78.141.203.98:33300 2019-12-17 03:16:02 Closing Dynode connection: peer=58031, addr=95.217.91.79:33300 2019-12-17 03:16:02 Closing Dynode connection: peer=58032, addr=154.209.11.13:33300 2019-12-17 03:17:06 Closing Dynode connection: peer=58036, addr=95.217.91.34:33300 2019-12-17 03:17:06 Closing Dynode connection: peer=58037, addr=178.21.79.55:33300 2019-12-17 03:19:00 CGovernanceManager::SyncAll -- sent 0 objects and 0 votes to peer=58022 2019-12-17 03:19:25 peer=58049 using obsolete version 70900; disconnecting 2019-12-17 03:19:25 ProcessMessages(version, 101 bytes) FAILED peer=58049 2019-12-17 03:19:51 Closing Dynode connection: peer=58043, addr=51.75.16.12:33300 2019-12-17 03:19:51 Closing Dynode connection: peer=58045, addr=95.217.91.38:33300 2019-12-17 03:19:51 Closing Dynode connection: peer=58047, addr=154.209.11.29:33300 2019-12-17 03:19:51 Closing Dynode connection: peer=58048, addr=193.16.10.125:33300 2019-12-17 03:21:06 CActiveDynode::SendDynodePing -- Relaying ping, collateral=ecacee41d38b68871faae99a6032e980e336c3e59bd9d38dc1af8ecfed3ae1d8-1 2019-12-17 03:21:07 Closing Dynode connection: peer=58051, addr=95.217.91.49:33300 2019-12-17 03:21:07 Closing Dynode connection: peer=58053, addr=138.197.146.197:33300 2019-12-17 03:21:07 Closing Dynode connection: peer=58054, addr=217.61.109.246:33300 2019-12-17 03:23:37 CDynodeMan::ProcessVerifyBroadcast -- verified Dynode e6eed15a5fa6e661315bfed03d11b4e24ea7ba76ca6e08c4aec66d4007e288cb-8 for addr 95.217.91.6:33300 2019-12-17 03:23:39 CDynodeMan::ProcessVerifyBroadcast -- verified Dynode 150a2aceb28648997ae19cbff7c8f2afde2412ded024df895541a4d20ea63357-1 for addr 207.148.26.156:33300 2019-12-17 03:23:41 socket no message in first 60 seconds, 0 1 from 58067 2019-12-17 03:23:42 Closing Dynode connection: peer=58065, addr=51.15.51.189:33300 2019-12-17 03:23:42 Closing Dynode connection: peer=58066, addr=108.61.191.60:33300 2019-12-17 03:24:06 CDynodeBroadcast::Update -- Got UPDATED Dynode entry: addr=149.28.60.31:33300 2019-12-17 03:24:11 Misbehaving: 203.189.221.20:59828 peer=58070 (0 -> 34) 2019-12-17 03:24:11 CDynodeMan::SyncAll -- peer already asked me for the list, peer=58070 2019-12-17 03:24:23 CDynodeMan::ProcessVerifyBroadcast -- verified Dynode 7e66f5472c91387e3e1f1469b22543f83dfe426c04f3f97cf2531bc45341ea49-0 for addr 218.85.129.63:33300 2019-12-17 03:24:50 CDynodePayments::Sync -- Sent 83 votes to peer=58070 2019-12-17 03:24:50 DYNODEPAYMENTSYNC -- Sent Dynode payment votes to peer 58070 2019-12-17 03:25:13 Closing Dynode connection: peer=58074, addr=138.197.146.197:33300 2019-12-17 03:25:56 CDynodeBroadcast::Update -- Got UPDATED Dynode entry: addr=51.77.92.5:33300 2019-12-17 03:26:23 Closing Dynode connection: peer=58080, addr=51.75.16.85:33300 2019-12-17 03:27:43 Closing Dynode connection: peer=58086, addr=5.189.148.230:33300 2019-12-17 03:27:43 Closing Dynode connection: peer=58088, addr=51.75.16.9:33300 2019-12-17 03:27:43 Closing Dynode connection: peer=58091, addr=178.21.79.48:33300 2019-12-17 03:28:38 peer=58097 using obsolete version 70600; disconnecting 2019-12-17 03:28:38 ProcessMessages(version, 101 bytes) FAILED peer=58097 2019-12-17 03:29:13 Closing Dynode connection: peer=58095, addr=95.217.91.47:33300 2019-12-17 03:30:23 Closing Dynode connection: peer=58098, addr=45.63.107.249:33300 2019-12-17 03:30:23 Closing Dynode connection: peer=58099, addr=45.76.95.82:33300 2019-12-17 03:30:23 Closing Dynode connection: peer=58100, addr=95.179.241.252:33300 2019-12-17 03:30:23 Closing Dynode connection: peer=58102, addr=159.69.182.4:33300 2019-12-17 03:31:06 CActiveDynode::SendDynodePing -- Relaying ping, collateral=ecacee41d38b68871faae99a6032e980e336c3e59bd9d38dc1af8ecfed3ae1d8-1 2019-12-17 03:31:28 Closing Dynode connection: peer=58106, addr=80.211.114.163:33300 2019-12-17 03:31:28 Closing Dynode connection: peer=58107, addr=155.138.134.12:33300 2019-12-17 03:32:47 Closing Dynode connection: peer=58113, addr=116.203.25.31:33300 2019-12-17 03:32:47 Closing Dynode connection: peer=58116, addr=192.241.194.54:33300 2019-12-17 03:32:47 Closing Dynode connection: peer=58117, addr=193.16.10.118:33300 2019-12-17 03:33:03 CDynodeMan::ProcessVerifyBroadcast -- verified Dynode e7043605c7e273af5e8bb3d223088b9594f4e918d23e5696d045f75f8fadaf4d-1 for addr 95.217.91.63:33300 2019-12-17 03:33:11 CDynodeMan::ProcessVerifyBroadcast -- verified Dynode 31ccba25a612056ed1b1e941af29f6b062a3c060106bcd92f3be4464e76ba352-2 for addr 45.63.12.182:33300 2019-12-17 03:34:06 Closing Dynode connection: peer=58119, addr=46.254.64.201:33300 2019-12-17 03:34:29 CDynodeMan::ProcessVerifyBroadcast -- verified Dynode 5eb80dae85ac6d8355d921ee00baec8c41c3558c87653d72a2596b8447ab6242-0 for addr 95.216.157.126:33300 2019-12-17 03:35:21 Closing Dynode connection: peer=58127, addr=144.91.76.59:33300 2019-12-17 03:36:36 Closing Dynode connection: peer=58130, addr=116.203.25.31:33300 2019-12-17 03:37:37 peer=58143 using obsolete version 70900; disconnecting 2019-12-17 03:37:37 ProcessMessages(version, 101 bytes) FAILED peer=58143 2019-12-17 03:38:05 Closing Dynode connection: peer=58142, addr=173.249.9.77:33300 2019-12-17 03:39:23 peer=58155 using obsolete version 70900; disconnecting 2019-12-17 03:39:23 ProcessMessages(version, 101 bytes) FAILED peer=58155 2019-12-17 03:39:25 Closing Dynode connection: peer=58147, addr=51.15.74.1:33300 2019-12-17 03:39:25 Closing Dynode connection: peer=58148, addr=51.15.129.216:33300 2019-12-17 03:39:25 Closing Dynode connection: peer=58149, addr=78.141.196.251:33300 2019-12-17 03:39:25 Closing Dynode connection: peer=58151, addr=116.203.150.31:33300 2019-12-17 03:40:40 Closing Dynode connection: peer=58157, addr=136.244.111.212:33300 2019-12-17 03:40:40 Closing Dynode connection: peer=58158, addr=178.21.79.11:33300 2019-12-17 03:41:06 CActiveDynode::SendDynodePing -- Relaying ping, collateral=ecacee41d38b68871faae99a6032e980e336c3e59bd9d38dc1af8ecfed3ae1d8-1 2019-12-17 03:42:04 Closing Dynode connection: peer=58161, addr=95.217.91.66:33300 2019-12-17 03:42:04 Closing Dynode connection: peer=58162, addr=173.249.5.41:33300 2019-12-17 03:42:04 Closing Dynode connection: peer=58163, addr=178.21.79.43:33300 2019-12-17 03:43:34 Closing Dynode connection: peer=58176, addr=155.138.134.12:33300 2019-12-17 03:44:44 Closing Dynode connection: peer=58179, addr=86.105.53.5:33300 2019-12-17 03:44:44 Closing Dynode connection: peer=58180, addr=95.217.91.41:33300 2019-12-17 03:45:59 Closing Dynode connection: peer=58185, addr=95.217.91.16:33300 2019-12-17 03:45:59 Closing Dynode connection: peer=58186, addr=95.217.91.37:33300 2019-12-17 03:45:59 Closing Dynode connection: peer=58187, addr=154.209.11.5:33300 2019-12-17 03:46:04 peer=58190 using obsolete version 70900; disconnecting 2019-12-17 03:46:04 ProcessMessages(version, 101 bytes) FAILED peer=58190 2019-12-17 03:46:54 peer=58197 using obsolete version 70900; disconnecting 2019-12-17 03:46:54 ProcessMessages(version, 101 bytes) FAILED peer=58197 2019-12-17 03:47:09 Closing Dynode connection: peer=58193, addr=95.217.91.83:33300 2019-12-17 03:49:23 Closing Dynode connection: peer=58204, addr=116.203.24.166:33300 2019-12-17 03:49:23 Closing Dynode connection: peer=58205, addr=178.21.79.10:33300 2019-12-17 03:49:23 Closing Dynode connection: peer=58206, addr=206.189.193.181:33300 2019-12-17 03:50:37 Closing Dynode connection: peer=58210, addr=95.217.91.45:33300 2019-12-17 03:51:07 CActiveDynode::SendDynodePing -- Relaying ping, collateral=ecacee41d38b68871faae99a6032e980e336c3e59bd9d38dc1af8ecfed3ae1d8-1 2019-12-17 03:51:57 Closing Dynode connection: peer=58213, addr=78.47.95.185:33300 2019-12-17 03:53:17 Closing Dynode connection: peer=58221, addr=95.217.91.20:33300 2019-12-17 03:53:59 peer=58231 using obsolete version 70900; disconnecting 2019-12-17 03:53:59 ProcessMessages(version, 101 bytes) FAILED peer=58231 2019-12-17 03:54:31 Closing Dynode connection: peer=58224, addr=193.16.10.136:33300 2019-12-17 03:55:04 Misbehaving: 203.189.221.20:33318 peer=58230 (0 -> 34) 2019-12-17 03:55:04 CDynodeMan::SyncAll -- peer already asked me for the list, peer=58230 2019-12-17 03:55:41 Closing Dynode connection: peer=58235, addr=95.217.50.220:33300 2019-12-17 03:55:43 peer=58238 using obsolete version 70800; disconnecting 2019-12-17 03:55:43 ProcessMessages(version, 101 bytes) FAILED peer=58238 2019-12-17 03:55:47 peer=58240 using obsolete version 70900; disconnecting 2019-12-17 03:55:47 ProcessMessages(version, 101 bytes) FAILED peer=58240 2019-12-17 03:56:56 Closing Dynode connection: peer=58239, addr=89.40.0.113:33300 2019-12-17 03:56:56 Closing Dynode connection: peer=58242, addr=154.209.11.29:33300 2019-12-17 03:56:56 Closing Dynode connection: peer=58243, addr=193.16.10.10:33300 2019-12-17 03:58:09 socket no message in first 60 seconds, 0 1 from 58251 2019-12-17 03:58:12 Closing Dynode connection: peer=58249, addr=51.75.18.248:33300 2019-12-17 03:58:12 Closing Dynode connection: peer=58250, addr=78.141.196.251:33300 2019-12-17 03:59:31 Closing Dynode connection: peer=58256, addr=95.217.91.62:33300 2019-12-17 03:59:31 Closing Dynode connection: peer=58257, addr=178.21.79.58:33300 2019-12-17 03:59:31 Closing Dynode connection: peer=58258, addr=178.21.79.59:33300 2019-12-17 07:14:42 ********** CDHTStorage -- get_mutable_item failed to get mutable entry for infohash = f49ecc1dc9e68e1d4162923cb1f067e7a6f963b6.

razerrazer avatar Dec 17 '19 18:12 razerrazer

And it has been over 1000 dynodes that has been disconnected from the network since a few days ago which could indicate that something is wrong.

razerrazer avatar Dec 17 '19 18:12 razerrazer

One of my dynamicd nodes is stuck and has an extra scheduler (dynamic-schedul) thread:

ps -e -T | grep 32254
# stuck dynamicd
32254 32254 ?        00:02:04 dynamicd
32254 32255 ?        00:45:15 dynamic-schedul
32254 32256 ?        00:00:00 dynamic-http
32254 32257 ?        00:00:00 dynamic-httpwor
32254 32258 ?        00:00:00 dynamic-httpwor
32254 32259 ?        00:00:00 dynamic-httpwor
32254 32260 ?        00:00:00 dynamic-httpwor
32254 32262 ?        00:00:00 dynamic-torcont
32254 32263 ?        01:59:51 dynamic-net
32254 32265 ?        00:01:39 dynamic-addcon
32254 32266 ?        00:00:00 dynamic-opencon
32254 32267 ?        00:00:36 dynamic-dncon
32254 32268 ?        04:38:16 dynamic-msghand
32254 32270 ?        00:00:56 dynamic-wallet
32254 32277 ?        00:00:00 dynamic-schedul ***
32254 32286 ?        00:06:04 dht-session
32254 32287 ?        00:05:31 dht-events:dht-
32254 32288 ?        00:04:59 dht-session
32254 32289 ?        00:04:19 dht-events:dht-
32254 32290 ?        00:06:17 dht-session
32254 32291 ?        00:05:17 dht-events:dht-
32254 32292 ?        00:05:34 dht-session
32254 32293 ?        00:05:42 dht-events:dht-
32254 32294 ?        00:04:59 dht-session
32254 32295 ?        00:04:29 dht-events:dht-
32254 32296 ?        00:06:03 dht-session
32254 32297 ?        00:05:50 dht-events:dht-
32254 32298 ?        00:05:38 dht-session
32254 32299 ?        00:04:46 dht-events:dht-
32254 32300 ?        00:08:26 dht-session
32254 32301 ?        00:06:51 dht-events:dht-
32254 32302 ?        00:00:00 dht-session

My working nodes do not have two dynamic-schedul threads and I can't see why it would create a second thread in the code. Anyone else see an extra dynamic-schedul thread with zero execution time when a node is stuck?

AmirAbrams avatar Feb 04 '20 19:02 AmirAbrams

We are still using boost::function and boost::bind in scheduler. Bitcoin upgraded to std::function and std::bind. I am not sure if this commit will fix the issues but I think we need it: https://github.com/duality-solutions/Dynamic/commit/962c67c43d19cc7a3b9b7bc508d7f0cfbef75db7

AmirAbrams avatar Feb 04 '20 21:02 AmirAbrams

The crash seems to be related to the data stored in the dncache.dat file. Could the extra dynamic-schedul thread be a result of a deadlock while trying to store/interact with data from the dncache.dat file?

LordDarkHelmet avatar Feb 05 '20 05:02 LordDarkHelmet