Nelson crashes after a few hours/days
I don't run nelson as a service. I run it using the command:
nohup sudo nelson --config /Users/iota/nelson/config.ini &
Just happened for the 3rd time. I go to check my node and found out that nelson is not running. This last time it was running for almost 24 hours before crashing. More details in the log below.
Expected behaviour
Nelson running continuously
Actual behaviour
After a while, can be hours or a few days, Nelson crashes.
Steps to reproduce
Not sure. It crashes randomly
Basic Info
- Operating System: Macos High Sierra (10.13.2)
- Node (npm) Version: v8.4.0
- IRI Version: 1.4.1.6
- Nelson version: 0.3.16
Nelson Info
- Epoch: don't know
- Cycle: don't know
- Connected peers: my configuration is 6 in, 5 out
8:30:23 AM.190 16600::NODE new cycle 8:30:23 AM.195 16600::IRI Neighbors removed (if there were any): udp://31.48.190.3:14600 8:30:23 AM.195 16600::NODE connection closed 31.48.190.3:16600 (remotely dropped) 8:30:23 AM.195 16600::IRI Neighbors removed (if there were any): udp://139.226.182.140:14600 8:30:23 AM.195 16600::NODE connection closed 139.226.182.140:16600 (remotely dropped) 8:30:23 AM.196 16600::IRI Neighbors removed (if there were any): udp://31.48.190.3:14600 8:30:23 AM.196 16600::IRI Neighbors removed (if there were any): udp://139.226.182.140:14600 /usr/local/lib/node_modules/nelson.cli/node_modules/ws/lib/WebSocket.js:638 this._req.abort(); ^
TypeError: Cannot read property 'abort' of null at ClientRequest._req.setTimeout (/usr/local/lib/node_modules/nelson.cli/node_modules/ws/lib/WebSocket.js:638:16) at Object.onceWrapper (events.js:314:30) at emitNone (events.js:105:13) at ClientRequest.emit (events.js:207:7) at Socket.emitTimeout (_http_client.js:722:34) at Object.onceWrapper (events.js:314:30) at emitNone (events.js:105:13) at Socket.emit (events.js:207:7) at Socket._onTimeout (net.js:398:8) at ontimeout (timers.js:469:11)
In my case it crashes several times a day. It's probably a different crash, prints the following error on console:
Error: read ECONNRESET
at _errnoException (util.js:1024:11)
at TCP.onread (net.js:615:25)
OS Linux, Nelson v.0.3.16
ECONNRESET is anther bug.
@rafaelbrizola have you closed your mac's lid in between? I think, there is a bug reported in https://github.com/websockets/ws/issues/1280 that is related to this issue.
I will check if this can be fixed/catched somehow...
Hi Roman, Thanks for having a look at this. I configured my macbook to not sleep when closing the lid on power cord. Of course, something could have happened, like a power outage, and it could have lost that configuration momentarily. I would thought that's unlikely because the error happened consistently, but let's not disconsider that possibility.
I will try catching it and, at least, cleaning up neighbors before crashing. Will need more investigation, though..
Let's work together to get the bug fixed upstream. I have a feeling that something isn't working properly in Node.js core though as per this comment.
@lpinca , I have upgraded to Node LTS 8.9.4 Let's see if this problem persists.
@rafaelbrizola , could you upgrade your node version to 8.9.4 and try the new Nelson version 0.3.21?
@romansemko Sure. Just did it. Let see how it goes
Hi, I left it running and around 36 hours later I got the error below. Nelson is still running, but doesn't update anymore.
7:03:19 AM.026 16600::NODE new cycle
7:03:19 AM.313 16600::IRI Neighbors removed (if there were any): udp://188.166.3.182:14600
7:03:19 AM.313 16600::NODE connection closed 188.166.3.182:16600 (remotely dropped)
7:03:19 AM.319 16600::IRI Neighbors removed (if there were any): udp://176.9.139.119:14600
7:03:19 AM.319 16600::NODE connection closed 176.9.139.119:16600 (remotely dropped)
7:03:24 AM.037 16600::IRI Neighbors removed (if there were any): tcp://81.169.215.74:15600
7:03:24 AM.037 16600::NODE connection closed 81.169.215.74:16600 (remotely dropped)
7:03:29 AM.045 16600::NODE connecting peer 46.5.17.47:16600
7:03:29 AM.046 16600::NODE connecting peer 104.172.104.192:16600
7:03:29 AM.046 16600::NODE connecting peer 83.135.133.111:16600
7:03:29 AM.344 16600::IRI Neighbors removed (if there were any): udp://83.135.133.111:14600
7:03:29 AM.345 16600::NODE connection closed 83.135.133.111:16600 (remotely dropped)
7:03:29 AM.365 16600::IRI Neighbors removed (if there were any): udp://46.5.17.47:14600
7:03:29 AM.365 16600::NODE connection closed 46.5.17.47:16600 (remotely dropped)
7:03:34 AM.058 16600::IRI Neighbors removed (if there were any): tcp://104.172.104.192:15600
7:03:34 AM.058 16600::NODE connection closed 104.172.104.192:16600 (remotely dropped)
7:03:39 AM.072 16600::NODE connecting peer 23.233.137.139:16600
7:03:39 AM.073 16600::NODE connecting peer 84.229.1.5:16600
7:03:39 AM.073 16600::NODE connecting peer 77.69.198.173:16600
7:03:44 AM.088 16600::IRI Neighbors removed (if there were any): udp://84.229.1.5:14600
7:03:44 AM.088 16600::NODE connection closed 84.229.1.5:16600 (remotely dropped)
7:03:44 AM.089 16600::IRI Neighbors removed (if there were any): udp://23.233.137.139:14256
7:03:44 AM.090 16600::NODE connection closed 23.233.137.139:16600 (remotely dropped)
7:03:44 AM.090 16600::IRI Neighbors removed (if there were any): udp://77.69.198.173:14777
7:03:44 AM.091 16600::NODE connection closed 77.69.198.173:16600 (remotely dropped)
7:03:49 AM.105 16600::NODE connecting peer 106.70.61.176:16600
7:03:49 AM.106 16600::NODE connecting peer 176.100.190.180:16600
7:03:49 AM.106 16600::NODE connecting peer 173.239.214.18:16600
7:03:49 AM.121 16600::IRI Neighbors removed (if there were any): udp://106.70.61.176:14600
7:03:49 AM.122 16600::NODE connection closed 106.70.61.176:16600 (remotely dropped)
7:03:49 AM.123 16600::IRI Neighbors removed (if there were any): udp://176.100.190.180:14600
7:03:49 AM.124 16600::NODE connection closed 176.100.190.180:16600 (remotely dropped)
7:03:49 AM.126 16600::IRI Neighbors removed (if there were any): udp://173.239.214.18:14600
7:03:49 AM.127 16600::NODE connection closed 173.239.214.18:16600 (remotely dropped)
7:03:54 AM.118 16600::NODE terminating...
7:03:54 AM.124 16600::NODE terminating...
7:03:54 AM.125 16600::NODE terminating...
7:03:54 AM.128 16600::IRI Neighbors removed (if there were any): udp://94.130.168.177:14600, udp://45.77.187.45:14600
7:03:54 AM.130 16600::IRI Neighbors removed (if there were any): udp://94.130.168.177:14600, udp://45.77.187.45:14600
Unhandled Rejection at: Promise Promise {
@rafaelbrizola can you try to monkey patch ws and change this to this._req = undefined;?
This will help us understand if the timeout expires after an error or after the upgrade.
Ok, done and restarted nelson. Let's see how it goes.
nelson gui also errors out for me after about a day of running. In the top left quarter of the screen, I see the following error multiple times: Unhandled Rejection at: Promise Promise
Seems Nelson is still running though, ,because the bottom half is still updated with neighbor connects and disconnects.
@rafaelbrizola any update?
Hi @lpinca sorry for not getting back to you. After that day, I still run Nelson for a couple of days but didn't notice any other crashes. However, I stopped running the experiment, and for other reason, never got back to that. So, it's inconclusive it that change helped in my case. If nobody else is noticing the crash, you can close this issue. Thanks!
Ok, thanks. I'm pretty sure the issue is not fixed and still persists on ws@4 and Node.js v8.x. Unfortunately I don't have a way to reproduce it, so it's hard to understand what happens and fix it.
This is the tracking issue in ws: https://github.com/websockets/ws/issues/1280