[Daemon node] write can throw and is not caught, crashes NR

Open tve opened this issue 3 years ago • 12 comments

Which node are you reporting an issue on?

Daemon

What are the steps to reproduce?

Unknown. Happens sporadically due to unreliable wifi links.

What happens?

Node-RED crashes

15 Nov 10:45:00 - [warn] [daemon:ssh] Restarting : ssh                                             
15 Nov 10:45:01 - [red] Uncaught Exception:                                                        
15 Nov 10:45:01 - [error] Error: write EPIPE                                                       
    at afterWriteDispatched (node:internal/stream_base_commons:160:15)                             
    at writeGeneric (node:internal/stream_base_commons:151:3)                                      
    at Socket._writeGeneric (node:net:874:11)                                                      
    at Socket._write (node:net:886:8)                                                              
    at writeOrBuffer (node:internal/streams/writable:392:12)                                       
    at _write (node:internal/streams/writable:333:10)                                              
    at Writable.write (node:internal/streams/writable:337:10)                                      
    at DaemonNode.inputlistener [as _inputCallback] (/data/node_modules/node-red-node-daemon/daemon.js:44:81)
    at /usr/src/node-red/node_modules/@node-red/runtime/lib/nodes/Node.js:210:26                   
    at Object.trigger (/usr/src/node-red/node_modules/@node-red/util/lib/hooks.js:166:13)          

As of a recent commit, line 44 in daemon.js is now line 53: https://github.com/node-red/node-red-nodes/blob/master/utility/daemon/daemon.js#L53

What do you expect to happen?

I expect the daemon node to catch the exception and handle it gracefully without taking all of NR down.

Please tell us about your environment:

15 Nov 10:45:16 - [info] Node-RED version: v3.0.2                                                  
15 Nov 10:45:16 - [info] Node.js  version: v18.7.0                                                 
15 Nov 10:45:16 - [info] Linux 5.15.0-46-generic x64 LE                                            

tve avatar Nov 15 '22 18:11 tve

Just to attempt to make this clearer.

If I've understood correctly you are using the daemon node to run an ssh session to a remote host and the wifi outage is causing this to exit.

It looks like the node is trying to write to the now non-existent stdin of the dead ssh process.

hardillb avatar Nov 15 '22 19:11 hardillb

@tve Could you try adding a try/catch around the write on that if statement line? (You are best placed to recreate the error.)
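
One caveat with a try/catch here, sketched below under assumptions: on recent Node.js versions (the report uses v18) a failed stream write is normally reported through an `'error'` event after `write()` has already returned, so the `catch` block never sees it. The `makeBrokenPipe` helper and the `seen` object are illustrative names, and the `Writable` only simulates the stdin of a dead child process; none of this is the daemon node's code.

```javascript
const { Writable } = require('stream');

// Stand-in for the stdin of a child process that has already exited:
// every write fails with EPIPE. This is a simulation for illustration only.
function makeBrokenPipe() {
  return new Writable({
    write(chunk, encoding, callback) {
      const err = new Error('write EPIPE');
      err.code = 'EPIPE';
      callback(err);
    },
  });
}

const pipe = makeBrokenPipe();
const seen = { byTryCatch: false, byListener: false };

// Without this listener the 'error' event would become an uncaught exception.
pipe.on('error', (err) => {
  seen.byListener = err.code === 'EPIPE';
  console.log('listener saw:', err.code);
});

try {
  pipe.write('hello\n');
} catch (err) {
  // Not reached in this sketch: the failure is delivered as an event instead.
  seen.byTryCatch = true;
}

console.log('try/catch saw it:', seen.byTryCatch);
```

In this simulation the listener receives the error and the `catch` block does not, which suggests that a try/catch alone may not be enough for the real stdin either.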

dceejay avatar Nov 15 '22 23:11 dceejay

having looked at the code I must say I'm slightly baffled - as soon as we spawn the command we set up an error handler (Line 117) - so for some reason that isn't getting called... more eyes on please.

edit - Aha - found this https://stackoverflow.com/questions/67296866/node-js-an-uncatchable-error-is-thrown-when-the-child-process-is-abruptly-close

so I'll add an error handler specifically to the stdin...
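
A minimal sketch of that idea, assuming a child started with `child_process.spawn`. The helper names `guardStdin`, `safeWrite` and `startGuarded` are hypothetical and this is not the change that was published; it only illustrates attaching an `'error'` listener to `child.stdin` so that an EPIPE is logged rather than thrown as an uncaught exception.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: route stdin errors (such as EPIPE) to a callback
// instead of letting them surface as an uncaught exception.
function guardStdin(child, onError) {
  if (child.stdin) {
    child.stdin.on('error', onError);
  }
  return child;
}

// Hypothetical helper: skip the write when stdin is already gone.
// Returns true if a write was attempted, false if it was skipped.
function safeWrite(child, data) {
  const stdin = child.stdin;
  if (!stdin || stdin.destroyed || !stdin.writable) {
    return false;
  }
  stdin.write(data);
  return true;
}

// Hypothetical wiring: spawn a command and guard its stdin straight away.
function startGuarded(command, args, log) {
  const child = spawn(command, args);
  child.on('error', (err) => log(`spawn error: ${err.message}`));
  return guardStdin(child, (err) => log(`stdin error: ${err.code}`));
}
```

The check in `safeWrite` is only a best-effort guard: the process can still exit between the check and the write, so the listener installed by `guardStdin` remains the part that prevents the crash.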

dceejay avatar Nov 16 '22 11:11 dceejay

@tve - published version 0.5.1 for you to try

dceejay avatar Nov 16 '22 12:11 dceejay

Thanks, will try it out!

tve avatar Nov 16 '22 15:11 tve

@tve - any feedback ? OK to close ?

dceejay avatar Nov 18 '22 09:11 dceejay

Sorry, I've been slow... I just rebuilt the NR container and relaunched. It was crashing at least once a day, so I should know soon!

tve avatar Nov 29 '22 06:11 tve

No issue so far, I'll close the ticket, can always reopen if necessary. Thanks for the fabulous turn-around time!!

tve avatar Nov 30 '22 17:11 tve

Hmmm, I don't know whether this is truly related, but it looks suspiciously so. Same system crashed NR on an EPIPE, apparently inside node.js itself:

3 Dec 09:25:07 - [red] Uncaught Exception:                              
3 Dec 09:25:07 - [error] Error: write EPIPE                             
    at WriteWrap.onWriteComplete [as oncomplete] (node:internal/stream_base_commons:94:16)                                                      
***** NODE-RED STARTING ***** Sat Dec 3 09:25:16 PST 2022               

The previous log message was unrelated and 30 seconds prior. The immediate NR restart is by docker. This happened ~5 hours prior as well (no additional info in the log), but didn't happen in the previous 4 days. Not sure how to troubleshoot this...

tve avatar Dec 03 '22 17:12 tve

If it throws an error asynchronously then the recommended thing to do is restart, which is what we do. As this is node-internal, I'm not sure what we can do about it.
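
For illustration, the log-and-restart pattern can be sketched as below. This is not Node-RED's actual handler; `installCrashHandler` is a hypothetical name, and the `proc` parameter exists only so the sketch can be exercised without ending the real process. It assumes an external supervisor (Docker, in this thread) restarts the process after a non-zero exit.

```javascript
// Hypothetical helper: log an uncaught exception and exit non-zero so that a
// supervisor can restart the process. `proc` defaults to the real process
// object and is injectable for testing.
function installCrashHandler(log, proc = process) {
  proc.on('uncaughtException', (err) => {
    log(`Uncaught Exception: ${err.stack || err}`);
    // The process state is unknown after an uncaught exception, so the
    // handler exits rather than trying to carry on.
    proc.exit(1);
  });
}
```

A caller would invoke `installCrashHandler(console.error)` once at startup. The Node.js documentation advises against resuming normal operation after `'uncaughtException'`, which is why the sketch exits instead of continuing.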

dceejay avatar Dec 03 '22 20:12 dceejay

Do you have suggestions for how to troubleshoot this? NR crashed 8 times yesterday due to this and 3 times today. I have to find some solution... I'm running node 18.7.0, upgrading to 18.12.1 now...

tve avatar Dec 05 '22 03:12 tve

Generally an EPIPE error means that the other end closed the connection unexpectedly... so I would look there.

dceejay avatar Dec 05 '22 11:12 dceejay