node icon indicating copy to clipboard operation
node copied to clipboard

Child process breaks on command

Open prettydiff opened this issue 1 year ago • 8 comments

Version

23.1.0

Platform

* fails on Debian 12 as a regular user with statically specified ports
* fails on Windows 10 as a regular user with statically specified ports
* does not fail Debian 12 via systemd (as an OS service run as root)
* does not fail if the ports are randomly assigned by specifying port 0 on server invocation

Subsystem

No response

What steps will reproduce the bug?

  1. execute a child process with command nmap --open 127.0.0.1 via instruction from any kind of network stream, such as WebSocket message or HTTP request.
const port_map = function (callback:() => void):void {
    const command:string = "nmap --open 127.0.0.1";
    node.child_process.exec(command, function (error:node_childProcess_ExecException, stdout:string):void {
        if (callback !== null) {
            callback();
        }
    });
};

export default port_map;

How often does it reproduce? Is there a required condition?

I can reproduce this 100% from command line in both Debian 12 and Windows 10. 0% from systemd.

What is the expected behavior? Why is that the expected behavior?

child_process exec calls a callback and child_process spawn calls event handlers in accordance with Node API definitions.

What do you see instead?

The application crashes fatally if the error event on the corresponding socket is not trapped immediately at first connection time before other instructions are executed. Assigning a handler to the socket's error event listener later is insufficient, for example assigning the handler from within the following code still results in crashes:

socket.once("data", handler);

The actual error reported by the socket is:

Error: read ECONNRESET
    at TCP.onStreamRead (node:internal/stream_base_commons:216:20) {
  errno: -104,
  code: 'ECONNRESET',
  syscall: 'read'
}

The error is the direct result of a child process execution. The corresponding child process executes in response to instructions transmitted on the corresponding socket, but is otherwise not related or associated to the socket.

This following error is the fatal error messaging if the error is not immediately trapped on the corresponding socket.

node:events:485
      throw er; // Unhandled 'error' event
      ^

Error: read ECONNRESET
    at TCP.onStreamRead (node:internal/stream_base_commons:216:20)
Emitted 'error' event on Socket instance at:
    at emitErrorNT (node:internal/streams/destroy:170:8)
    at emitErrorCloseNT (node:internal/streams/destroy:129:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21) {
  errno: -104,
  code: 'ECONNRESET',
  syscall: 'read'
}

The error messaging describes a socket failure, but it is not. It is a child process call only and only on the command specified.

Additional information

Some of the things I have tried:

  1. I have tried a few other trivial commands like echo hello and ps -a. They work without issue.
  2. In exec have tried specifying different shells /bin/sh and /bin/bash
  3. I have tried to call the child process with exec and spawn
  4. I have tried to call the child process in different ways: manually, recursively, setTimeout, from automated response to a socket message
  5. I validated that all sockets connected, both http and ws, have assigned event handlers for the error event and I have also isolated all sockets from the problematic behavior.

prettydiff avatar Nov 09 '24 23:11 prettydiff

I simplified your reproduction to:

import { exec } from 'node:child_process';

exec("nmap --open 127.0.0.1", console.log);

And I can't reproduce:

$ node index.js
null Starting Nmap 7.94SVN ( https://nmap.org ) at 2024-11-09 20:22 EST
Nmap done: 1 IP address (1 host up) scanned in 0.14 seconds

avivkeller avatar Nov 10 '24 01:11 avivkeller

Seems like a problem w/ your environment, would you mind to change that IP into an exposed one? Like google's or github's IP?

I can't reproduce the issue as well.

Also ECONNRESET means: "connection reset by peer". It is not a Node.js problem, but a network problem.

Refs: https://docs.libuv.org/en/v1.x/errors.html#c.UV_ECONNRESET

juanarbol avatar Nov 10 '24 01:11 juanarbol

It may likely be a problem with my code base. I am continuing to investigate.

I can reproduce this 100% on both Debian 12 and Windows 10. I just discovered that the problem does not occur if the application is executed from systemd.

prettydiff avatar Nov 10 '24 09:11 prettydiff

~~Closing issue. I am attaching an empty event handler to the error event of all sockets immediate after sockets connect and that does seem to trap the error.~~

This does seem to be a valid Node issue. I have updated the issue at the top with my more recent observations.

prettydiff avatar Nov 10 '24 09:11 prettydiff

Assigning a handler to the socket's error event listener later is insufficient, for example assigning the handler from within the following code still results in crashes:

socket.once("data", handler);

That seems to be a "data" handler, rather than an "error" handler, which probably explains the error you see.

aduh95 avatar Nov 10 '24 15:11 aduh95

@aduh95 I was likely not clear when I was speaking to timing.

const connection = function (tls_socket) {
    // where sockets are born, the connection event handler
    // for the sake of this thread socket errors must be trapped here before other functions are called, otherwise the application will break
    // when evers are trapped later, even if before the breaking action, it is too late
    tls_socket.on("error", function (error) {
        console.log(error);
    });

    const handler = function (data:Buffer) {
        // reason about the data and make determinations about protocol management and response messages
        // ...
        const socket = this;
        socket.on("error", function (error) {
            // errors on the socket are trapped, and those errors will not terminate the application
            // except in this case its too late, and the application still reports a socket error and breaks
            // Its interesting because the breaking action, a child process, does not execute until much later due to human interaction
        };
    };
    tls_socket.once("data", handler);
};

// servers listening for connections
server.on("connection", connection);  // net.server
node.tls.server.createServer({
    ca: "key hash",
    cert: "key hash",
    key: "key hash"
}, connection); // tls.server

Looking at the code above:

  1. Two servers are stood up and use the same connection handler
  2. The connection handler is where sockets are born. If errors are not trapped there it is too late.
  3. All further management upon the socket occurs in: tls_socket.once("data", handler);
  4. Trapping errors after reasoning about the socket is too late, even if the breaking condition has not yet occurred.

I cannot further reason about the nature of that timing. I am also unclear why executing a child process would have anything to do with a network socket's read stream. I can only speculate, and this is a completely uninformed guess, that there is a stream collision between the read stream of the socket and the pipes of the child process. If so its unclear why altering the timing of error trapping would have any effect or why executing the application from a higher privilege, run as root from systemd, would have any effect.

I will be happy to continue investigating, but I suspect I am approaching the limits of what evidence I can gather from user land.

prettydiff avatar Nov 10 '24 15:11 prettydiff

Here is the application demonstrating the issue on branch "servers": https://github.com/prettydiff/webserver/tree/servers

  1. clone repo && cd
  2. execute npm install
  3. execute npm run build
  4. execute npm run server
  5. The first run will generate a default server for hosting the dashboard, but on random ports. CTRL+C out of the application, change the port assignment in the dynamically created servers.json file to any ports of your choice.
  6. execute the application again npm run server
  7. for some reason the problem occurs from any statically assigned port 100% of the time, but I just discovered it does not occur at all from a randomly assigned port. This means it is crashing on net socket error even when there are no sockets connected.

The error can be suppressed by uncommenting the following three lines and repeating the process above (except for npm install): https://github.com/prettydiff/webserver/blob/servers/lib/transmit/server.ts#L291-L293

The actual failure occurs from this line: https://github.com/prettydiff/webserver/blob/servers/lib/utilities/port_map.ts#L75

This logic guarantees the error will occur within 10 seconds of application start regardless of further action and no connections: https://github.com/prettydiff/webserver/blob/servers/lib/index.ts#L75-L79

prettydiff avatar Nov 10 '24 16:11 prettydiff

This issue/PR was marked as stalled, it will be automatically closed in 30 days. If it should remain open, please leave a comment explaining why it should remain open.

github-actions[bot] avatar Jun 15 '25 14:06 github-actions[bot]

Closing this because it has stalled. Feel free to reopen if this issue/PR is still relevant, or to ping the collaborator who labelled it stalled if you have any questions.

github-actions[bot] avatar Jul 16 '25 00:07 github-actions[bot]