Child process breaks on command
Version
23.1.0
Platform
* fails on Debian 12 as a regular user with statically specified ports
* fails on Windows 10 as a regular user with statically specified ports
* does not fail on Debian 12 via systemd (as an OS service run as root)
* does not fail if the ports are randomly assigned by specifying port 0 on server invocation
Subsystem
No response
What steps will reproduce the bug?
- execute a child process with the command `nmap --open 127.0.0.1` via instruction from any kind of network stream, such as a WebSocket message or HTTP request.
const port_map = function (callback:() => void):void {
    const command:string = "nmap --open 127.0.0.1";
    node.child_process.exec(command, function (error:node_childProcess_ExecException, stdout:string):void {
        if (callback !== null) {
            callback();
        }
    });
};
export default port_map;
How often does it reproduce? Is there a required condition?
I can reproduce this 100% of the time from the command line on both Debian 12 and Windows 10, and 0% of the time from systemd.
What is the expected behavior? Why is that the expected behavior?
child_process exec calls a callback and child_process spawn calls event handlers in accordance with Node API definitions.
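As a minimal sketch of that expected behavior, the following wraps `exec` (callback style) and `spawn` (event style) in promises. The helper names `runExec` and `runSpawn` are hypothetical, and the commands invoke the Node binary itself rather than nmap so the sketch does not depend on nmap being installed.

```typescript
import { exec, spawn } from "node:child_process";

// exec: the callback fires once the child has exited, with the buffered stdout.
const runExec = function (command: string): Promise<string> {
    return new Promise(function (resolve, reject) {
        exec(command, function (error, stdout) {
            if (error !== null) {
                reject(error);
                return;
            }
            resolve(stdout);
        });
    });
};

// spawn: results arrive through event handlers on the child process object.
const runSpawn = function (file: string, args: string[]): Promise<number | null> {
    return new Promise(function (resolve, reject) {
        const child = spawn(file, args);
        // "error" is emitted if the process could not be spawned
        child.on("error", reject);
        // "close" is emitted after the process has ended and its stdio streams closed
        child.on("close", function (code) {
            resolve(code);
        });
    });
};

runExec(`"${process.execPath}" -p "1+1"`).then(function (stdout) {
    console.log("exec stdout:", stdout.trim());
});
runSpawn(process.execPath, ["-e", "process.exit(3)"]).then(function (code) {
    console.log("spawn exit code:", code);
});
```

In both cases the child's outcome is delivered only through the callback or events of the child process itself, which is why an error surfacing on an unrelated socket is unexpected.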
What do you see instead?
The application crashes fatally if the error event on the corresponding socket is not trapped immediately at first connection time before other instructions are executed. Assigning a handler to the socket's error event listener later is insufficient, for example assigning the handler from within the following code still results in crashes:
socket.once("data", handler);
The actual error reported by the socket is:
Error: read ECONNRESET
at TCP.onStreamRead (node:internal/stream_base_commons:216:20) {
errno: -104,
code: 'ECONNRESET',
syscall: 'read'
}
The error is the direct result of a child process execution. The corresponding child process executes in response to instructions transmitted on the corresponding socket, but is otherwise not related or associated to the socket.
The following fatal error is reported if the error is not immediately trapped on the corresponding socket.
node:events:485
throw er; // Unhandled 'error' event
^
Error: read ECONNRESET
at TCP.onStreamRead (node:internal/stream_base_commons:216:20)
Emitted 'error' event on Socket instance at:
at emitErrorNT (node:internal/streams/destroy:170:8)
at emitErrorCloseNT (node:internal/streams/destroy:129:3)
at process.processTicksAndRejections (node:internal/process/task_queues:90:21) {
errno: -104,
code: 'ECONNRESET',
syscall: 'read'
}
The error message describes a socket failure, but the trigger is not a socket operation. It occurs only on a child process call, and only with the command specified.
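For context on the "Unhandled 'error' event" trace above: any `EventEmitter`, which includes sockets, throws when an `error` event is emitted and no listener is registered for it. The sketch below uses a plain `EventEmitter` as a stand-in for a socket; the helper name `emitError` is hypothetical.

```typescript
import { EventEmitter } from "node:events";

// Emit an "error" event and report whether it surfaced as a thrown exception.
const emitError = function (emitter: EventEmitter): "handled" | "thrown" {
    try {
        emitter.emit("error", new Error("read ECONNRESET"));
        return "handled";
    } catch {
        // no "error" listener was registered, so emit() threw the error
        return "thrown";
    }
};

const bare = new EventEmitter();
const guarded = new EventEmitter();
guarded.on("error", function () {
    // trapping the event is enough to prevent the throw
});

console.log(emitError(bare));    // thrown
console.log(emitError(guarded)); // handled
```

This shows only why an untrapped error is fatal. It does not explain why running the child process produces the error in the first place.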
Additional information
Some of the things I have tried:
- I have tried a few other trivial commands like `echo hello` and `ps -a`. They work without issue.
- In exec I have tried specifying different shells: `/bin/sh` and `/bin/bash`
- I have tried to call the child process with exec and spawn
- I have tried to call the child process in different ways: manually, recursively, setTimeout, from automated response to a socket message
- I validated that all sockets connected, both http and ws, have assigned event handlers for the error event and I have also isolated all sockets from the problematic behavior.
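For anyone retracing the shell experiment from the list above, here is a rough sketch of running a trivial command under different shells via the `shell` option of `exec`. It assumes a POSIX system; the helper name `runWithShell` is hypothetical, and shells that are not installed are skipped.

```typescript
import { exec } from "node:child_process";
import { existsSync } from "node:fs";

// Run a command under a specific shell and resolve with its stdout.
const runWithShell = function (command: string, shell: string): Promise<string> {
    return new Promise(function (resolve, reject) {
        exec(command, { shell: shell }, function (error, stdout) {
            if (error !== null) {
                reject(error);
                return;
            }
            resolve(stdout.toString());
        });
    });
};

// Only try shells that exist on this machine (assumption: POSIX paths).
const shells: string[] = ["/bin/sh", "/bin/bash"].filter(function (shell) {
    return existsSync(shell);
});

shells.forEach(function (shell) {
    runWithShell("echo hello", shell).then(function (stdout) {
        console.log(shell, JSON.stringify(stdout));
    }, function (error: Error) {
        console.log(shell, "failed:", error.message);
    });
});
```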
I simplified your reproduction to:
import { exec } from 'node:child_process';
exec("nmap --open 127.0.0.1", console.log);
And I can't reproduce:
$ node index.js
null Starting Nmap 7.94SVN ( https://nmap.org ) at 2024-11-09 20:22 EST
Nmap done: 1 IP address (1 host up) scanned in 0.14 seconds
This seems like a problem with your environment. Would you mind changing that IP to an exposed one, like Google's or GitHub's?
I can't reproduce the issue either.
Also ECONNRESET means: "connection reset by peer". It is not a Node.js problem, but a network problem.
Refs: https://docs.libuv.org/en/v1.x/errors.html#c.UV_ECONNRESET
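To connect the `errno: -104` in the trace with that libuv error name: on POSIX systems libuv reports errors as negated errno values, and Node exposes the mapping through `util.getSystemErrorName`. A small sketch, assuming a POSIX platform (the numeric value is 104 on Linux and differs elsewhere); the helper name `describeErrno` is hypothetical.

```typescript
import { constants } from "node:os";
import { getSystemErrorName } from "node:util";

// Translate a libuv-style (negative) error number into its symbolic name.
const describeErrno = function (errno: number): string {
    return getSystemErrorName(errno);
};

// Negate the platform's errno value to get the number reported on the error object.
const resetErrno: number = -constants.errno.ECONNRESET;
console.log(resetErrno, describeErrno(resetErrno));
```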
It may likely be a problem with my code base. I am continuing to investigate.
I can reproduce this 100% on both Debian 12 and Windows 10. I just discovered that the problem does not occur if the application is executed from systemd.
~~Closing issue. I am attaching an empty event handler to the error event of all sockets immediately after sockets connect and that does seem to trap the error.~~
This does seem to be a valid Node issue. I have updated the issue at the top with my more recent observations.
> Assigning a handler to the socket's error event listener later is insufficient, for example assigning the handler from within the following code still results in crashes:
> socket.once("data", handler);
That seems to be a "data" handler, rather than an "error" handler, which probably explains the error you see.
@aduh95 I was likely not clear when I was speaking to timing.
const connection = function (tls_socket) {
    // where sockets are born, the connection event handler
    // for the sake of this thread socket errors must be trapped here before other functions are called, otherwise the application will break
    // when errors are trapped later, even if before the breaking action, it is too late
    tls_socket.on("error", function (error) {
        console.log(error);
    });
    const handler = function (data:Buffer) {
        // reason about the data and make determinations about protocol management and response messages
        // ...
        const socket = this;
        socket.on("error", function (error) {
            // errors on the socket are trapped, and those errors will not terminate the application
            // except in this case it's too late, and the application still reports a socket error and breaks
            // It's interesting because the breaking action, a child process, does not execute until much later due to human interaction
        });
    };
    tls_socket.once("data", handler);
};
// servers listening for connections
server.on("connection", connection); // net.server
node.tls.server.createServer({
    ca: "key hash",
    cert: "key hash",
    key: "key hash"
}, connection); // tls.server
Looking at the code above:
- Two servers are stood up and use the same connection handler
- The connection handler is where sockets are born. If errors are not trapped there it is too late.
- All further management upon the socket occurs in: `tls_socket.once("data", handler);`
- Trapping errors after reasoning about the socket is too late, even if the breaking condition has not yet occurred.
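One way the registration timing described above could matter, offered as an assumption rather than a diagnosis: a handler registered from inside a `once("data")` callback exists only after the first `data` event has fired. If an error reaches a socket that has not yet received any data, only a handler attached at connection time is present. The sketch below simulates this with a plain `EventEmitter` standing in for a socket; the function name `simulate` is hypothetical.

```typescript
import { EventEmitter } from "node:events";

// Simulate a socket that receives an error before its first "data" event.
const simulate = function (trapAtConnection: boolean): "trapped" | "crashed" {
    const socket = new EventEmitter();
    if (trapAtConnection) {
        // registered immediately, as in the connection handler
        socket.on("error", function () {});
    }
    socket.once("data", function () {
        // late registration: this line runs only after a "data" event arrives
        socket.on("error", function () {});
    });
    try {
        // the error arrives first; no "data" event has been emitted yet
        socket.emit("error", new Error("read ECONNRESET"));
        return "trapped";
    } catch {
        return "crashed";
    }
};

console.log(simulate(true));  // trapped
console.log(simulate(false)); // crashed
```

Whether this matches the real application depends on which sockets exist, and whether they have received data, at the moment the child process runs. The thread does not establish that.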
I cannot further reason about the nature of that timing. I am also unclear why executing a child process would have anything to do with a network socket's read stream. I can only speculate, and this is a completely uninformed guess, that there is a stream collision between the read stream of the socket and the pipes of the child process. If so, it's unclear why altering the timing of error trapping would have any effect, or why executing the application with higher privileges, run as root from systemd, would have any effect.
I will be happy to continue investigating, but I suspect I am approaching the limits of what evidence I can gather from user land.
Here is the application demonstrating the issue on branch "servers": https://github.com/prettydiff/webserver/tree/servers
- clone repo && cd
- execute `npm install`
- execute `npm run build`
- execute `npm run server`
- The first run will generate a default server for hosting the dashboard, but on random ports. CTRL+C out of the application, change the port assignment in the dynamically created `servers.json` file to any ports of your choice.
- execute the application again: `npm run server`
- for some reason the problem occurs from any statically assigned port 100% of the time, but I just discovered it does not occur at all from a randomly assigned port. This means it is crashing on net socket error even when there are no sockets connected.
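For readers unfamiliar with the distinction in the last step, the sketch below shows what "randomly assigned" means here: passing port `0` to `listen` asks the OS for any free port, which can then be read back from `server.address()`. The helper name `listenOn` is hypothetical and is not taken from the repository.

```typescript
import { createServer } from "node:net";
import type { AddressInfo } from "node:net";

// Listen on the requested port (0 = let the OS choose), report the bound port, then close.
const listenOn = function (port: number): Promise<number> {
    return new Promise(function (resolve, reject) {
        const server = createServer();
        server.once("error", reject); // e.g. EADDRINUSE for a static port already taken
        server.listen(port, "127.0.0.1", function () {
            const address = server.address() as AddressInfo;
            server.close(function () {
                resolve(address.port);
            });
        });
    });
};

listenOn(0).then(function (port) {
    console.log("OS-assigned port:", port);
});
```

A static assignment would pass a fixed number instead of `0`. The sketch only illustrates the two modes and does not reproduce the crash.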
The error can be suppressed by uncommenting the following three lines and repeating the process above (except for npm install): https://github.com/prettydiff/webserver/blob/servers/lib/transmit/server.ts#L291-L293
The actual failure occurs from this line: https://github.com/prettydiff/webserver/blob/servers/lib/utilities/port_map.ts#L75
This logic guarantees the error will occur within 10 seconds of application start regardless of further action and no connections: https://github.com/prettydiff/webserver/blob/servers/lib/index.ts#L75-L79
This issue/PR was marked as stalled, it will be automatically closed in 30 days. If it should remain open, please leave a comment explaining why it should remain open.
Closing this because it has stalled. Feel free to reopen if this issue/PR is still relevant, or to ping the collaborator who labelled it stalled if you have any questions.