node-pty icon indicating copy to clipboard operation
node-pty copied to clipboard

Feature request: callback to know when the sub-process is started, of why it failed to exec

Open simark opened this issue 5 years ago • 15 comments

Environment details

  • OS: GNU/Linux
  • OS version: 16.04
  • node-pty version: master

Issue description

When using the spawn function, I didn't find any way to know when the process has been successfully started, or if the exec failed for some reason. My concrete use case is that I need to start a process on a pty. On success, I need to report to the caller that the process has been started*. On failure (e.g. if the executable doesn't exist), I should return the error code. I haven't found a way to do this using the current API.

I have implemented something in our fork of node-pty here, but unfortunately my employer doesn't want me to sign the Microsoft CLA, and thus I can't contribute it. But I thought I would share the idea, to get some feedback. And if somebody wants to implement something similar in the upstream repo, I guess you can't copy the code but you can get some good inspiration from it.

In my patch, I add an exec event to the IPty/ITerminal interface, with an optional error parameter. If the exec goes well, the exec callback is invoked without parameter. If there is an error, it is invoked with an errno string (e.g. ENOENT).

On Linux/macOS, since we use fork + exec to spawn a new process, it's a bit difficult to get some feedback on the exec (since it happens in a new process). I looked at how libuv does (what node's child_process.spawn relies on) and did something similar, which is to use a pipe between the child process and our process to report success or failure.

For Windows, it looks like we just need to catch an exception on failure, here's what I did.

Any feedback is appreciated, and if anybody is willing to work on an upstream version of this, I would be happy to help.

  • My criterion for "started" just means that the exec system call returned "success". Many things can go wrong after that, like dynamic libraries not found, but that's out of scope here.

simark avatar Jan 25 '19 18:01 simark

@simark to track a process I normally launch cmd ['/c', 'command'] on Windows otherwise sh ['-c', 'command'], that has cmd/sh launch the command immediately and returns the exit code of the sub-process. Am I not understanding the problem correctly?

Tyriar avatar Jan 26 '19 04:01 Tyriar

Hmm not really. It's not related to the process exit. It is that we want to be able to know whether the process was started successfully (meaning, the exec system call succeeded) or not. In practice, in our UI, we want to show either "Process X started" or "Couldn't start process X: no such file or directory", for example. It could be a long running process.

Checking the exit event doesn't help. First, we are not notified when the process has been successfully started (to show the success message). Then, on failure (if the program does not exist), we just receive an exit event, which we can't differentiate from a normal process exit.

For example, with node's child_process.spawn, you get an error when trying to exec something that doesn't exist:

const cp = require('child_process');
const proc = cp.spawn('./doesnotexist');
proc.on('error', err => {
	console.log('error', err);
});
proc.on('exit', code => {
	console.log('exit', code);
});
$ node test.js  
error { Error: spawn ./doesnotexist ENOENT
    at _errnoException (util.js:992:11)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:190:19)
    at onErrorNT (internal/child_process.js:372:16)
    at _combinedTickCallback (internal/process/next_tick.js:138:11)
    at process._tickCallback (internal/process/next_tick.js:180:9)
    at Function.Module.runMain (module.js:695:11)
    at startup (bootstrap_node.js:191:16)
    at bootstrap_node.js:612:3
  code: 'ENOENT',
  errno: 'ENOENT',
  syscall: 'spawn ./doesnotexist',
  path: './doesnotexist',
  spawnargs: [] }

with node-pty, we just get this exit event:

const cp = require('node-pty');
const proc = cp.spawn('./doesnotexist');
proc.on('exit', code => {
	console.log('exit', code);
});
$ node test.js 
exit 1

simark avatar Jan 26 '19 04:01 simark

@simark I see, definitely seems useful. Can you get approval to sign the CLA to contribute this specific change?

Tyriar avatar Jan 26 '19 19:01 Tyriar

We have already tried to get the approval for another Microsoft project (language-server stuff). IIUC, the problem our legal department has with the Microsoft CLA is that it applies to all open source Microsoft projects, which would be too wide (again, IIUC, I am not a lawyer). I doubt it will be possible to get an agreement signed just for one change...

If the situation changes, I will definitely update you.

simark avatar Jan 27 '19 01:01 simark

@simark You trigger the success event on nextTick, but it is not guaranteed that the exec call already happened in the forked process (the kernel process scheduler could have priorized the parent over the child process for some reason). A reliable success indicator for exec call is not possible without hacking deeper into the system. A cheap workaround to this might be to misuse the exit code from the child with some unlikely return value (like 127). Thats a common pattern used by shells, bash returns 127 for "command not found".

jerch avatar Feb 04 '19 18:02 jerch

@simark You trigger the success event on nextTick, but it is not guaranteed that the exec call already happened in the forked process (the kernel process scheduler could have priorized the parent over the child process for some reason).

My patch does something to avoid this (exactly the same technique as libuv). The parent synchronously waits for the child to exec, using a pipe:

https://github.com/theia-ide/node-pty/commit/f58e6ee9055554249e2788033747a1e8d139a98f#diff-32a8618c58cb849e9b5535ba19ffacc5R359

When the native function in the parent returns, the exec has either already succeeded or failed.

simark avatar Feb 04 '19 20:02 simark

Here's where it's done in libuv: https://github.com/nodejs/node/blob/3fe9267592cfef01b50f4f94eaa50b2dc495eb1a/deps/uv/src/unix/process.c#L462

simark avatar Feb 04 '19 20:02 simark

Woops have overlooked this, yeah the CLOEXEC trick works for this.

jerch avatar Feb 04 '19 20:02 jerch

Could we document this behavior on spawn() and/or in the README?

I just spent hours banging my head against a wall, and it turned out the exec had silently failed because the file path was wrong 😄

I suppose I expected an exception or some sort of feedback of the failure. Maybe we can save the next person some debugging pain with a small note?

JohnStarich avatar Jun 05 '20 04:06 JohnStarich

Even better, it should be fixed. @marcdumais-work, is the situation still the same at Ericsson (still not allowed to contribute to this project)?

simark avatar Jun 06 '20 22:06 simark

Yeah, the fact that a failed exec* gets silently swallowed by the module, is less than ideal.

@simark You already proposed a working patch for posix systems, would be nice to see this being applied. But also note, that the module is likely to see a major overhaul soon due to #405.

Not sure about the windows side here - maybe @Tyriar has some insights, whether it already promotes a failed process binary lookup?

jerch avatar Jun 07 '20 19:06 jerch

Not sure what you mean by failed process binary lookup, but afaik on Windows it acts the same. VS Code uses a workaround for this by validating cwd and executable before invoking node-pty https://github.com/microsoft/vscode/blob/cf73ba7aac021ee448a258f597064dcd0f023b87/src/vs/workbench/contrib/terminal/node/terminalProcess.ts#L74-L110

It leads to some obscure errors though which could be better if we had this.

Tyriar avatar Jun 08 '20 13:06 Tyriar

Even better, it should be fixed. @marcdumais-work, is the situation still the same at Ericsson (still not allowed to contribute to this project)?

Yes, it's still the case, as far as I know, that we are not allowed to sign the Microsoft CLA.

However we kept our node-pty fork under the MIT license, so it's my (non-lawyer) understanding that anyone could take Simon's enhancement and submit it here, so long as the project's maintainers are willing to accept this.

Looking quickly I think these are the pieces of the patch and follow-ups: https://github.com/theia-ide/node-pty/commit/f58e6ee9055554249e2788033747a1e8d139a98f https://github.com/theia-ide/node-pty/commit/186e39ce21d1cffe16393f5d9847838ea25dae41 https://github.com/theia-ide/node-pty/commit/89e077b353eaf73986e21f79cc9fef67b4eccdd3

marcdumais-work avatar Jun 08 '20 14:06 marcdumais-work

Not sure what you mean by failed process binary lookup...

Sorry for sloppy wording - I mean whether it gives a meaningful error message, if the process binary lookup failed. (you already answered it)

Imho spawn is a 2 stage thingy on OS level:

  1. open a pty fd (posix) or conpty handle (windows)
  2. run given binary as first process - exec* on posix or CreateProcessW on windows

I think both should be exposed with proper error messages, so node-pty embedding code knows whats going on - 1. can happen if the the user context has not enough privileges to call the pty subsystem or the OS hits the max pty limit (is windows limiting this as well?), 2. is the binary lookup error.

I am also not sure, if the pty fd / conpty handle is correctly teared down atm, if 2. fails (they are two different steps on OS levels prolly wasting resources if they are not teared down).

jerch avatar Jun 08 '20 17:06 jerch

Summary: I think we do throw when conpty fails to create processes now, but may not in other cases. To action this we should review what happens when processes fail to launch and ensure they throw

Tyriar avatar Oct 21 '21 19:10 Tyriar