
Ability to replace current Node process with another

Open dead-claudia opened this issue 6 years ago • 21 comments

Edit: If someone can come up with a better shim for execve for Windows, that'd be far better. The form below is very expensive and very horrible.

Edit 2: Linked relevant SO question.

Edit 3: Clarify FS changes

Edit 4: Here's the text from that SO question as of July 6, 2018 (so you don't have to search for it), where I asked about how to do the Windows part.


So, in a feature request I filed against Node.js, I was looking for a way to replace the current Node process with another. In Linux and friends (really, any POSIX-compliant system), this is easy: use execve and friends and call it a day. But obviously, that won't work on Windows, since it only has CreateProcess (which execve and friends delegate to, complete with async behavior). And it's not like people haven't wanted to do similar, leading to numerous duplicate questions on this site. (This isn't a duplicate because it's explicitly seeking a workaround given certain constraints, not just asking for direct replacement.)

Process replacement has several facets that have to be addressed:

  1. All console I/O streams have to be forwarded to the new process.
  2. All signals need to be transparently forwarded to the new process.
  3. The data from the old process have to be destroyed, with as many resources reclaimed as possible.
  4. All pre-existing threads and child processes should be destroyed.
  5. All pre-existing handles should be destroyed apart from open file descriptors and named pipes/etc.
  6. Optimally, the old process's memory should be kept to a minimum after the process is created.
  7. For my particular use case, retaining the process ID is not important.

And for my particular case, there are a few constraints:

  1. I can control the initial process's startup as well as the location of my "process replacement" function.
  2. I could load arbitrary native code via add-ons at potentially any stack offset.
    • Implication: I can't even dream of tracking malloc calls, handles, thread manipulation, or process manipulation to track and free them all, since DLL rewriting isn't exactly practical.
  3. I have no control over when my "process replacement" is called. It could be called through an add-on, which could've been called through either interpreted code via FFI or even another add-on recursively. It could even be called during add-on initialization.
    • Implication: I would have no ability to know what's in the stack, even if I perfectly instrumented my side. And rewriting all their calls and pushes is far from practical, and would just be all-around slow for obvious reasons.

So, here's the gist of what I was thinking: use something similar to a pseudo-trampoline.

  1. Statically allocate the following:
    1. A single pointer for the stack pointer.
    2. MAX_PATH + 1 chars for the application path + '\0'.
    3. MAX_PATH + 1 chars for the current working directory path + '\0'.
    4. 32768 chars for the arguments + '\0'.
    5. 32768 chars for the environment + '\0'.
  2. On entry, set the global stack pointer reference to the stack pointer.
  3. On "replacement":
    1. Do relevant process cleanup and lock/release everything you can.
    2. Set the stack pointer to the stored original global one.
    3. Terminate each child thread.
    4. Kill each child process.
    5. Free each open handle.
    6. If possible (i.e. not in a UWP program), for each heap, destroy it if it's not the default heap or the temporary heap (if it exists).
    7. If possible, close each open handle.
    8. If possible, walk the default heap and free each segment associated with it.
    9. Create a new process with the statically allocated file/arguments/environment/etc. with no new window created.
    10. Proxy all future received signals, exceptions, etc. without modification to this process somehow. The standard signals are easy, but not so much with the exceptions.
    11. Wait for the process to end.
    12. Return with the process's exit code.

The idea here is to use a process-based trampoline and drop the current process size to an absolute minimum while the newly created one is started.
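For comparison, here is a minimal sketch of the last four steps only (create the process, proxy signals, wait, return the exit code) as they might look from Node's side. It does none of the memory or handle reclamation described above. The `trampoline` name, the list of forwarded signals, and the `128 + signal number` exit convention are my own assumptions for illustration, not anything the platform mandates.

```javascript
'use strict';
const { spawn } = require('node:child_process');
const os = require('node:os');

// Hypothetical helper: spawns `command`, shares the console streams with it,
// forwards a few signals, and resolves with the exit code the parent should
// exit with once the child is done.
function trampoline(command, args = [], options = {}) {
  return new Promise((resolve, reject) => {
    // stdio: 'inherit' hands stdin/stdout/stderr straight to the child.
    const child = spawn(command, args, { ...options, stdio: 'inherit' });

    // Assumed set of signals worth forwarding; adjust as needed.
    const handlers = ['SIGINT', 'SIGTERM', 'SIGHUP'].map((signal) => {
      const handler = () => child.kill(signal);
      process.on(signal, handler);
      return [signal, handler];
    });
    const cleanup = () => {
      for (const [signal, handler] of handlers) process.off(signal, handler);
    };

    child.once('error', (err) => {
      cleanup();
      reject(err);
    });
    child.once('exit', (code, signal) => {
      cleanup();
      // `code` is null when the child was killed by a signal; fall back to
      // the common shell convention of 128 + signal number.
      resolve(code ?? 128 + (os.constants.signals[signal] ?? 0));
    });
  });
}
```

A caller would typically finish with something like `trampoline(cmd, args).then((code) => process.exit(code))`, which is exactly the "second heavy process" cost this question is trying to avoid.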

But since I'm not very familiar with Windows, I probably made quite a few mistakes here. Also, the above seems extremely inefficient, and to an extent it just feels horribly wrong when a kernel could simply release a few memory pages, deallocate a bunch of memory handles, and move some memory around for the next process.

So, to summarize, what's the ideal way to emulate process replacement on Windows with the fewest limitations?


I would like a means to "replace" the current Node process with another, keeping the same process ID. It would be something morally similar to this function, but it wouldn't return. This would be most useful for conditionally replacing Node flags in a startup script - for example, if someone wants to enable modules and your behavior needs to change non-trivially in the presence of them (like if you need to install a default loader), you'll want to respawn the process with --experimental-modules --loader <file> so you can install the loader.

This is also for scenarios when you want to run a module as a main module. If you want to do logic after the process ends, you should be using child_process.spawn regardless - you shouldn't be attempting to "replace" it in any capacity.
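To make the flag-replacement use case concrete, here is a rough sketch of how a startup script might decide whether a relaunch is needed and what argument vector it would relaunch with. The helper names `missingFlags` and `relaunchArgv` are hypothetical, and the sketch assumes every required flag is written as a single token (for example `--loader=<file>` rather than `--loader <file>`).

```javascript
'use strict';

// Hypothetical helper: returns the required Node flags that are not already
// present in the current process's execArgv. Assumes one token per flag.
function missingFlags(currentExecArgv, requiredFlags) {
  return requiredFlags.filter((flag) => !currentExecArgv.includes(flag));
}

// Hypothetical helper: builds the argument vector for the relaunched process:
// existing Node flags, then the missing ones, then the script and its args.
function relaunchArgv(currentExecArgv, requiredFlags, scriptPath, scriptArgs = []) {
  return [
    ...currentExecArgv,
    ...missingFlags(currentExecArgv, requiredFlags),
    scriptPath,
    ...scriptArgs,
  ];
}
```

If `missingFlags(process.execArgv, required)` comes back empty, no relaunch is needed. Otherwise the result of `relaunchArgv` is what would be handed to the replacement call (or to `child_process.spawn` today).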

Here's what I propose:

  • child_process.replaceSpawn(command [ , args] [ , options ])

    • command is the path to the new command.
    • args is the args to replace the arguments with. This defaults to the empty array.
    • options is for the various options for replacing the process. This defaults to an empty object.
      • options.cwd is the new cwd to use. (Default: process.cwd())
      • options.env is the new environment to use. (Default: process.env)
      • options.argv0 is the binary to spawn as. (Default: command)
  • child_process.replaceFork(mainPath [ , args] [ , options ]) works similarly to above.

    • mainPath is the path to the new require.main.
    • options.execPath is the new binary to spawn as. (Default: process.execPath)
    • options.execArgv are the new Node flags to spawn with. (Default: process.execArgv)
    • options.argv0 is the binary to spawn as. (Default: process.argv0)
    • The command is the original binary itself.
  • Add a napi_terminating member for napi_status to represent try_catch.HasTerminated() and the result of each call after replacement termination.

  • Add a napi_set_terminate_hook(napi_env env, void (*fun)(void*), void* data) function to register a callback called on termination, to make it easier to clean up resources.
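As a rough illustration of how the two proposed functions relate, the sketch below resolves `replaceFork`-style arguments and their defaults into the `(command, args, options)` triple a `replaceSpawn`-style call would take. Neither function exists in Node; `resolveReplaceFork` is a hypothetical helper, and placing `execArgv` before `mainPath` is my assumption, modelled on how `child_process.fork` builds its command line.

```javascript
'use strict';

// Hypothetical helper: maps replaceFork(mainPath, args, options) onto the
// arguments of a replaceSpawn(command, args, options) call, applying the
// defaults listed in the proposal.
function resolveReplaceFork(mainPath, args = [], options = {}) {
  const execPath = options.execPath ?? process.execPath;
  const execArgv = options.execArgv ?? process.execArgv;
  return {
    // "The command is the original binary itself."
    command: execPath,
    // Node flags first, then the new main module, then its own arguments.
    args: [...execArgv, mainPath, ...args],
    options: {
      cwd: options.cwd ?? process.cwd(),
      env: options.env ?? process.env,
      argv0: options.argv0 ?? process.argv0,
    },
  };
}
```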

Internally, there are two cases you need to cover, and the simulated part for Windows is where it gets really hairy due to all the edge cases. Here's pseudocode for the basic algorithm (I'm not really familiar with Node internals, so take this as a rough guideline):

  1. Stop the main event loop.
  2. Go through the standard shutdown routine.
  3. Destroy any open libuv handles and cancel any remaining event loop tasks.
  4. If we're on a platform that supports process replacement (like Linux or Mac):
    1. Invoke execve or equivalent with the new process path, arguments, and environment.
  5. Else, if we're on Windows (the only supported OS that doesn't), we have to simulate it entirely:
    1. Terminate execution via v8::V8::TerminateExecution(). All N-API callbacks should return napi_terminating during this step.
    2. For each loaded native module:
      1. If the native module has a terminate hook, call it.
      2. Unload the native module's DLL.
    3. Close the event loop.
    4. Dispose the isolate.
    5. Do the rest according to whatever happens to this SO question.
  6. Else, on other OSs without a process replacement function, it'd look similar to Windows.

In addition, file system requests will have to generally create each file descriptor with O_CLOEXEC.

As for precedent where this could be used immediately:

dead-claudia avatar Jul 05 '18 03:07 dead-claudia

@isiahmeadows - great explanation, thank you. How do you compare this with:

  • embedding Node as a dll | so
  • invoke start or init with required input
  • pull it off when done
  • re-spin with new input as required?

Do they solve the same problem, or different ones? Sorry, I follow your proposal fully, but did not fully grasp the use cases.

gireeshpunathil avatar Jul 05 '18 04:07 gireeshpunathil

@gireeshpunathil In reality, it's supposed to use/emulate POSIX's execve API, so it should just overwrite/exit the process.

If you look at Liftoff's readme, that should explain best what this primarily targets: loader scripts that need zero state of their own after the process is respawned with the new arguments. If you want a cohesive idea of what using this would look like for a relaunch script, check out Babel's babel-node script, which is such a relaunch script that reloads the sibling _babel-node.js.

For a concrete example, this entire subsection would collapse to just this.

dead-claudia avatar Jul 05 '18 07:07 dead-claudia

makes sense to me! but would love to hear from @nodejs/child_process

gireeshpunathil avatar Jul 05 '18 08:07 gireeshpunathil

Just to talk about Windows for a minute, execve is supported there (actually, MS recommend _execve, but that's just a detail) and is functionally equivalent to the POSIX execve. It might work differently at the system-call level but that doesn't matter; it still works.

Note though that Windows doesn't support O_CLOEXEC. If that's an issue (and I imagine it probably is), then calling CreateProcess () (followed by ExitProcess ()) lets you specify whether the spawned process should inherit file handles open in the parent process or not (it's an all or nothing thing). Better than that you cannot do (and there are no 'hairy edge cases').

Documentation for the exec family of functions on Windows is here:

https://msdn.microsoft.com/en-us/library/431x4c1w.aspx

and you will find documentation for CreateProcess at MSDN too.

HTH

HPaulS avatar Jul 06 '18 08:07 HPaulS

@HPaulS But execve doesn't replace the process IIUC, which is why I suggest we simulate it.

dead-claudia avatar Jul 06 '18 08:07 dead-claudia

Process B starts, process A immediately quits, same difference. It's the best you can do on Windows. Simulating 'it' would be hard. I would avoid it unless there is some vital reason to do so, and I don't see one here.

Of course, I have to qualify all this by saying that I don't know what a node.js process actually is. I assume, from what you say, that it's a true process, in the sense that the underlying OS understands it.

If so (and even if not, I guess) I can see some value in having a way to say to it 'reset everything to your initial, default state'. Exactly how you might do that is another issue. Killing the process off and starting a new one is probably easiest since it would clean up all the resources currently in use and get rid of any saved state. And on Unix, yes, there's a handy 'do all that in place' system call, lucky Unix, but that, in and of itself, is not important.

I don't know enough about node.js to comment further, I feel like I'm weeding in your back garden here Isiah.


PS: Starting a new process on Windows (and then immediately exiting the old one, by whatever means) will not keep the same process ID. Is this an issue? If it is then you will indeed need to find an alternative solution which will have to be some sort of 'reset everything' call into node.js itself.

HPaulS avatar Jul 06 '18 10:07 HPaulS

@HPaulS

A Node.js process is just that, a true, heavy OS-level process, with all the baggage that surrounds one. The goal of this request is to provide a way to basically kill off this process and delegate as much as possible to that replacement process. The standard way to do it on Linux/Unix is via execve and friends, which terminates the process in-place and instantiates the replacement in-place, using the same process ID. On Windows, yes, you'd have to create a new process, but I'm proposing we also clean up and strip the parent process to as little as possible, including closing all visible active handles and child processes created by the process, so it mimics the Linux/Unix way much more closely. This would emulate the Linux/Unix method as closely as practically possible, even though Windows processes are themselves irreplaceable.

Hopefully this explains what I'm going for.

dead-claudia avatar Jul 07 '18 02:07 dead-claudia

To me, the only important question here is whether the spawned process has to have the same process ID as the parent process. Does it? Everything else is easily doable via CreateProcess() followed by ExitProcess().

Note: the exec family do not terminate (any of) the child processes of the parent process, either on Unix or on Windows. Neither, on Windows, does ExitProcess(). If you want to do that, you have to do it yourself (on both platforms).

HPaulS avatar Jul 07 '18 06:07 HPaulS

@HPaulS

Note: the exec family do not terminate (any of) the child processes of the parent process, either on Unix or on Windows.

Fair enough, so that part could be left out. (I didn't know that.)

But anyways, the main goal of this feature request is to replace this mess with something that leverages native and runtime support to be way more memory-efficient.

dead-claudia avatar Jul 07 '18 10:07 dead-claudia

OK, thanks. I myself had to check that.

So I saw this:

https://nodejs.org/api/child_process.html#child_process_child_process_exec_command_options_callback

But we seem to need another version that never returns, implementation details at the discretion of the node.js team. Don't you think?

[Edit] Pesky malfunctioning T key :(

HPaulS avatar Jul 07 '18 17:07 HPaulS

@gireeshpunathil So, have you gotten feedback from any of them yet?

dead-claudia avatar Jul 07 '18 22:07 dead-claudia

  1. I'm having some trouble understanding the use case (beyond a matter of principle: "The OS offers execve, so does Python, why doesn't Node.js?").

I would like a means to "replace" the current Node process with another, keeping the same process ID.

Why is retaining the process ID critical, other than that it imitates the behavior of execve? Does your use case include external processes that remember the replaced process's PID?

  2. Could you accomplish what you want through clever use of the VM module?

davisjam avatar Jul 10 '18 05:07 davisjam

@davisjam

Basically, I'd like to see Liftoff and friends be able to use core functionality to do their thing most efficiently, instead of relying on either a native extension or creating a second heavy process when it's not always necessary.

Why is retaining the process ID critical, other than that it imitates the behavior of execve? Does your use case include external processes that remember the replaced process's PID?

Retaining the process ID is not critical - it's just part of the execve contract. It's not necessary for what I need. I didn't do a good job of emphasizing that it (along with a few other things, like forwarding open handles) is neither critical nor necessary for my uses, and it doesn't really matter for about 99.9% of use cases. (If it's really necessary to preserve that, you probably shouldn't be using Node in the first place.)


Basically, what I'm wanting is the ability to "tail call" into another process, with a little bit of runtime assistance to optimize that better when it can (like in Unix). Does that help?

dead-claudia avatar Jul 11 '18 04:07 dead-claudia

Similarly, npx, nps and friends run some command as the last action they do without being interested in the results, taking up system resources for nothing.

I also would love to see something similar to execve in Node. I'd be ok with saying that it can't be implemented on windows and instead it just forks and waits.

I'd like to add that retaining the process ID is critical for these uses - shells calling nps whatever wait until the called process completes. If it forks+dies, even if it passes on the i/o streams, that would cause the shell to think the process completed.

wmertens avatar Dec 09 '19 07:12 wmertens

execve() and worker threads seem problematic together though. You probably don't want a worker (or the main thread for that matter) to unilaterally decide that now is a good time to terminate the process.

Compare process.exit() - in a worker thread, that doesn't actually terminate the process, just the current thread.
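The process.exit() behaviour can be observed with a small sketch like the one below. The `exitCodeOfWorker` name is hypothetical; the worker is created from an inline source string via the `eval: true` option of `worker_threads.Worker`.

```javascript
'use strict';
const { Worker } = require('node:worker_threads');

// Hypothetical helper: starts a worker whose only job is to call
// process.exit(code), and resolves with the exit code reported by the
// worker's 'exit' event. The calling (main) thread keeps running.
function exitCodeOfWorker(code) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(`process.exit(${Number(code)})`, { eval: true });
    worker.once('error', reject);
    worker.once('exit', resolve);
  });
}
```

Calling `exitCodeOfWorker(3)` from the main thread resolves rather than ending the process, which is the asymmetry an execve-style API would have to decide how to handle.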

bnoordhuis avatar Dec 09 '19 10:12 bnoordhuis

How about throwing if there are worker threads?
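A minimal sketch of such a guard, under loud assumptions: Node exposes `isMainThread`, but as far as I know there is no public API for counting live workers, so `activeWorkerCount` here is a number the caller would have to track itself. The `assertReplaceAllowed` name is hypothetical.

```javascript
'use strict';
const { isMainThread } = require('node:worker_threads');

// Hypothetical guard to run before any process replacement.
// `activeWorkerCount` is assumed to be tracked by the caller.
function assertReplaceAllowed(activeWorkerCount = 0, calledFromMainThread = isMainThread) {
  if (!calledFromMainThread) {
    throw new Error('process replacement is only allowed from the main thread');
  }
  if (activeWorkerCount > 0) {
    throw new Error(
      `cannot replace the process while ${activeWorkerCount} worker thread(s) are running`
    );
  }
}
```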

wmertens avatar Dec 09 '19 15:12 wmertens

This feature would prove useful for nodejs/corepack

arcanis avatar Nov 05 '20 09:11 arcanis

There has been no activity on this feature request for 5 months and it is unlikely to be implemented. It will be closed 6 months after the last non-automated comment.

For more information on how the project manages feature requests, please consult the feature request management document.

github-actions[bot] avatar Mar 18 '22 19:03 github-actions[bot]

Non-automated comment.

mnpenner avatar Mar 19 '22 01:03 mnpenner

There has been no activity on this feature request for 5 months and it is unlikely to be implemented. It will be closed 6 months after the last non-automated comment.

For more information on how the project manages feature requests, please consult the feature request management document.

github-actions[bot] avatar Sep 19 '22 01:09 github-actions[bot]

If no one is going to work on it (seems likely since it's been open since 2018), it's better to just let it die off peacefully. Bump comments just end up spamming people's inboxes. The project still accepts pull requests, even if the issue is closed.

bnoordhuis avatar Sep 19 '22 08:09 bnoordhuis

There has been no activity on this feature request for 5 months and it is unlikely to be implemented. It will be closed 6 months after the last non-automated comment.

For more information on how the project manages feature requests, please consult the feature request management document.

github-actions[bot] avatar Mar 21 '23 01:03 github-actions[bot]

It's been almost six years without movement so I'm putting this one out to pasture.

bnoordhuis avatar Mar 21 '23 10:03 bnoordhuis