emscripten icon indicating copy to clipboard operation
emscripten copied to clipboard

Node.js pthread implementation shutting down before pthread prints.

Open juj opened this issue 3 months ago • 33 comments

Consider the following test case:

test.c

#include <pthread.h>
#include <stdlib.h>
#include <emscripten.h>
#include <stdio.h>

void* thread_main(void* param) {
  printf("thread_main\n");
  exit(0);
}

int main(int argc, char** argv) {
  pthread_t thread;
  pthread_create(&thread, 0, thread_main, NULL);
  emscripten_exit_with_live_runtime();
}

emcc test.c -o a.html -pthread -sEXIT_RUNTIME=1


The main thread creates a pthread, and then yields out from main(), leaving the runtime alive on the main thread.

The pthread then does a print, and exits the whole program.

When run in node, this application prints nothing and quits (no thread_main printed).

When run in a browser, the printf does appear, but calling exit(0) does not shut down the runtime.

juj avatar Sep 08 '25 14:09 juj

This is certainly a bug, but its a complicated area.

The conditions underwhich we allow node to exit, or indeed the runtime to exit, are one of the most complex and annoying parts of the the runtime.

Remember that pthreads are not normally enough to keep the a process alive. The following program does not print anything on linux:

#include <pthread.h>                                                                 
#include <stdlib.h>                                                                  
#include <stdio.h>                                                                   
                                                                                     
static void* thread_main(void* param) {                                              
  printf("thread_main\n");                                                           
  while (1)  {                                                                       
    printf("looping\n");                                                             
  }                                                                                  
  return NULL;                                                                       
}                                                                                    
                                                                                     
int main(int argc, char** argv) {                                                    
  pthread_t thread;                                                                  
  pthread_create(&thread, 0, thread_main, NULL);                                     
}

So we go out of our way to mimic this behaviour in emscripten.

Note that emscripten_exit_with_live_runtime() is not enough to keep the whole node process alive, you still need something in the event loop, some kind of timer or callback that could fire, otherwise the node event loop will exit.

In other words, in order to keep the node event loop alive we currently would need some kind of JS event that could fire, be that a timer, or an async event such as a file read... IIRC threads are explicitly not included in the list of things that can keep the event loop alive in order to mimic the POSIX behaviour.

I guess we need to add even more logic here? Something like "pthreads alone don't keep the event loop alive, unless the runtime has been explictly marks as keep alive via exit_with_live_runtime or similar".

sbc100 avatar Sep 08 '25 15:09 sbc100

The code that allows node to exit with running threads is here: https://github.com/emscripten-core/emscripten/blob/80ca182d92aa33f874acd2f25718143ce8848f37/src/lib/libpthread.js#L275-L283

sbc100 avatar Sep 08 '25 15:09 sbc100

Another way of putting it: In the OP code the emscripten runtime is kept alive (i.e. we never call exitRuntime()), but there is nothing to keep the node process alive. This is a separate thing, we don't currently any any API that forces node not to exit, one does that by having something alive in the event loop.

@RReverser do you have any good ideas on how to improve the ergonomics here?

sbc100 avatar Sep 08 '25 15:09 sbc100

The following program does not print anything on linux:

Yes, but that is not the program that I wrote. Linux does not have a function emscripten_exit_with_live_runtime(); along with the particular semantics that Emscripten aims to offer with it.

juj avatar Sep 08 '25 17:09 juj

Note that emscripten_exit_with_live_runtime() is not enough to keep the whole node process alive

Maybe not today, but the point here is: what should we intend to design the proposed program to behave like? You describe several valid reasons why the program does not work today in Node.js (though it does work as designed in a browser, modulo the print about the exit()). The idea here is what we want to happen when running that code in Node?

Do we intend for that program code to be a browser-only functioning program, and it shouldn't even work in Node?

Or do we intend for that program code to be ill-defined/unspecified of what the results should be when running in node?

juj avatar Sep 08 '25 17:09 juj

As I said in my original reply, This is certainly a bug, but its a complicated area.

We can, and maybe should invest more in making node work more like the browser and/or native here. We have papered over many many, differences like this one already, and there is no reason we can't find some reasonable way to paper over this one too, but I think it will add complexity.

The initial solution I came up with was to make pthreads as strongly referenced when emscripten_runtime_keepalive_push() is called and then unref them all again then then emscripten_runtime_keepalive_pop reduces the runtime keepalive count back to zero again.

But that then begs the question: should emscripten_exit_with_live_runtime keep a node process alive even in single threaded mode? Right now emscripten_exit_with_live_runtime on node means basically "keep emscripten runtime alive as long as node is running" but it doesn't stop node from running.

sbc100 avatar Sep 08 '25 17:09 sbc100

if emscripten_exit_with_live_runtime artificially keeps the node process alive it seems like it could only lead to zombie process that do nothing (since no events could be received), but maybe we make special case for pthreads here?

sbc100 avatar Sep 08 '25 17:09 sbc100

should emscripten_exit_with_live_runtime keep a node process alive even in single threaded mode?

If there are no event handlers, then definitely it is better to quit than to deliberately zombie.

In this scenario it does seem like the key would be "have pthreads alive". For example, both

void* thread_main(void* param) {
  printf("thread_main\n");
  exit(0);
}

int main(int argc, char** argv) {
  pthread_t thread;
  pthread_create(&thread, 0, thread_main, NULL);
  emscripten_exit_with_live_runtime();
}

and

void* thread_main(void* param) {
  printf("thread_main\n");
  pthread_exit(0);
}

int main(int argc, char** argv) {
  pthread_t thread;
  pthread_create(&thread, 0, thread_main, NULL);
  emscripten_exit_with_live_runtime();
}

seem like they should be valid programs in both node.js and browser, and both should print.

The second program does have a discrepancy today with respect to quitting the runtime, as there are no mechanisms that would quit the runtime when running in a browser. Though in node.js at least, it seems that the second program would want to shut down since zero threads are left alive.

(In a browser, user might export JS functions from the Wasm module for later execution)

juj avatar Sep 08 '25 17:09 juj

I don't think this in particular is a bug tbh. It was an intended behaviour to match native platforms more closely when executing on Node.js, otherwise any background thread would (and did in the past) keep process alive indefinitely.

That's not ideal as vast majority of non-trivial multithreaded apps use threadpools that are started at the beginning and then always running in the background. Instead, an explicit pthread_join is needed to wait for the thread just like it is on native platforms.

As @sbc100 said, changing emscripten_exit_with_live_runtime to keep process alive would change its semantics from what it is today.

What I think is worth fixing for ergonomics though is:

  1. emscripten_runtime_keepalive_{push, pop} should keep the Node.js process alive, regardless of pthreads as it's not tied to them and is useful for any async operations.
  2. emscripten_proxy_callback should invoke emscripten_runtime_keepalive_{push, pop} at the beginning/end of an async operation.

In combination, these would allow to create async channels that keep Node.js process alive as long as we expect a response from a pthread. It wouldn't fix the printf case but it would fix a common real-world scenario where an async task is posted to background thread via Emscripten's proxying, and Node.js exits the process instead of waiting for it to finish.

RReverser avatar Sep 08 '25 20:09 RReverser

I don't think this in particular is a bug tbh

That is fine. How does one fix the code in the original comment, and how do we carry through documentation that the original code is not even supposed to work?

otherwise any background thread would (and did in the past) keep process alive indefinitely.

I don't think that scenario applies here? The thread is calling exit().

Should this program keep on printing forever, or quit immediately?

infinite_printer1.c

void* thread_main(void* param) {
  for(;;) {
    printf("thread_main\n");
  }
}

int main(int argc, char** argv) {
  pthread_t thread;
  pthread_create(&thread, 0, thread_main, NULL);
  emscripten_exit_with_live_runtime();
}

In browser it will currently keep on printing. In Node.js it will currently not print anything.

Do we want to say that the above program is incorrectly written for people targeting Node?

If so, what is the correct code to write in Node.js to get an infinite printer in a thread? Would it instead be the following?

infinite_printer2.c

void* thread_main(void* param) {
  for(;;) {
    printf("thread_main\n");
  }
}

int main(int argc, char** argv) {
  pthread_t thread;
  pthread_create(&thread, 0, thread_main, NULL);
  emscripten_runtime_keepalive_push();
}

Is the above code intended to not be ill-defined?

Though that variant does not work either. In Node.js it prints nothing, and in browser it will keep on printing forever.

What I think is worth fixing for ergonomics though is: emscripten_runtime_keepalive_{push, pop} should keep the Node.js process alive, regardless of pthreads as it's not tied to them and is useful for any async operations.

Is this saying that the fact that infinite_printer2.c does not work today is a bug and should be fixed, but infinite_printer1.c not working today should be considered a not a bug, and shouldn't be expected to work in node.js?

That is all fine by me, I just want to chart here what the API design idea here is. Though in my mind I would place emscripten_exit_with_live_runtime(); and emscripten_runtime_keepalive_push(); at the exact same level of support in Node.js, and there wouldn't be any reason to support both, if one is supported.

juj avatar Sep 08 '25 20:09 juj

  1. emscripten_runtime_keepalive_{push, pop} should keep the Node.js process alive, regardless of pthreads as it's not tied to them and is useful for any async operations.

If we did that then the OP example would work, and also the loop-forever example above would also work.

However, there are downsides to doing this. It would make it easy to create zombie node processes, which can never do any more work. Would we want to do that? Do you know of an easy to to do it under node?

sbc100 avatar Sep 08 '25 20:09 sbc100

Is the above code intended to not be ill-defined?

I think one problem is that node and browser and different here in that there is no concept keeping the browser event loop alive.. in the browser the event loop is always alive until the tab is closed. So emscripten_runtime_keepalive_push keeps the emscripten runtime alive on both platforms (both node and the web) but we don't have any API for keeping the event loop itself alive (which is lower level and we don't general have direct control of).

sbc100 avatar Sep 08 '25 20:09 sbc100

As @sbc100 said, changing emscripten_exit_with_live_runtime to keep process alive would change its semantics from what it is today.

What I think is worth fixing for ergonomics though is:

  1. emscripten_runtime_keepalive_{push, pop} should keep the Node.js process alive, regardless of pthreads as it's not tied to them and is useful for any async operations.

You seem to be contradicting your self here because emscripten_exit_with_live_runtime is the same as emscripten_runtime_keepalive_push + unwind so, if we define emscripten_runtime_keepalive_push as keeping node alive we would also be defining emscripten_exit_with_live_runtime as keeping node alive.

sbc100 avatar Sep 08 '25 20:09 sbc100

However, there are downsides to doing this. It would make it easy to create zombie node processes, which can never do any more work. Would we want to do that? Do you know of an easy to to do it under node?

Let's start with any which ever way to make any of the above examples to work in Node.js? How should one change the above toy examples to get a print-in-pthread working?

We can definitely solve this in documentation if we don't want Node.js users trying to use emscripten_exit_with_live_runtime() or emscripten_runtime_keepalive_push(), but surely there should be at least some way to write the above "print-in-a-pthread" programs in Node. Can we come up with modifying the above code to showcase the intended best practices of how to do that in Node.js?

I don't think I see how to achieve this in Node.js today, except to create a semaphore, and release it at the end of the pthread, and have the main thread wait for that semaphore?

juj avatar Sep 08 '25 20:09 juj

I don't think I see how to achieve this in Node.js today, except to create a semaphore, and release it at the end of the pthread, and have the main thread wait for that semaphore?

You could do literally anything to that would keep the node event loop from exiting. e.g. emscripten_set_timeout_loop(noop, 1000). Once the thread calls exit() that would shut down the runtime and stop the timeout loop.

We do have an internal API called _emscripten_thread_set_strongref which we could consider making public? This will cause the node runtime to stay alive until a specific thread is done.

sbc100 avatar Sep 08 '25 20:09 sbc100

You could do literally anything to that would keep the node event loop from exiting. e.g. emscripten_set_timeout_loop(noop, 1000).

Ah heh, I mean achieve it in a way that would fall under "this is a designed pattern to achieve this.", hacks notwithstanding.

juj avatar Sep 08 '25 20:09 juj

You could do literally anything to that would keep the node event loop from exiting. e.g. emscripten_set_timeout_loop(noop, 1000).

Ah heh, I mean achieve it in a way that would fall under "this is a designed pattern to achieve this.", hacks notwithstanding.

In that case I would say no, there is no way to have a pthread itself keep the node runtime alive. Not within our current API space. Not unless we were to expose _emscripten_thread_set_strongref so something similar.

sbc100 avatar Sep 08 '25 20:09 sbc100

Hmm, so would we consider the following code to be the well-formed best practices snippet for getting started with pthreads hello world?

void* thread_main(void* param) {
  emscripten_out("hello, world!");
  emscripten_force_exit(0); // Quit the runtime process in both Node.js and browser (exit(0) wouldn't work here))
}

int main(int argc, char** argv) {
  pthread_t thread;
  pthread_create(&thread, 0, thread_main, NULL);
  emscripten_thread_set_strongref(thread); // Stay alive for Node.js, no-op for browser
  emscripten_runtime_keepalive_push(); // Stay alive for browser, no-op for Node.js
}

juj avatar Sep 08 '25 21:09 juj

You seem to be contradicting your self here because emscripten_exit_with_live_runtime is the same as emscripten_runtime_keepalive_push + unwind so, if we define emscripten_runtime_keepalive_push as keeping node alive we would also be defining emscripten_exit_with_live_runtime as keeping node alive.

Ah hm you're right, I didn't think of one being implemented in terms of the other.

In that case scratch the first part of my contradiction, I do think that emscripten_runtime_keepalive_push should keep the event loop alive until the corresponding pop.

Node.js "zombie" processes are relatively common even w/o Emscripten, e.g. due to stray setTimeout like you've shown, or background http server, or thousands other reasons. There are tools like https://github.com/mafintosh/why-is-node-running to help debugging them.

RReverser avatar Sep 08 '25 21:09 RReverser

To be clear, I'm not advocating that zombie processes are fine in general, and definitely don't want to revert changes that got us to native platform parity by default (when no Emscripten keepalive APIs are used), but when there is an explicit Emscripten _push, I think it's reasonable and expected to treat that as any other .ref() in Node.

RReverser avatar Sep 08 '25 21:09 RReverser

OK, sounds like we can update emscripten_runtime_keepalive_push to also keep the node event loop alive then. I imagine this behaviour change will effect quite a few of our tests (but very few of our users hopefully, mostly users who run node unit tests)

sbc100 avatar Sep 08 '25 21:09 sbc100

This means that following program will never exit under node:

void main() {
  emscripten_runtime_keepalive_push();
}

A tricky edge case might be builds that are compiled with -sEXIT_RUNTIME=0 (the default)... logically we might then assume that means that the node event loop can also never exit? I would hope we can avoid that somehow.

sbc100 avatar Sep 08 '25 21:09 sbc100

Or are you saying that only extend "aliveness" to pthreads, not to the whole node event loop? That seems like a safer change for sure.

sbc100 avatar Sep 08 '25 21:09 sbc100

I'm not advocating that zombie processes are fine I imagine this behaviour change will effect quite a few of our tests

I don't want to push to propose any undesirable behavioral changes here, and I don't want to invite any bad programming patterns that would result in Emscripten code being more prone to causing zombies, but I am merely pondering for what should be designed to be the correct code example for a "hello world" print from a pthread that will work in browser and Node.js.

If supporting the above code examples is taking down a path of forcing possibly unwanted behavioral changes, then I'm totally fine in wanting to navigate around that.

The one thing about emscripten_runtime_keepalive_push(); vs emscripten_exit_with_live_runtime(); discussion though is that I don't think

int main() {
   ...
   emscripten_runtime_keepalive_push();
}

is good code, unless the author also intends to have a matching pop somewhere, and they intend to say that the code should be compatible with another actor carrying out interleaving push/pops.

If user wants to exit with a live runtime without never intending to pop (but rather they want to later imperatively call exit() to quit), I think the correct construct to write should be something that looks like

int main() {
   ...
   emscripten_exit_with_live_runtime();
}

rather than incrementing a refcount that doesn't intend to interact with the refcounting mechanism, and be decremented later.

juj avatar Sep 08 '25 21:09 juj

If user wants to exit with a live runtime without never intending to pop (but rather they want to later imperatively call exit() to quit), I think the correct construct to write should be something that looks like

int main() { ... emscripten_exit_with_live_runtime(); } rather than incrementing a refcount that doesn't intend to interact with the refcounting mechanism, and be decremented later.

I think emscripten_exit_with_live_runtime is implemented in terms of emscripten_runtime_keepalive_push mostly to save on codesize and simplify the amount of state we have to store / check.

If we implemented it separately we would need both a refcount and some other global state which could to track the emscripten_exit_with_live_runtime bit. Is there any reason not to just use emscripten_runtime_keepalive_push internally? There would be no corresponding _pop of course, but that is intended.

sbc100 avatar Sep 08 '25 21:09 sbc100

Basically emscripten_exit_with_live_runtime is a push without a pop, which means that runtime stays alive forever.

Of course somebody could calls pop explicitly, but calling pop without a push should not be defined behaviour (as you say, a programmer should only have matching pairs of pushs and pops in their code, anything else is not well defined).

sbc100 avatar Sep 08 '25 21:09 sbc100

I think emscripten_exit_with_live_runtime is implemented in terms of emscripten_runtime_keepalive_push mostly to save on codesize and simplify the amount of state we have to store / check.

Yeah I know.. I don't mean the internal implementation of these needs to change, but what the public code looks like.

We could for example use development-time ASSERTIONS checks to make sure that

void foo() {
  emscripten_runtime_keepalive_pop();
}

int main() {
  emscripten_set_timeout(foo, 1000);
  emscripten_exit_with_live_runtime();
}

would result in an error. Whatever the internal implementation detail, the intent communicated by the programmer in the above looks like a programming error that the runtime can guard against.

juj avatar Sep 08 '25 21:09 juj

Or are you saying that only extend "aliveness" to pthreads, not to the whole node event loop? That seems like a safer change for sure.

Not sure tbh. I can't think of any other source of "liveness" in Emscripten that isn't already tied to the Node.js APIs which already integrate with the event loop correctly.

Primarily I want this to work for the proxying case. Unlike other Emscripten APIs, this one works via Atomic.waitAsync, and, unlike "regular" async APIs, atomics don't keep the event loop alive which causes the most common issues.

RReverser avatar Sep 08 '25 21:09 RReverser

Primarily I want this to work for the proxying case. Unlike other Emscripten APIs, this one works via Atomic.waitAsync, and, unlike "regular" async APIs, atomics don't keep the event loop alive which causes the most common issues.

Could this also be solved by exposing emscripten_thread_set_strongref and calling that on the thread that you are proxying to? Or do you think the proxying API should effectively do that automatically?

sbc100 avatar Sep 08 '25 21:09 sbc100

Or do you think the proxying API should effectively do that automatically?

Yeah, I believe it should. Otherwise we'd also need to also add unset_strongref, users would need to replicate their own refcounting logic, and it's still too easy to misuse in a way where either the process is left hanging, or dies prematurely.

RReverser avatar Sep 08 '25 21:09 RReverser