node icon indicating copy to clipboard operation
node copied to clipboard

node-api: Add API to check safety of calling JS

Open srriddle opened this issue 1 year ago • 4 comments

This PR seeks to resolve an race condition issue while shutting down node with neon bindings. Due to the race, it is possible to call into JavaScript during shutdown while it is not allowed. The added API cal exposes the can_call_into_js so that we are able to check and catch the condition where we are no longer allowed to call into JavaScript.

https://github.com/neon-bindings/neon/pull/913

srriddle avatar Jul 12 '22 18:07 srriddle

Review requested:

  • [ ] @nodejs/node-api

nodejs-github-bot avatar Jul 12 '22 18:07 nodejs-github-bot

It is not currently possible with APIs to distinguish between a pending exception and not being able to call into js for some other reason. This can be worked around by abusing NAPI_PREAMBLE in some existing Node-API method, but ideally there would be a method that can be more cleanly called.

kjvalencik avatar Jul 12 '22 18:07 kjvalencik

We discussed in the Node-API team meeting today. Could you expand on the scenario on when you are runing into this problem so that we can better understand? For example what do you do in the case where it will return false?

mhdawson avatar Aug 05 '22 15:08 mhdawson

@mhdawson The most common use case is to stop calling JavaScript instead of attempting to handle a JavaScript exception.

Consider the first call in a threadsafe function. It fails with napi_exception. Did it throw an exception or is it not possible to call into JS at all? If it's not possible to call at all, most of the time, I just want to exit early.

Neon currently uses a side effect of napi_throw with a bad argument to detect this state. Since it's internal to Neon, it knows it's not possible to be in a throwing state yet.

https://github.com/neon-bindings/neon/blob/c6f75a774a5320412d373f46e381287c3a8e701e/crates/neon/src/sys/no_panic.rs#L124-L135

kjvalencik avatar Aug 05 '22 16:08 kjvalencik

For finalizers called at environment shutdown, I don't think it is encouraged to call into JavaScript in the finalizers, and we are working to eagerly invoke the finalizers at a time that the JavaScript execution is disallowed: https://github.com/nodejs/node/pull/42651#issuecomment-1209616645.

For napi_threadsafe_function, I think we should stop draining the queue of the threadsafe function at the termination of an environment, and wait for its finalizer to be invoked and user's data to be cleaned up, similarly, with JavaScript execution disallowed in the finalizers.

However, even with the work of #42651, we can not get rid of what is already carved in the stone as node-api is declared to be abi-stable. I believe it is worth landing this new API as experimental and re-initiate the effort on the issue https://github.com/nodejs/node/issues/37446. As a requirement for landing a new node-api, I'll bring this topic up in this week's node-api working group meeting.

Thank you for working on this!

legendecas avatar Aug 29 '22 08:08 legendecas

There's another problem. @addaleax correct me if I'm wrong, but AFAICT can_call_into_js() can become false at any time, because, on a worker, turning off the ability to call into JS is a matter of the main thread setting an atomic boolean in the worker's node::Environment. https://github.com/gabrielschulhof/tsfn-test/blob/main/README.md shows an example of can_call_into_js() becoming false while the worker thread is already inside napi_call_function(). So, having a separate check doesn't really prevent such race conditions. I think the lesson is that any Node-API can fail with napi_pending_exception at any time.

One way I found to distinguish an actual exception from an env teardown is that in the case of a teardown, the call returns napi_pending_exception but a call to napi_is_exception_pending() returns napi_ok and false. After https://github.com/nodejs/node/pull/45715 lands we can add the napi_cannot_call_into_js status to make it clearer why the call has failed.

gabrielschulhof avatar Apr 08 '23 16:04 gabrielschulhof

@addaleax would it make sense to protect the boolean flag with a mutex instead of making it atomic? This would prevent the main thread from setting it while there's a node api call going on. So, like,

  1. CallIntoModule try-locks the mutex. If it succeeds, it calls into the module and releases the lock, otherwise it returns.
  2. On the main thread, the native binding for terminate() locks the mutex, sets the boolean, releases the mutex, and returns.

This would not address calls into module that do not execute JS. Providing APIs that allow one to attach these "pure" callbacks is a separate discussion though.

WDYT?

gabrielschulhof avatar Apr 08 '23 17:04 gabrielschulhof