node-addon-api
node-addon-api copied to clipboard
(Typed)ThreadSafeFunction.Release() crashes renderer process on reload (& abort?)
We are using a native addon with Electron (Both v14 and v16 show the same behaviour)
There is a class, which roughly looks like this:
class Foo : public Napi::ObjectWrap<Foo> {
public:
Foo(
const Napi::CallbackInfo& info
):
Napi::ObjectWrap<Foo>(info)
{
tsfn1_ = ...
tsfn2_ = ...
}
~Foo() {
tsfn1_.Release();
tsfn2_.Release();
...
}
private:
TSFNT1 tsfn1_;
TSFNT2 tsfn2_;
...
}
Everything works fine - Except when Electron reloads.
There are some situation when we need to reload the Electron application via getCurrentWebContents().reloadIgnoringCache()
- Which fails spectacularly with a renderer crash.
This is an issue both for our users AND when in development when using hot reload.
I have tracked down the issue to the .Release()
calls
If I remove the .Release()
calls, everything seems to work fine, including reload.
I have tried to debug the issue further with a debug build of Electron, but at least v14 (have not tested v16) triggers another assertion when reloading which can't be fixed (I stripped the addon to not contain any exports and just the register call and it still caused a renderer crash on reload with Debug Electron)
My theory: Node destroys the TSFNs on Node's side before it frees / destroys the objects of the addon. I can't say whether this is just a wild theory or a real explanation, but I think a couple of Node devs here could know?
Our fix for now: Don't free the TSNFs in the destructor. The Foo
object is long-living and usually only created once and destroyed once in the lifecycle of the addon.
FWIW, that's the stacktrace:
thread #1, name = 'electron', stop reason = signal SIGABRT
* frame #0: 0x00007f70aad7218b libc.so.6`__GI_raise(sig=<unavailable>) at raise.c:51:1
frame #1: 0x00007f70aad51859 libc.so.6`__GI_abort at abort.c:79:7
frame #2: 0x00005565c007d794 electron`uv_mutex_lock + 20
frame #3: 0x00005565c63f7266 electron`napi_release_threadsafe_function + 38
frame #4: 0x00007f709c498c9a addon.node`Napi::TypedThreadSafeFunction<std::nullptr_t, std::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unique_ptr<SecretType, std::default_delete<SecretType> > >, &(Foo::SecretJSGlueFunction(Napi::Env, Napi::Function, std::nullptr_t*, std::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unique_ptr<SecretType, std::default_delete<SecretType> > >*))>::Release(this=0x000013900794e100) at napi-inl.h:5163:10
frame #5: 0x00007f709c4905d8 addon.node`Foo::~Foo(this=0x000013900794e000) at Foo.cc:166:25
frame #6: 0x00007f709c490839 addon.node`Foo::~Foo(this=0x000013900794e000) at Foo.cc:164:48
frame #7: 0x00007f709c4a2632 addon.node`Napi::ObjectWrap<Foo>::FinalizeCallback(env=0x000013900078ae20, data=0x000013900794e000, (null)=0x0000000000000000) at napi-inl.h:4387:3
frame #8: 0x00005565c63f765f electron
frame #9: 0x00005565c63f8adf electron
frame #10: 0x00005565c63f74a6 electron
frame #11: 0x00005565c63f7ea5 electron
frame #12: 0x00005565c63f7f0e electron
frame #13: 0x00005565c63f8218 electron
frame #14: 0x00005565c0072729 electron`uv_run + 521
frame #15: 0x00005565c63d00e1 electron
frame #16: 0x00005565c63d0638 electron
frame #17: 0x00005565c6398264 electron`node::FreeEnvironment(node::Environment*) + 180
frame #18: 0x00005565c01d6698 electron
frame #19: 0x00005565c56662d5 electron
frame #20: 0x00005565c43085a5 electron
frame #21: 0x00005565c43072a6 electron
frame #22: 0x00005565c4684c0f electron
frame #23: 0x00005565c4ba1468 electron
frame #24: 0x00005565c4ba24f4 electron
frame #25: 0x00005565c4bb69e2 electron
frame #26: 0x00005565c4bb97bd electron
frame #27: 0x00005565c470b503 electron
frame #28: 0x00005565c5657761 electron
frame #29: 0x00005565c5671db3 electron
frame #30: 0x00005565c5671bfb electron
frame #31: 0x00005565c56561cc electron
frame #32: 0x00005565c5ae4805 electron
frame #33: 0x00005565c0f493c9 electron
frame #34: 0x00005565c5ae4d2b electron
frame #35: 0x00005565c2fa29e8 electron
frame #36: 0x00005565c2fa771d electron
frame #37: 0x00005565c2fa3f4a electron
frame #38: 0x00005565c31c0b29 electron
frame #39: 0x00005565c2fa505f electron
frame #40: 0x00005565c2c6e2f5 electron
frame #41: 0x00005565c2c84316 electron
frame #42: 0x00005565c2c854af electron
frame #43: 0x00005565c2c3901e electron
frame #44: 0x00005565c2c85914 electron
frame #45: 0x00005565c2c575f2 electron
frame #46: 0x00005565c61e682b electron
frame #47: 0x00005565c104fe53 electron
frame #48: 0x00005565c104d77f electron
frame #49: 0x00005565c104e169 electron
frame #50: 0x00005565c008335c electron
frame #51: 0x00007f70aad530b3 libc.so.6`__libc_start_main(main=0x00005565c0083150, argc=15, argv=0x00007fff22fedd28, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fff22fedd18) at libc-start.c:308:16
frame #52: 0x00005565bfd18daa electron`_start + 42
Electron versions: v14 & v16 Compiler: Clang 10+ OS: definitely Linux, but seems to happen on MacOS and Windows Arch: AMD64, but seems to happen on ARM64 too
Any ideas?
@robinchrist if you remove the Release()
, and since it now works, do you get memory leaks as you reload electron many times?
@gabrielschulhof I am debugging what seems to be the same issue and I did asan
it and here is what happens:
- There is some class which contains a TSF
- At some point its destructor calls
TSF::Release()
before deleting the object that contains the TSF - This triggers a call to
details::ThreadSafeFinalize::FinalizeWrapperWithContext
at some later moment - This function fails badly because the TSF object has already been deleted
If I remove the call to Release()
- yes, then it works, but my Node process is left hanging at the end since there are open file handles
Currently the only way to make this work is to always have a finalizer and defer the deleting of the containing object. It is quite cumbersome, isn't it possible to skip FinalizeWrapperWithContext
when there is no finalizer?
This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.
@gabrielschulhof What would be the best way to test this?
I have the same problem on mac, but I can't trigger the bug stably.
- electron: 15.1.0
- node-addon-api: 4.2.0
- macos: 12.3 x64
I wrote something like this. See the full version here.
MDQuery::~MDQuery() {
if (_queryRef) {
auto localCenter = CFNotificationCenterGetLocalCenter();
CFNotificationCenterRemoveObserver(localCenter, this, kMDQueryDidFinishNotification, _queryRef);
CFNotificationCenterRemoveObserver(localCenter, this, kMDQueryDidUpdateNotification, _queryRef);
}
this->StopQuery();
if (_queryRef) {
CFRelease(_queryRef);
_queryRef = NULL;
}
if (_queryResultCallback) {
_queryResultCallback.Release();
_queryResultCallback = NULL;
}
if (_updateCallback) {
_updateCallback.Release();
_updateCallback = NULL;
}
}
Here is my crash log.
Crashed Thread: 0 CrBrowserMain Dispatch queue: com.apple.main-thread
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY
Application Specific Information:
dyld: in dlopen_preflight()
abort() called
Thread 0 Crashed:: CrBrowserMain Dispatch queue: com.apple.main-thread
0 libsystem_kernel.dylib 0x00007fff70ec449a __pthread_kill + 10
1 libsystem_pthread.dylib 0x00007fff70f816cb pthread_kill + 384
2 libsystem_c.dylib 0x00007fff70e4ca1c abort + 120
3 com.github.Electron.framework 0x0000000103dd68c4 uv_mutex_lock + 20
4 com.github.Electron.framework 0x0000000107439696 napi_release_threadsafe_function + 38
5 node.napi.node 0x000000011037d67e MDQuery::~MDQuery() + 174
6 node.napi.node 0x000000011037d6de MDQuery::~MDQuery() + 14
7 node.napi.node 0x000000011037f287 Napi::ObjectWrap<MDQuery>::FinalizeCallback(napi_env__*, void*, void*) + 55
8 com.github.Electron.framework 0x0000000107439a9f node_api_get_module_file_name + 767
9 com.github.Electron.framework 0x00000001074218eb node::EmitAsyncDestroy(node::Environment*, node::async_context) + 382219
10 com.github.Electron.framework 0x00000001074398d6 node_api_get_module_file_name + 310
11 com.github.Electron.framework 0x000000010743a465 node_api_get_module_file_name + 3269
12 com.github.Electron.framework 0x000000010743a4ce node_api_get_module_file_name + 3374
13 com.github.Electron.framework 0x000000010743a7db node_api_get_module_file_name + 4155
14 com.github.Electron.framework 0x0000000103dca977 uv_run + 535
15 com.github.Electron.framework 0x000000010740f2a4 node::EmitAsyncDestroy(node::Environment*, node::async_context) + 306884
16 com.github.Electron.framework 0x000000010740f745 node::EmitAsyncDestroy(node::Environment*, node::async_context) + 308069
17 com.github.Electron.framework 0x00000001073c1244 node::FreeEnvironment(node::Environment*) + 164
18 com.github.Electron.framework 0x0000000103ebca95 ElectronInitializeICUandStartNode + 918277
19 com.github.Electron.framework 0x0000000103eab3d9 ElectronInitializeICUandStartNode + 846921
20 com.github.Electron.framework 0x0000000105231e12 v8::internal::SetupIsolateDelegate::SetupHeap(v8::internal::Heap*) + 3294482
21 com.github.Electron.framework 0x000000010523341a v8::internal::SetupIsolateDelegate::SetupHeap(v8::internal::Heap*) + 3300122
22 com.github.Electron.framework 0x000000010522f298 v8::internal::SetupIsolateDelegate::SetupHeap(v8::internal::Heap*) + 3283352
23 com.github.Electron.framework 0x0000000104600ab3 electron::fuses::IsOnlyLoadAppFromAsarEnabled() + 6674531
24 com.github.Electron.framework 0x000000010460054b electron::fuses::IsOnlyLoadAppFromAsarEnabled() + 6673147
25 com.github.Electron.framework 0x00000001045fef7b electron::fuses::IsOnlyLoadAppFromAsarEnabled() + 6667563
26 com.github.Electron.framework 0x00000001045ff819 electron::fuses::IsOnlyLoadAppFromAsarEnabled() + 6669769
27 com.github.Electron.framework 0x0000000103ddc756 ElectronMain + 134
28 org.tencent.xiaowei-desktop 0x0000000103d655e6 0x103d64000 + 5606
29 libdyld.dylib 0x00007fff70d752e5 start + 1
This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.
Hi @robinchrist / @mmomtchev / @BB-fat ,
We discussed this in the 1 July Node API meeting. Do you have a repository that reproduces this issue? Since this is using Electron, it is a little cumbersome to try to create a repro on our own.
Thanks, Kevin
@KevinEady I don't have a repro, but this is not limited to Electron and it is not a bug but a design oversight.
From memory (this is the reason I don't use the C++ ThreadSafeFunction
in exprtk.js
- but a napi_threadsafe_function
):
an object containing a ThreadSafeFunction
cannot be destroyed. Calling Release
will trigger an async callback on the ThreadSafeFunction
object. You can't even use a dynamic object and delete it in the finalizer, since there is a wrapper that runs after your finalizer.
Hi @mmomtchev ,
That logic does follow. However from what I'm seeing in the code for the finalize wrappers, these just call the finalizer with the data and context as decided in the TSFN::New
call. The node-addon-api finalizer wrappers are static that take as its data this details::ThreadSafeFinalize
type, which encapsulates the user-provided finalizer and data passed that are passed into New()
. They do not access the ThreadSafeFunction
s members themselves.
Are you using a finalizer callback or data that is held by the C++ object you are Release()
ing from the destructor? If so, this makes sense to me that it will crash, as this memory is no longer valid when the finalizer wrapper gets called asynchronously after release. The TSFN finalizer callback and data must be alive throughout the existence of the TSFN.
Unless I am misunderstanding something...?
Yes indeed they are static, it might have been the data then, I will try to revert to ThreadSafeFunction
to check what was the problem with using a dynamic object - maybe I missed something. But you can't call Release
in the destructor for sure - as it triggers an async callback - as the code in this issue does and this is not really a bug, it is the way it works.
So what is the right way to release a TypedThreadSafeFunction
that is held by an object that is destroyed then?
Hi @robinchrist ,
In my threadsafe-function AsyncIterator
example, I call Release()
at the end of the std::thread
's entry point.
Yes, but that's not really a solution. The TSFN is held for a longer time and the objects which contain the TSFN are created and destroyed on the fly.
There must be a way to properly release a TSFN that is the member of an object inside the object's destructor?
This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.
We discussed this in the 7 Oct Node API meeting.
My theory: Node destroys the TSFNs on Node's side before it frees / destroys the objects of the addon.
Both of the stacktraces posted above have a call to node::FreeEnvironment
. We will need to look into node core to verify the above assumption. We will need to look at the process of how/when entities are cleaned up during environment destruction.
@robinchrist is there any chance you could run under valgrind? That might give us more info on specifically what is leading to the crash. There are some instructions in https://github.com/nodejs/node/blob/main/doc/contributing/investigating-native-memory-leaks.md
Chiming in here to say I have this issue!
I've submitted an explicit document on the order of invocation of napi_finalize
callbacks and the cleanup hooks at https://github.com/nodejs/node/pull/45903. I'll compose an example of how to release nested resources properly at https://github.com/nodejs/node-addon-examples.
We discussed this issue in the 24 Feb Node API team meeting.
There is a backing PR (https://github.com/nodejs/node/pull/46692) in core to verify the order as described in https://github.com/nodejs/node/pull/45903. Once the test PR has been merged, we can merge the documentation PR.
I think we need to thoroughly examine all the places where we NAPI_FATAL_IF_FAILED and decide if we need to add a NODE_API_SWALLOW_UNTHROWABLE_EXCEPTIONS section. For example, at https://github.com/nodejs/node-addon-api/blob/5adb896782ce3658422990ddb857e695070d1fc4/napi-inl.h#L2940, we raise a fatal error if napi_define_properties()
fails.
It seems feasible to call TSFN.Release()
within ObjectWrap::Finalize
.
// jsCallback is an Napi::ThreadSafeFunction
void MyObject::Finalize(const Napi::Env env) {
if (jsCallback) {
jsCallback.Release();
jsCallback = nil;
}
}
This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.