confluent-kafka-javascript
confluent-kafka-javascript copied to clipboard
Segfault in e2e/groups.spec.js
Environment Information
- OS: MacOS 15.1
- Node Version: 20.18
- NPM Version: 10.18.2
- C++ Toolchain: llvm
- confluent-kafka-javascript version: 377e22ab7e86ce6b0bffc06d3a6deb573d51288b
Steps to Reproduce
$ npx mocha --timeout 120000 e2e/groups.spec.js
Consumer group/Producer
[1] 13601 segmentation fault npx mocha --timeout 120000 e2e/
Here's the stack trace:
llnode -- node ./node_modules/.bin/mocha --timeout 120000 e2e/groups.spec.js
(lldb) target create "node"
Current executable set to '/opt/node-debug/bin/node' (arm64).
(lldb) settings set -- target.run-args "./node_modules/.bin/mocha" "--timeout" "120000" "e2e/groups.spec.js"
(lldb) plugin load '/Users/trevor/.nvm/versions/node/v20.18.0/lib/node_modules/llnode/llnode.dylib'
(lldb) settings set prompt '(llnode) '
(llnode) r
Process 11240 launched: '/opt/node-debug/bin/node' (arm64)
Consumer group/Producer
Process 11240 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: 0x0000000128807018 confluent-kafka-javascript.node`Nan::Callback::Call_(v8::Isolate*, v8::Local<v8::Object>, int, v8::Local<v8::Value>*, Nan::AsyncResource*) const [inlined] v8::Local<v8::Function>::New(isolate=0x0000000140008000, that=0x0000000000000000) at v8-local-handle.h:287:68 [opt]
284
285 V8_INLINE static Local<T> New(Isolate* isolate,
286 const PersistentBase<T>& that) {
-> 287 return New(isolate, internal::ValueHelper::SlotAsValue<T>(that.val_));
288 }
289
290 V8_INLINE static Local<T> New(Isolate* isolate,
warning: confluent-kafka-javascript.node was compiled with optimization - stepping may behave oddly; variables may not be available.
(llnode) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
* frame #0: 0x0000000128807018 confluent-kafka-javascript.node`Nan::Callback::Call_(v8::Isolate*, v8::Local<v8::Object>, int, v8::Local<v8::Value>*, Nan::AsyncResource*) const [inlined] v8::Local<v8::Function>::New(isolate=0x0000000140008000, that=0x0000000000000000) at v8-local-handle.h:287:68 [opt]
frame #1: 0x0000000128807018 confluent-kafka-javascript.node`Nan::Callback::Call_(v8::Isolate*, v8::Local<v8::Object>, int, v8::Local<v8::Value>*, Nan::AsyncResource*) const [inlined] v8::Local<v8::Function> Nan::New<v8::Function, v8::NonCopyablePersistentTraits<v8::Function>>(p=0x0000000000000000) at nan_implementation_12_inl.h:422:10 [opt]
frame #2: 0x0000000128807014 confluent-kafka-javascript.node`Nan::Callback::Call_(this=0x0000000000000000, isolate=<unavailable>, target=(val_ = 0x000000013800aee8), argc=4, argv=0x000000016fdf6968, resource=0x000000016fdf68a0) const at nan.h:1880:36 [opt]
frame #3: 0x0000000128802fac confluent-kafka-javascript.node`Nan::Callback::Call(this=0x0000000000000000, argc=4, argv=0x000000016fdf6968) const at nan.h:1824:25 [opt]
frame #4: 0x0000000128825a54 confluent-kafka-javascript.node`NodeKafka::Workers::KafkaConsumerConsumeLoop::HandleMessageCallback(this=0x000000014f708810, msg=0x00006000023a8d80, ec=<unavailable>) at workers.cc:775:13 [opt]
frame #5: 0x0000000128829e90 confluent-kafka-javascript.node`NodeKafka::Workers::MessageWorker::WorkMessage(this=0x000000014f708810) at workers.h:110:7 [opt]
frame #6: 0x0000000100b88d08 node`uv__async_io(loop=0x00000001041d4c90, w=<unavailable>, events=<unavailable>) at async.c:176:5 [opt]
frame #7: 0x0000000100b9b9b0 node`uv__io_poll(loop=0x00000001041d4c90, timeout=0) at kqueue.c:381:9 [opt]
frame #8: 0x0000000100b89270 node`uv_run(loop=0x00000001041d4c90, mode=UV_RUN_DEFAULT) at core.c:447:5 [opt]
frame #9: 0x0000000100005818 node`node::SpinEventLoopInternal(env=0x0000000138018c00) at embed_helpers.cc:41:7 [opt]
frame #10: 0x00000001001347ac node`node::NodeMainInstance::Run(this=<unavailable>, exit_code=0x000000016fdff184, env=0x0000000138018c00) at node_main_instance.cc:124:9 [opt]
frame #11: 0x00000001001344b8 node`node::NodeMainInstance::Run(this=<unavailable>) at node_main_instance.cc:100:3 [opt]
frame #12: 0x00000001000b02c4 node`node::Start(int, char**) [inlined] node::StartInternal(argc=<unavailable>, argv=<unavailable>) at node.cc:1502:24 [opt]
frame #13: 0x00000001000b01d4 node`node::Start(argc=<unavailable>, argv=<unavailable>) at node.cc:1509:27 [opt]
frame #14: 0x00000001863b4274 dyld`start + 2840
(This even occurs with a fix for #161.)
Thank you for the report. I can't reproduce it, but looking at valgrind and the stacktrace, I think it's the same one as #34 . As far as I understand, it's because we're sending the message from the consume loop thread, to the event loop thread, but by the time it makes to the event loop thread, the callback it's supposed to call is destroyed already.
It seems to be specific to the consume() loop in the non-promisified API.
I'll keep both around until I'm certain that they are the same.