confluent-kafka-javascript icon indicating copy to clipboard operation
confluent-kafka-javascript copied to clipboard

Segfault in e2e/groups.spec.js

Open trevorr opened this issue 1 year ago • 1 comments

Environment Information

  • OS: MacOS 15.1
  • Node Version: 20.18
  • NPM Version: 10.18.2
  • C++ Toolchain: llvm
  • confluent-kafka-javascript version: 377e22ab7e86ce6b0bffc06d3a6deb573d51288b

Steps to Reproduce

$ npx mocha --timeout 120000 e2e/groups.spec.js

  Consumer group/Producer
[1]    13601 segmentation fault  npx mocha --timeout 120000 e2e/

Here's the stack trace:

llnode -- node ./node_modules/.bin/mocha  --timeout 120000 e2e/groups.spec.js
(lldb) target create "node"
Current executable set to '/opt/node-debug/bin/node' (arm64).
(lldb) settings set -- target.run-args  "./node_modules/.bin/mocha" "--timeout" "120000" "e2e/groups.spec.js"
(lldb) plugin load '/Users/trevor/.nvm/versions/node/v20.18.0/lib/node_modules/llnode/llnode.dylib'
(lldb) settings set prompt '(llnode) '
(llnode) r
Process 11240 launched: '/opt/node-debug/bin/node' (arm64)


  Consumer group/Producer
Process 11240 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000128807018 confluent-kafka-javascript.node`Nan::Callback::Call_(v8::Isolate*, v8::Local<v8::Object>, int, v8::Local<v8::Value>*, Nan::AsyncResource*) const [inlined] v8::Local<v8::Function>::New(isolate=0x0000000140008000, that=0x0000000000000000) at v8-local-handle.h:287:68 [opt]
   284 	
   285 	  V8_INLINE static Local<T> New(Isolate* isolate,
   286 	                                const PersistentBase<T>& that) {
-> 287 	    return New(isolate, internal::ValueHelper::SlotAsValue<T>(that.val_));
   288 	  }
   289 	
   290 	  V8_INLINE static Local<T> New(Isolate* isolate,
warning: confluent-kafka-javascript.node was compiled with optimization - stepping may behave oddly; variables may not be available.
(llnode) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0000000128807018 confluent-kafka-javascript.node`Nan::Callback::Call_(v8::Isolate*, v8::Local<v8::Object>, int, v8::Local<v8::Value>*, Nan::AsyncResource*) const [inlined] v8::Local<v8::Function>::New(isolate=0x0000000140008000, that=0x0000000000000000) at v8-local-handle.h:287:68 [opt]
    frame #1: 0x0000000128807018 confluent-kafka-javascript.node`Nan::Callback::Call_(v8::Isolate*, v8::Local<v8::Object>, int, v8::Local<v8::Value>*, Nan::AsyncResource*) const [inlined] v8::Local<v8::Function> Nan::New<v8::Function, v8::NonCopyablePersistentTraits<v8::Function>>(p=0x0000000000000000) at nan_implementation_12_inl.h:422:10 [opt]
    frame #2: 0x0000000128807014 confluent-kafka-javascript.node`Nan::Callback::Call_(this=0x0000000000000000, isolate=<unavailable>, target=(val_ = 0x000000013800aee8), argc=4, argv=0x000000016fdf6968, resource=0x000000016fdf68a0) const at nan.h:1880:36 [opt]
    frame #3: 0x0000000128802fac confluent-kafka-javascript.node`Nan::Callback::Call(this=0x0000000000000000, argc=4, argv=0x000000016fdf6968) const at nan.h:1824:25 [opt]
    frame #4: 0x0000000128825a54 confluent-kafka-javascript.node`NodeKafka::Workers::KafkaConsumerConsumeLoop::HandleMessageCallback(this=0x000000014f708810, msg=0x00006000023a8d80, ec=<unavailable>) at workers.cc:775:13 [opt]
    frame #5: 0x0000000128829e90 confluent-kafka-javascript.node`NodeKafka::Workers::MessageWorker::WorkMessage(this=0x000000014f708810) at workers.h:110:7 [opt]
    frame #6: 0x0000000100b88d08 node`uv__async_io(loop=0x00000001041d4c90, w=<unavailable>, events=<unavailable>) at async.c:176:5 [opt]
    frame #7: 0x0000000100b9b9b0 node`uv__io_poll(loop=0x00000001041d4c90, timeout=0) at kqueue.c:381:9 [opt]
    frame #8: 0x0000000100b89270 node`uv_run(loop=0x00000001041d4c90, mode=UV_RUN_DEFAULT) at core.c:447:5 [opt]
    frame #9: 0x0000000100005818 node`node::SpinEventLoopInternal(env=0x0000000138018c00) at embed_helpers.cc:41:7 [opt]
    frame #10: 0x00000001001347ac node`node::NodeMainInstance::Run(this=<unavailable>, exit_code=0x000000016fdff184, env=0x0000000138018c00) at node_main_instance.cc:124:9 [opt]
    frame #11: 0x00000001001344b8 node`node::NodeMainInstance::Run(this=<unavailable>) at node_main_instance.cc:100:3 [opt]
    frame #12: 0x00000001000b02c4 node`node::Start(int, char**) [inlined] node::StartInternal(argc=<unavailable>, argv=<unavailable>) at node.cc:1502:24 [opt]
    frame #13: 0x00000001000b01d4 node`node::Start(argc=<unavailable>, argv=<unavailable>) at node.cc:1509:27 [opt]
    frame #14: 0x00000001863b4274 dyld`start + 2840

(This even occurs with a fix for #161.)

trevorr avatar Nov 16 '24 00:11 trevorr

Thank you for the report. I can't reproduce it, but looking at valgrind and the stacktrace, I think it's the same one as #34 . As far as I understand, it's because we're sending the message from the consume loop thread, to the event loop thread, but by the time it makes to the event loop thread, the callback it's supposed to call is destroyed already.

It seems to be specific to the consume() loop in the non-promisified API.

I'll keep both around until I'm certain that they are the same.

milindl avatar Nov 18 '24 14:11 milindl