Worker crash on websocket_publish_callback
Versions in use: nginx-1.12.2-2.el7.x86_64, nchan-1.2.1
Under heavy load we occasionally get a worker crash. Only one WebSocket subscriber is connected, and many small notifications are being sent. A core dump is available.
First worker crash: the backtrace has many (more than 800) recursive calls to the websocket_reading function:
#0 0x00007f04ddfa321b in websocket_reading (r=r@entry=0x7f04e3aae0c0) at nchan-1.2.1/src/subscribers/websocket.c:1312
#1 0x00007f04ddfa3341 in websocket_reading (r=r@entry=0x7f04e3aae0c0) at nchan-1.2.1/src/subscribers/websocket.c:1528
......
So it looks like a stack overflow.
After that, the next workers crash on:
#0 websocket_publish_callback (status=9001, ch=0x7fff45465a00, d=0x7f8442383368) at nchan-1.2.1/src/subscribers/websocket.c:449
449 full_subscriber_t *fsub = d->fsub;
Missing separate debuginfos, use: debuginfo-install nginx-1.12.2-2.el7.x86_64
(gdb) bt
#0 websocket_publish_callback (status=9001, ch=0x7fff45465a00, d=0x7f8442383368) at nchan-1.2.1/src/subscribers/websocket.c:449
#1 0x00007f843c5fb33d in receive_publish_message_reply (sender=
I'd like to look at the first coredump; the second is not important. I'll need the coredump and the Nginx binary, as well as your OS version. Send me a link to [email protected]. If it's any easier for you, I can just take a look on the server where this crash occurred.
I've sent you an email. The backtrace looks slightly different, but I believe the root cause is the same. About the operating system:
Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 3.10.0-327.36.3.el7.x86_64
Got a fix for you to try in fix-488. A stack overflow was possible there for a backed-up websocket subscriber. Please build from that branch and let me know your results.
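To illustrate the general failure mode (this is only a sketch, not nchan's code and not the actual change in fix-488): if a read handler calls itself once per already-buffered frame, the call depth grows with the backlog, whereas a loop keeps it constant. The type demo_conn_t and both functions below are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a connection whose frames are already buffered. */
typedef struct {
    size_t pending_frames;  /* frames waiting in the read buffer */
    size_t handled_frames;  /* frames processed so far */
    size_t max_depth;       /* deepest call nesting observed */
} demo_conn_t;

/* Recursive shape: every buffered frame adds one more nested call,
 * so a large enough backlog can exhaust the stack. */
void read_recursive(demo_conn_t *c, size_t depth) {
    if (depth > c->max_depth) {
        c->max_depth = depth;
    }
    if (c->pending_frames == 0) {
        return;  /* nothing left to read */
    }
    c->pending_frames--;
    c->handled_frames++;
    read_recursive(c, depth + 1);  /* re-enter the handler for the next frame */
}

/* Iterative shape: the same work in a loop, with constant call depth
 * regardless of how many frames are waiting. */
void read_iterative(demo_conn_t *c) {
    if (c->max_depth < 1) {
        c->max_depth = 1;  /* only this single call is on the stack */
    }
    while (c->pending_frames > 0) {
        c->pending_frames--;
        c->handled_frames++;
    }
}
```

With a backlog of 800 frames, read_recursive(&conn, 1) nests roughly as deep as the backtrace above, while read_iterative(&conn) handles the same frames from a single call.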