
Worker crash on websocket_publish_callback

Open rponczkowski opened this issue 7 years ago • 3 comments

Using versions: nginx-1.12.2-2.el7.x86_64 nchan-1.2.1

Under heavy load we occasionally see a worker crash. Only one WebSocket subscriber is connected, and many small notifications are being sent. A core dump is available.

First worker crash: the backtrace has many (over 800) recursive calls to the websocket_reading function:

#0 0x00007f04ddfa321b in websocket_reading (r=r@entry=0x7f04e3aae0c0) at nchan-1.2.1/src/subscribers/websocket.c:1312
#1 0x00007f04ddfa3341 in websocket_reading (r=r@entry=0x7f04e3aae0c0) at nchan-1.2.1/src/subscribers/websocket.c:1528
......

So it looks like a stack overflow.

After that, the next workers crash at:

#0 websocket_publish_callback (status=9001, ch=0x7fff45465a00, d=0x7f8442383368) at nchan-1.2.1/src/subscribers/websocket.c:449
449 full_subscriber_t *fsub = d->fsub;
Missing separate debuginfos, use: debuginfo-install nginx-1.12.2-2.el7.x86_64
(gdb) bt
#0 websocket_publish_callback (status=9001, ch=0x7fff45465a00, d=0x7f8442383368) at nchan-1.2.1/src/subscribers/websocket.c:449
#1 0x00007f843c5fb33d in receive_publish_message_reply (sender=, d=0x7fff45465a80) at nchan-1.2.1/src/store/memory/ipc-handlers.c:496
#2 0x00007f843c5fa7a7 in ipc_read_handler (ev=0x7f84421c1b90) at nchan-1.2.1/src/store/memory/ipc.c:457
#3 0x00007f84400e2f01 in ngx_epoll_process_events (cycle=0x7f844215bd30, timer=, flags=) at src/event/modules/ngx_epoll_module.c:902
#4 0x00007f84400d82fa in ngx_process_events_and_timers (cycle=cycle@entry=0x7f844215bd30) at src/event/ngx_event.c:242
#5 0x00007f84400e0535 in ngx_worker_process_cycle (cycle=cycle@entry=0x7f844215bd30, data=data@entry=0x1) at src/os/unix/ngx_process_cycle.c:749
#6 0x00007f84400dee9a in ngx_spawn_process (cycle=cycle@entry=0x7f844215bd30, proc=0x7f84400e04e0 <ngx_worker_process_cycle>, data=0x1, name=0x7f8440162897 "worker process", respawn=respawn@entry=1) at src/os/unix/ngx_process.c:198
#7 0x00007f84400e1c67 in ngx_reap_children (cycle=0x7f844215bd30) at src/os/unix/ngx_process_cycle.c:621
#8 ngx_master_process_cycle (cycle=cycle@entry=0x7f844215bd30) at src/os/unix/ngx_process_cycle.c:174
#9 0x00007f84400b6239 in main (argc=, argv=) at src/core/nginx.c:375

rponczkowski avatar Sep 24 '18 13:09 rponczkowski

I'd like to look at the first coredump, the second is not important. I'll need the coredump and the Nginx binary, as well as your OS version. Send me a link to [email protected] . If it's any easier for you, I can just take a look on the server where this crash occurred.

slact avatar Oct 01 '18 17:10 slact

I've sent you an email. The backtrace looks slightly different, but I believe the root cause is the same. About the operating system:

Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 3.10.0-327.36.3.el7.x86_64

rponczkowski avatar Oct 05 '18 08:10 rponczkowski

Got a fix for you to try in fix-488. A stack overflow was possible there for a backed-up websocket subscriber. Please build from that branch and let me know your results.

slact avatar Oct 06 '18 18:10 slact
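
For readers following the thread, here is a minimal sketch of the failure pattern discussed above: a reader that calls itself once per queued frame, next to a loop-based variant with constant stack depth. This is not nchan's code and not the content of the fix-488 branch. `fake_conn_t`, `read_recursive`, and `read_iterative` are hypothetical names made up for illustration, and the "frames" are just a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a backed-up connection: a count of complete
   frames already buffered and waiting to be parsed. Not an nchan type. */
typedef struct {
    size_t pending_frames; /* frames still waiting in the buffer */
    size_t processed;      /* frames handled so far */
    size_t max_depth;      /* deepest call nesting observed */
} fake_conn_t;

/* Recursive style: handle one frame, then call the reader again for the
   next one. Call depth grows linearly with the number of queued frames,
   so a large enough backlog can exhaust the stack. */
static void read_recursive(fake_conn_t *c, size_t depth) {
    assert(c != NULL);
    if (depth > c->max_depth) {
        c->max_depth = depth;
    }
    if (c->pending_frames == 0) {
        return;
    }
    c->pending_frames--;
    c->processed++;
    read_recursive(c, depth + 1);
}

/* Iterative style: drain the backlog in a loop. Call depth stays at one
   frame of stack no matter how many frames are queued. */
static void read_iterative(fake_conn_t *c) {
    assert(c != NULL);
    if (c->max_depth < 1) {
        c->max_depth = 1;
    }
    while (c->pending_frames > 0) {
        c->pending_frames--;
        c->processed++;
    }
}
```

Calling `read_recursive(&conn, 1)` on a connection with 800 pending frames nests roughly as deep as the backtrace in the report, while `read_iterative(&conn)` handles the same backlog from a single call. Real handlers carry much larger stack frames than this toy, which is why a few hundred levels can already be fatal.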