Kurento crashed close to default exceptionLimit "0.8"

Open avssav opened this issue 5 years ago • 2 comments

Prerequisites

  • [x] I have read the SUPPORT document

  • [x] I have checked the Troubleshooting Guide

  • [x] I have tested with the latest version of Kurento

Issue description

Kurento crashed for the first time since 15 Feb 2020. CPU usage was close to the default exceptionLimit of "0.8": CPU idle time was 22.64% according to Zabbix monitoring. Could the two be connected, or is this a bug?

Context

We restart our kms-6.13.2-dev docker container every night.

How to reproduce?

This happened in a production environment. I don't know how to reproduce it.

INFO about Kurento Media Server

  • Kurento version: 6.13.2, nightly (6.13.2-20200507140307)
  • Server OS: Ubuntu 16.04.6 LTS
  • Installation method:
    • [] apt-get
    • [x] Docker (on c5.2xlarge AWS linux instance)
    • [] AWS
    • [] Built from sources

INFO about your Application Server

  • Language: Node.js
  • Kurento Client version: 6.13.2, nightly

INFO about your environment

I start KMS like this:

docker run --name kms-6.13.2-dev --restart always -p 8888:8888 -p 8433:8433 \
  --ulimit nofile="$(( ($(cat /proc/sys/fs/file-max) * 50) / 100 ))" \
  -e GST_DEBUG="3,Kurento*:4,kms*:4,sdp*:4,webrtc*:4,*rtpendpoint:4,rtp*handler:4,rtpsynchronizer:4,agnosticbin:4,KurentoMediaElementImpl:4" \
  -e KMS_STUN_IP="X.X.X.X" -e KMS_STUN_PORT="3478" -e KMS_MTU="1100" \
  -e KMS_TURN_URL="xxxxxxxx:[email protected]:3478?transport=udp" \
  -e KMS_EXTERNAL_ADDRESS="auto" -e KMS_NETWORK_INTERFACES="eth0" \
  -v $HOME/kurento-6.13.2-3:/etc/kurento -v $HOME/tmp/kurento:/tmp \
  --log-driver journald -d kurento/kurento-media-server-dev:6.13.2

There were 2 kurento processes at the moment of the crash:

15:50 UTC
29447 529% /usr/bin/kurento-media-server

16:00 UTC
29447 100% kurento-media-s
21894 252% /usr/bin/kurento-media-server

The system created 2 dump files. There are some differences between them: look at the lines starting with "-" and "+" at the beginning of the backtrace.

I copied the libkms* files from the Docker container to the host. The backtrace looks like this:

root@kms:~/tmp/kurento# gdb /var/lib/docker/overlay2/d634b3a76048ab6f59bd3d7c1c4095253de5a7b7ac06f73b642da8d81e223e23/diff/usr/bin/kurento-media-server -c core_kurento-media-s_1_0_1589557794
Reading symbols from /var/lib/docker/overlay2/d634b3a76048ab6f59bd3d7c1c4095253de5a7b7ac06f73b642da8d81e223e23/diff/usr/bin/kurento-media-server...(no debugging symbols found)...done.

warning: core file may not match specified executable file.
[New LWP 1334]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/kurento-media-server '.
-Program terminated with signal SIGSEGV, Segmentation fault.
-#0  __GI_abort () at abort.c:125
-125     abort.c: No such file or directory.
-[Current thread is 1 (Thread 0x7fc7e12b4700 (LWP 30))]
+Program terminated with signal SIGABRT, Aborted.
+#0  0x00007fc7eb88b428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
+54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt

-__GI_abort () at abort.c:125
+__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
+__GI_abort () at abort.c:89

Debug::DeathHandler::SignalHandler(int, void*, void*) ()
<signal handler called> () at /lib/x86_64-linux-gnu/libpthread.so.0
g_slice_alloc (magazine_chunks=<optimized out>) at /build/glib2.0-xkQkqE/glib2.0-2.48.2/./glib/gslice.c:539
g_slice_alloc (tmem=<optimized out>, ix=7) at /build/glib2.0-xkQkqE/glib2.0-2.48.2/./glib/gslice.c:842
g_slice_alloc (mem_size=mem_size@entry=128) at /build/glib2.0-xkQkqE/glib2.0-2.48.2/./glib/gslice.c:1016
g_slice_alloc0 (mem_size=mem_size@entry=128) at /build/glib2.0-xkQkqE/glib2.0-2.48.2/./glib/gslice.c:1051
gst_message_new_custom (type=type@entry=GST_MESSAGE_STATE_CHANGED, src=src@entry=0x7fc5a81b17e0 [GstFunnel], structure=0x7fc5bc12ba40) at gstmessage.c:288
gst_message_new_state_changed (src=src@entry=0x7fc5a81b17e0 [GstFunnel], oldstate=2820347872, oldstate@entry=GST_STATE_PLAYING, newstate=newstate@entry=GST_STATE_PAUSED, pending=32709, pending@entry=GST_STATE_VOID_PENDING) at gstmessage.c:566
_priv_gst_element_state_changed (element=element@entry=0x7fc5a81b17e0 [GstFunnel], oldstate=oldstate@entry=GST_STATE_PLAYING, newstate=newstate@entry=GST_STATE_PAUSED, pending=pending@entry=GST_STATE_VOID_PENDING) at gstelement.c:2282
gst_element_continue_state (element=element@entry=0x7fc5a81b17e0 [GstFunnel], ret=ret@entry=GST_STATE_CHANGE_SUCCESS) at gstelement.c:2382
gst_element_change_state (element=element@entry=0x7fc5a81b17e0 [GstFunnel], transition=transition@entry=GST_STATE_CHANGE_PLAYING_TO_PAUSED) at gstelement.c:2687
gst_element_set_state_func (element=0x7fc5a81b17e0 [GstFunnel], state=GST_STATE_PAUSED) at gstelement.c:2602
gst_bin_change_state_func (next=GST_STATE_PAUSED, current=GST_STATE_PLAYING, start_time=0, base_time=1308672274903525, element=0x7fc5a81b17e0 [GstFunnel], bin=0x7fc5d87d8a40 [GstDtlsSrtpEnc]) at gstbin.c:2414
gst_bin_change_state_func (element=0x7fc5d87d8a40 [GstDtlsSrtpEnc], transition=GST_STATE_CHANGE_PLAYING_TO_PAUSED) at gstbin.c:2756
gst_element_change_state (element=element@entry=0x7fc5d87d8a40 [GstDtlsSrtpEnc], transition=transition@entry=GST_STATE_CHANGE_PLAYING_TO_PAUSED) at gstelement.c:2648
gst_element_set_state_func (element=0x7fc5d87d8a40 [GstDtlsSrtpEnc], state=GST_STATE_PAUSED) at gstelement.c:2602
gst_bin_change_state_func (next=GST_STATE_PAUSED, current=GST_STATE_PLAYING, start_time=0, base_time=1308672274903525, element=0x7fc5d87d8a40 [GstDtlsSrtpEnc], bin=0x7fc4e424c3e0 [KmsWebrtcTransportSinkNice]) at gstbin.c:2414
gst_bin_change_state_func (element=0x7fc4e424c3e0 [KmsWebrtcTransportSinkNice], transition=GST_STATE_CHANGE_PLAYING_TO_PAUSED) at gstbin.c:2756
gst_element_change_state (element=element@entry=0x7fc4e424c3e0 [KmsWebrtcTransportSinkNice], transition=transition@entry=GST_STATE_CHANGE_PLAYING_TO_PAUSED) at gstelement.c:2648
gst_element_set_state_func (element=0x7fc4e424c3e0 [KmsWebrtcTransportSinkNice], state=GST_STATE_PAUSED) at gstelement.c:2602
gst_bin_change_state_func (next=GST_STATE_PAUSED, current=GST_STATE_PLAYING, start_time=0, base_time=1308672274903525, element=0x7fc4e424c3e0 [KmsWebrtcTransportSinkNice], bin=0x7fc5c43db410 [KmsWebrtcSession]) at gstbin.c:2414
gst_bin_change_state_func (element=0x7fc5c43db410 [KmsWebrtcSession], transition=GST_STATE_CHANGE_PLAYING_TO_PAUSED) at gstbin.c:2756
gst_element_change_state (element=element@entry=0x7fc5c43db410 [KmsWebrtcSession], transition=transition@entry=GST_STATE_CHANGE_PLAYING_TO_PAUSED) at gstelement.c:2648
gst_element_set_state_func (element=0x7fc5c43db410 [KmsWebrtcSession], state=GST_STATE_PAUSED) at gstelement.c:2602
gst_bin_change_state_func (next=GST_STATE_PAUSED, current=GST_STATE_PLAYING, start_time=0, base_time=1308672274903525, element=0x7fc5c43db410 [KmsWebrtcSession], bin=0x7fc5089a8b60 [KmsWebrtcEndpoint]) at gstbin.c:2414
gst_bin_change_state_func (element=0x7fc5089a8b60 [KmsWebrtcEndpoint], transition=GST_STATE_CHANGE_PLAYING_TO_PAUSED) at gstbin.c:2756
gst_element_change_state (element=element@entry=0x7fc5089a8b60 [KmsWebrtcEndpoint], transition=transition@entry=GST_STATE_CHANGE_PLAYING_TO_PAUSED) at gstelement.c:2648
gst_element_set_state_func (element=0x7fc5089a8b60 [KmsWebrtcEndpoint], state=GST_STATE_NULL) at gstelement.c:2602
kurento::MediaElementImpl::~MediaElementImpl() () at /usr/lib/x86_64-linux-gnu/libkmscoreimpl.so.6
kurento::WebRtcEndpointImpl::~WebRtcEndpointImpl() () at /usr/lib/x86_64-linux-gnu/libkmselementsimpl.so.6
kurento::WebRtcEndpointImpl::~WebRtcEndpointImpl() () at /usr/lib/x86_64-linux-gnu/libkmselementsimpl.so.6
std::_Function_handler<void (), std::_Bind<void (*(kurento::MediaObjectImpl*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >))(kurento::MediaObjectImpl*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)> >::_M_invoke(std::_Any_data const&) () at /usr/lib/x86_64-linux-gnu/libkmscoreimpl.so.6
boost::asio::detail::completion_handler<std::function<void ()> >::do_complete(boost::asio::detail::task_io_service*, boost::asio::detail::task_io_service_operation*, boost::system::error_code const&, unsigned long) () at /usr/lib/x86_64-linux-gnu/libkmscoreimpl.so.6
 () at /usr/lib/x86_64-linux-gnu/libkmscoreimpl.so.6
std::thread::_Impl<std::_Bind_simple<std::_Bind<void (*(boost::shared_ptr<boost::asio::io_service>))(boost::shared_ptr<boost::asio::io_service>)> ()> >::_M_run() () at /usr/lib/x86_64-linux-gnu/libkmscoreimpl.so.6
 () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
start_thread (arg=0x7fc7e12b4700) at pthread_create.c:333
clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
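
The "-" and "+" markers at the top of the backtrace follow unified diff conventions. As a minimal sketch of how such a comparison could be produced from two saved gdb outputs, assuming each backtrace has been captured as plain text (the labels `core_1` and `core_2` and the sample strings are hypothetical, not the real dump contents):

```python
import difflib


def diff_backtraces(first, second):
    """Return unified-diff lines between two backtrace texts."""
    return list(
        difflib.unified_diff(
            first.splitlines(),
            second.splitlines(),
            fromfile="core_1",  # hypothetical label for the first dump
            tofile="core_2",    # hypothetical label for the second dump
            lineterm="",
            n=1,                # one line of context around each change
        )
    )


# Shortened sample texts standing in for the two gdb sessions.
first = (
    "Core was generated by `/usr/bin/kurento-media-server '.\n"
    "Program terminated with signal SIGSEGV, Segmentation fault.\n"
    "(gdb) bt"
)
second = (
    "Core was generated by `/usr/bin/kurento-media-server '.\n"
    "Program terminated with signal SIGABRT, Aborted.\n"
    "(gdb) bt"
)

for line in diff_backtraces(first, second):
    print(line)
```

Lines present only in the first dump are prefixed with "-", lines present only in the second with "+", and shared lines with a space.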

avssav avatar May 15 '20 18:05 avssav

exceptionLimit only refers to 80% of the system's current limits on the number of threads and on the number of open file descriptors; it is not related to CPU usage. You can see the current system values in this log line:

INFO    KurentoServerMethods ServerMethods.cpp:109:ServerMethods: System limits: 62702 threads, 1024 files

So I doubt it has anything to do with your crash (unless the allowed number of threads or FDs is very low).
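
To make the arithmetic concrete, here is a rough sketch of how the thresholds follow from the log line above. It assumes the limit is a plain fraction multiplied by each system value; the regular expression and the function names are illustrative, not part of Kurento:

```python
import re

# Pattern modelled on the example log line; the regex itself is an assumption.
LIMITS_RE = re.compile(r"System limits: (\d+) threads, (\d+) files")


def parse_system_limits(log_line):
    """Extract (max_threads, max_files) from a 'System limits' log line."""
    match = LIMITS_RE.search(log_line)
    if match is None:
        raise ValueError("no 'System limits' entry found in log line")
    return int(match.group(1)), int(match.group(2))


def resource_thresholds(max_threads, max_files, exception_limit=0.8):
    """Return the thread and file counts at which the limit would be hit.

    Assumes the threshold is simply limit * system value, rounded down.
    """
    return int(max_threads * exception_limit), int(max_files * exception_limit)


line = (
    "INFO    KurentoServerMethods ServerMethods.cpp:109:ServerMethods: "
    "System limits: 62702 threads, 1024 files"
)
max_threads, max_files = parse_system_limits(line)
print(resource_thresholds(max_threads, max_files))  # (50161, 819)
```

With the values from that log line, the limit would only come into play above roughly 50161 threads or 819 open files. CPU load does not enter the calculation.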

Can you please upload the core dump files so I can inspect them?

j1elo avatar May 18 '20 10:05 j1elo

I have seen KMS crash in g_slice_alloc many times, with no regular pattern. KMS did not crash when I used Kurento 6.11.0; after I merged some commits from the latest version (6.13.0), KMS often crashes in g_slice_alloc. I am now reverting some of those commits to track down the bug.

sshsun1990 avatar Oct 12 '20 12:10 sshsun1990