Kurento crashed close to default exceptionLimit "0.8"
Prerequisites
- [x] I have read the SUPPORT document
- [x] I have checked the Troubleshooting Guide
- [x] I have tested with the latest version of Kurento
Issue description
Kurento crashed for the first time since 15 Feb 2020. CPU usage was close to the default exceptionLimit of "0.8": CPU idle time was 22.64% according to Zabbix monitoring. Could the two be connected, or is it a bug?
Context
We restart our kms-6.13.2-dev docker container every night.
How to reproduce?
It happened in a production environment. I don't know how to reproduce it.
INFO about Kurento Media Server
- Kurento version: 6.13.2, nightly (6.13.2-20200507140307)
- Server OS: Ubuntu 16.04.6 LTS
- Installation method:
- [] apt-get
- [x] Docker (on c5.2xlarge AWS linux instance)
- [] AWS
- [] Built from sources
INFO about your Application Server
- Language: Node.js
- Kurento Client version: 6.13.2, nightly
INFO about your environment
I start KMS like this:
docker run --name kms-6.13.2-dev --restart always -p 8888:8888 -p 8433:8433 \
--ulimit nofile="$(( ($(cat /proc/sys/fs/file-max) * 50) / 100 ))" \
-e GST_DEBUG="3,Kurento*:4,kms*:4,sdp*:4,webrtc*:4,*rtpendpoint:4,rtp*handler:4,rtpsynchronizer:4,agnosticbin:4,KurentoMediaElementImpl:4" \
-e KMS_STUN_IP="X.X.X.X" -e KMS_STUN_PORT="3478" -e KMS_MTU="1100" \
-e KMS_TURN_URL="xxxxxxxx:[email protected]:3478?transport=udp" \
-e KMS_EXTERNAL_ADDRESS="auto" -e KMS_NETWORK_INTERFACES="eth0" \
-v $HOME/kurento-6.13.2-3:/etc/kurento -v $HOME/tmp/kurento:/tmp \
--log-driver journald -d kurento/kurento-media-server-dev:6.13.2
There were 2 kurento processes around the time of the crash:
15:50 UTC
29447 529% /usr/bin/kurento-media-server
16:00 UTC
29447 100% kurento-media-s
21894 252% /usr/bin/kurento-media-server
The system created 2 dump files. There are some differences between them; see the lines marked "-" and "+" at the beginning of the backtrace.
I copied the libkms* files from the Docker container to the host. The backtrace looks like this:
root@kms:~/tmp/kurento# gdb /var/lib/docker/overlay2/d634b3a76048ab6f59bd3d7c1c4095253de5a7b7ac06f73b642da8d81e223e23/diff/usr/bin/kurento-media-server -c core_kurento-media-s_1_0_1589557794
Reading symbols from /var/lib/docker/overlay2/d634b3a76048ab6f59bd3d7c1c4095253de5a7b7ac06f73b642da8d81e223e23/diff/usr/bin/kurento-media-server...(no debugging symbols found)...done.
warning: core file may not match specified executable file.
[New LWP 1334]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/kurento-media-server '.
-Program terminated with signal SIGSEGV, Segmentation fault.
-#0 __GI_abort () at abort.c:125
-125 abort.c: No such file or directory.
-[Current thread is 1 (Thread 0x7fc7e12b4700 (LWP 30))]
+Program terminated with signal SIGABRT, Aborted.
+#0 0x00007fc7eb88b428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
+54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
-__GI_abort () at abort.c:125
+__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
+__GI_abort () at abort.c:89
Debug::DeathHandler::SignalHandler(int, void*, void*) ()
<signal handler called> () at /lib/x86_64-linux-gnu/libpthread.so.0
g_slice_alloc (magazine_chunks=<optimized out>) at /build/glib2.0-xkQkqE/glib2.0-2.48.2/./glib/gslice.c:539
g_slice_alloc (tmem=<optimized out>, ix=7) at /build/glib2.0-xkQkqE/glib2.0-2.48.2/./glib/gslice.c:842
g_slice_alloc (mem_size=mem_size@entry=128) at /build/glib2.0-xkQkqE/glib2.0-2.48.2/./glib/gslice.c:1016
g_slice_alloc0 (mem_size=mem_size@entry=128) at /build/glib2.0-xkQkqE/glib2.0-2.48.2/./glib/gslice.c:1051
gst_message_new_custom (type=type@entry=GST_MESSAGE_STATE_CHANGED, src=src@entry=0x7fc5a81b17e0 [GstFunnel], structure=0x7fc5bc12ba40) at gstmessage.c:288
gst_message_new_state_changed (src=src@entry=0x7fc5a81b17e0 [GstFunnel], oldstate=2820347872, oldstate@entry=GST_STATE_PLAYING, newstate=newstate@entry=GST_STATE_PAUSED, pending=32709, pending@entry=GST_STATE_VOID_PENDING) at gstmessage.c:566
_priv_gst_element_state_changed (element=element@entry=0x7fc5a81b17e0 [GstFunnel], oldstate=oldstate@entry=GST_STATE_PLAYING, newstate=newstate@entry=GST_STATE_PAUSED, pending=pending@entry=GST_STATE_VOID_PENDING) at gstelement.c:2282
gst_element_continue_state (element=element@entry=0x7fc5a81b17e0 [GstFunnel], ret=ret@entry=GST_STATE_CHANGE_SUCCESS) at gstelement.c:2382
gst_element_change_state (element=element@entry=0x7fc5a81b17e0 [GstFunnel], transition=transition@entry=GST_STATE_CHANGE_PLAYING_TO_PAUSED) at gstelement.c:2687
gst_element_set_state_func (element=0x7fc5a81b17e0 [GstFunnel], state=GST_STATE_PAUSED) at gstelement.c:2602
gst_bin_change_state_func (next=GST_STATE_PAUSED, current=GST_STATE_PLAYING, start_time=0, base_time=1308672274903525, element=0x7fc5a81b17e0 [GstFunnel], bin=0x7fc5d87d8a40 [GstDtlsSrtpEnc]) at gstbin.c:2414
gst_bin_change_state_func (element=0x7fc5d87d8a40 [GstDtlsSrtpEnc], transition=GST_STATE_CHANGE_PLAYING_TO_PAUSED) at gstbin.c:2756
gst_element_change_state (element=element@entry=0x7fc5d87d8a40 [GstDtlsSrtpEnc], transition=transition@entry=GST_STATE_CHANGE_PLAYING_TO_PAUSED) at gstelement.c:2648
gst_element_set_state_func (element=0x7fc5d87d8a40 [GstDtlsSrtpEnc], state=GST_STATE_PAUSED) at gstelement.c:2602
gst_bin_change_state_func (next=GST_STATE_PAUSED, current=GST_STATE_PLAYING, start_time=0, base_time=1308672274903525, element=0x7fc5d87d8a40 [GstDtlsSrtpEnc], bin=0x7fc4e424c3e0 [KmsWebrtcTransportSinkNice]) at gstbin.c:2414
gst_bin_change_state_func (element=0x7fc4e424c3e0 [KmsWebrtcTransportSinkNice], transition=GST_STATE_CHANGE_PLAYING_TO_PAUSED) at gstbin.c:2756
gst_element_change_state (element=element@entry=0x7fc4e424c3e0 [KmsWebrtcTransportSinkNice], transition=transition@entry=GST_STATE_CHANGE_PLAYING_TO_PAUSED) at gstelement.c:2648
gst_element_set_state_func (element=0x7fc4e424c3e0 [KmsWebrtcTransportSinkNice], state=GST_STATE_PAUSED) at gstelement.c:2602
gst_bin_change_state_func (next=GST_STATE_PAUSED, current=GST_STATE_PLAYING, start_time=0, base_time=1308672274903525, element=0x7fc4e424c3e0 [KmsWebrtcTransportSinkNice], bin=0x7fc5c43db410 [KmsWebrtcSession]) at gstbin.c:2414
gst_bin_change_state_func (element=0x7fc5c43db410 [KmsWebrtcSession], transition=GST_STATE_CHANGE_PLAYING_TO_PAUSED) at gstbin.c:2756
gst_element_change_state (element=element@entry=0x7fc5c43db410 [KmsWebrtcSession], transition=transition@entry=GST_STATE_CHANGE_PLAYING_TO_PAUSED) at gstelement.c:2648
gst_element_set_state_func (element=0x7fc5c43db410 [KmsWebrtcSession], state=GST_STATE_PAUSED) at gstelement.c:2602
gst_bin_change_state_func (next=GST_STATE_PAUSED, current=GST_STATE_PLAYING, start_time=0, base_time=1308672274903525, element=0x7fc5c43db410 [KmsWebrtcSession], bin=0x7fc5089a8b60 [KmsWebrtcEndpoint]) at gstbin.c:2414
gst_bin_change_state_func (element=0x7fc5089a8b60 [KmsWebrtcEndpoint], transition=GST_STATE_CHANGE_PLAYING_TO_PAUSED) at gstbin.c:2756
gst_element_change_state (element=element@entry=0x7fc5089a8b60 [KmsWebrtcEndpoint], transition=transition@entry=GST_STATE_CHANGE_PLAYING_TO_PAUSED) at gstelement.c:2648
gst_element_set_state_func (element=0x7fc5089a8b60 [KmsWebrtcEndpoint], state=GST_STATE_NULL) at gstelement.c:2602
kurento::MediaElementImpl::~MediaElementImpl() () at /usr/lib/x86_64-linux-gnu/libkmscoreimpl.so.6
kurento::WebRtcEndpointImpl::~WebRtcEndpointImpl() () at /usr/lib/x86_64-linux-gnu/libkmselementsimpl.so.6
kurento::WebRtcEndpointImpl::~WebRtcEndpointImpl() () at /usr/lib/x86_64-linux-gnu/libkmselementsimpl.so.6
std::_Function_handler<void (), std::_Bind<void (*(kurento::MediaObjectImpl*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >))(kurento::MediaObjectImpl*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)> >::_M_invoke
(std::_Any_data const&) () at /usr/lib/x86_64-linux-gnu/libkmscoreimpl.so.6
boost::asio::detail::completion_handler<std::function<void ()> >::do_complete(boost::asio::detail::task_io_service*, boost::asio::detail::task_io_service_operation*, boost::system::error_code const&, unsigned long) ()
at /usr/lib/x86_64-linux-gnu/libkmscoreimpl.so.6
() at /usr/lib/x86_64-linux-gnu/libkmscoreimpl.so.6
std::thread::_Impl<std::_Bind_simple<std::_Bind<void (*(boost::shared_ptr<boost::asio::io_service>))(boost::shared_ptr<boost::asio::io_service>)> ()> >::_M_run() () at /usr/lib/x86_64-linux-gnu/libkmscoreimpl.so.6
() at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
start_thread (arg=0x7fc7e12b4700) at pthread_create.c:333
clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
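A comparison like the one above, with "-" and "+" lines, can be produced by saving each backtrace to a text file and running a unified diff over the two. The following is only a sketch: the helper names `dump_backtrace` and `compare_backtraces` and all file paths are hypothetical, and it assumes `gdb` and `diff` are available on the host.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and file paths are placeholders,
# not the real dump file names from this report.

# Print the backtrace of one core file without an interactive session.
dump_backtrace() {
  local exe="$1" core="$2"
  gdb --batch -ex "bt" "$exe" "$core" 2>&1
}

# Show a unified diff of two saved backtraces.
# diff exits with status 1 when the files differ, which is expected here,
# so that status is treated as success.
compare_backtraces() {
  diff -u "$1" "$2" || [ "$?" -eq 1 ]
}

# Example usage (hypothetical paths):
#   dump_backtrace ./kurento-media-server ./core_first  > bt_first.txt
#   dump_backtrace ./kurento-media-server ./core_second > bt_second.txt
#   compare_backtraces bt_first.txt bt_second.txt
```

Lines that appear only in the first backtrace are prefixed with "-", and lines that appear only in the second with "+".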
exceptionLimit only refers to 80% of the number of threads and the number of open file descriptors that the system currently allows. You can see the current system values in this log line:
INFO KurentoServerMethods ServerMethods.cpp:109:ServerMethods: System limits: 62702 threads, 1024 files
So I doubt it has anything to do with your crash (unless the allowed number of threads or FDs is very low).
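To make the numbers concrete, here is a rough sketch of the arithmetic for the limits shown in the log line above. The helper `limit_threshold` is hypothetical, and rounding down to an integer is an assumption; KMS may round differently internally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: integer percentage of a limit, rounded down.
# The default of 80 corresponds to the default exceptionLimit of 0.8.
limit_threshold() {
  local limit="$1" percent="${2:-80}"
  echo $(( limit * percent / 100 ))
}

# Limits taken from the "System limits" log line quoted above.
echo "threads threshold: $(limit_threshold 62702)"   # prints 50161
echo "files threshold:   $(limit_threshold 1024)"    # prints 819
```

With these example values, the limit would only come into play at roughly 50161 threads or 819 open files, neither of which is a measure of CPU usage.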
Can you please upload the core dump files so I can inspect them?
I have seen KMS crash in g_slice_alloc many times, with no regular pattern. KMS did not crash when I used Kurento 6.11.0; after I merged some commits from the latest version (6.13.0), it often crashes in g_slice_alloc. I am now reverting some commits to trace the bug.