mediasoup

SIGSEGV at 'RTC::WebRtcTransport::OnIceServerTupleRemoved'

Open satoren opened this issue 3 years ago • 12 comments

Bug Report

It occurred 12 times in three days on ten servers.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000fffe19d71b4c in non-virtual thunk to RTC::WebRtcTransport::OnIceServerTupleRemoved(RTC::IceServer const*, RTC::TransportTuple*) () from /opt/ovice_ex_core/ovice_ex_core/lib/mediasoup_elixir-0.4.1/priv/native/libmediasoup_elixir.so
[Current thread is 1 (LWP 1882413)]
(gdb) 
(gdb) bt
#0  0x0000fffe19d71b4c in non-virtual thunk to RTC::WebRtcTransport::OnIceServerTupleRemoved(RTC::IceServer const*, RTC::TransportTuple*) () from /opt/ovice_ex_core/ovice_ex_core/lib/mediasoup_elixir-0.4.1/priv/native/libmediasoup_elixir.so
#1  0x0000fffe19d71e68 in non-virtual thunk to RTC::WebRtcTransport::OnRtcTcpConnectionClosed(RTC::TcpServer*, RTC::TcpConnection*) () from /opt/ovice_ex_core/ovice_ex_core/lib/mediasoup_elixir-0.4.1/priv/native/libmediasoup_elixir.so
#2  0x0000fffe1a1561c4 in TcpServerHandler::OnTcpConnectionClosed(TcpConnectionHandler*) () from /opt/ovice_ex_core/ovice_ex_core/lib/mediasoup_elixir-0.4.1/priv/native/libmediasoup_elixir.so
#3  0x0000fffe19ea547c in uv.read () from /opt/ovice_ex_core/ovice_ex_core/lib/mediasoup_elixir-0.4.1/priv/native/libmediasoup_elixir.so
#4  0x0000fffe19ea5b30 in uv.stream_io () from /opt/ovice_ex_core/ovice_ex_core/lib/mediasoup_elixir-0.4.1/priv/native/libmediasoup_elixir.so
#5  0x0000fffe19eac7bc in uv.io_poll () from /opt/ovice_ex_core/ovice_ex_core/lib/mediasoup_elixir-0.4.1/priv/native/libmediasoup_elixir.so
#6  0x0000fffe19e9faf4 in uv_run () from /opt/ovice_ex_core/ovice_ex_core/lib/mediasoup_elixir-0.4.1/priv/native/libmediasoup_elixir.so
#7  0x0000fffe19ceb34c in DepLibUV::RunLoop() () from /opt/ovice_ex_core/ovice_ex_core/lib/mediasoup_elixir-0.4.1/priv/native/libmediasoup_elixir.so
#8  0x0000fffe19cf4608 in Worker::Worker(Channel::ChannelSocket*, PayloadChannel::PayloadChannelSocket*) () from /opt/ovice_ex_core/ovice_ex_core/lib/mediasoup_elixir-0.4.1/priv/native/libmediasoup_elixir.so
#9  0x0000fffe19ce9b88 in mediasoup_worker_run () from /opt/ovice_ex_core/ovice_ex_core/lib/mediasoup_elixir-0.4.1/priv/native/libmediasoup_elixir.so
#10 0x0000fffe19b8c55c in std::sys_common::backtrace::__rust_begin_short_backtrace () from /opt/ovice_ex_core/ovice_ex_core/lib/mediasoup_elixir-0.4.1/priv/native/libmediasoup_elixir.so
#11 0x0000fffe19bc5754 in core::ops::function::FnOnce::call_once{{vtable-shim}} () from /opt/ovice_ex_core/ovice_ex_core/lib/mediasoup_elixir-0.4.1/priv/native/libmediasoup_elixir.so
#12 0x0000fffe1a20f654 in <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/alloc/src/boxed.rs:1872
#13 <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/alloc/src/boxed.rs:1872
#14 std::sys::unix::thread::Thread::new::thread_start () at library/std/src/sys/unix/thread.rs:108
#15 0x0000ffff9f2733f0 in ?? ()

Your environment

  • Operating system: Ubuntu 20.04.2
  • Node version: N/A
  • npm version: N/A
  • gcc/clang version: gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
  • mediasoup version: Rust version mediasoup: 0.10.0 mediasoup-sys: 0.4.2
  • mediasoup-client version: 3.6.50

Issue description

satoren avatar Jul 26 '22 01:07 satoren

I'll take a look when I have time. This doesn't look good. I assume this is a regression in the recent version too?

nazar-pc avatar Jul 26 '22 07:07 nazar-pc

Probably yes. The function that is crashing comes from the WebRtcServer PR.

satoren avatar Jul 26 '22 09:07 satoren

Are you using WebRTC server or not yet?

nazar-pc avatar Jul 26 '22 12:07 nazar-pc

Not yet, I just updated from 0.9.3 to 0.10.0 and changed the type ListenIP that causes build errors.

satoren avatar Jul 27 '22 00:07 satoren

@satoren, could you please show the output of bt full?

jmillan avatar Jul 29 '22 09:07 jmillan

Probably doesn't help, as the C++ symbols seem to have been lost.

(gdb) bt full
#0  0x0000fffe09549b4c in non-virtual thunk to RTC::WebRtcTransport::OnIceServerTupleRemoved(RTC::IceServer const*, RTC::TransportTuple*) () at library/std/src/sync/once.rs:494
No symbol table info available.
#1  0x0000fffe09549e68 in non-virtual thunk to RTC::WebRtcTransport::OnRtcTcpConnectionClosed(RTC::TcpServer*, RTC::TcpConnection*) () at library/std/src/sync/once.rs:494
No symbol table info available.
#2  0x0000fffe0992e1c4 in TcpServerHandler::OnTcpConnectionClosed(TcpConnectionHandler*) () at library/std/src/sync/once.rs:494
No symbol table info available.
#3  0x0000fffe0967d47c in uv.read () at library/std/src/sync/once.rs:494
No symbol table info available.
#4  0x0000fffe0967db30 in uv.stream_io () at library/std/src/sync/once.rs:494
No symbol table info available.
#5  0x0000fffe096847bc in uv.io_poll () at library/std/src/sync/once.rs:494
No symbol table info available.
#6  0x0000fffe09677af4 in uv_run () at library/std/src/sync/once.rs:494
No symbol table info available.
#7  0x0000fffe094c334c in DepLibUV::RunLoop() () at library/std/src/sync/once.rs:494
No symbol table info available.
#8  0x0000fffe094cc608 in Worker::Worker(Channel::ChannelSocket*, PayloadChannel::PayloadChannelSocket*) () at library/std/src/sync/once.rs:494
No symbol table info available.
#9  0x0000fffe094c1b88 in mediasoup_worker_run () at library/std/src/sync/once.rs:494
No symbol table info available.
#10 0x0000fffe0936455c in std::sys_common::backtrace::__rust_begin_short_backtrace () at library/std/src/sync/once.rs:494
No symbol table info available.
#11 0x0000fffe0939d754 in core::ops::function::FnOnce::call_once{{vtable-shim}} () at library/std/src/sync/once.rs:494
No symbol table info available.
#12 0x0000fffe099e7654 in <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/alloc/src/boxed.rs:1872
No locals.
#13 <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/alloc/src/boxed.rs:1872
No locals.
#14 std::sys::unix::thread::Thread::new::thread_start () at library/std/src/sys/unix/thread.rs:108
No locals.
#15 0x0000ffff8e3173f0 in ?? ()

satoren avatar Aug 01 '22 01:08 satoren

> Not yet, I just updated from 0.9.3 to 0.10.0 and changed the type ListenIP that causes build errors.

What is this exactly?

> Probably doesn't help, as the C++ symbols seem to have been lost.

Since you are hitting the problem quite often, it would be helpful to have the symbols for at least one of the dumps.

jmillan avatar Aug 01 '22 14:08 jmillan

> Not yet, I just updated from 0.9.3 to 0.10.0 and changed the type ListenIP that causes build errors.

I have no idea what you mean here.

ibc avatar Aug 01 '22 16:08 ibc

I meant to say that I am not using the new features (WebRTCServer, etc.) in the version update. The Rust version had a slight type change, so the build did not pass as is.

satoren avatar Aug 02 '22 00:08 satoren

So you are using Rust, you are not using WebRtcServer at all, and you changed something in the Rust code. And it crashes due to something in WebRtcServer (that you are not even using) when an ICE TCP connection is closed. How can that be? What change did you make on the Rust side and why?

ibc avatar Aug 02 '22 06:08 ibc

The only change he made was related to code compilation: some arguments were refactored. The library code didn't change too much IIRC, but I may check again later.

Looks like there might be a regression, either in the Rust library or in the worker, that happens when WebRTC server isn't used.

nazar-pc avatar Aug 02 '22 07:08 nazar-pc

Now I realize that the crash is in RTC::WebRtcTransport::OnIceServerTupleRemoved() rather than in WebRtcServer...

ibc avatar Aug 02 '22 07:08 ibc

Maybe the cause is a use of freed memory? (two screenshots attached)

ybybwdwd avatar Sep 29 '22 08:09 ybybwdwd

There is no freed memory usage in there. this->tuples.erase(it) just removes a TransportTuple pointer from a map, it doesn't free it.

ibc avatar Sep 29 '22 08:09 ibc

> There is no freed memory usage in there. this->tuples.erase(it) just removes a TransportTuple pointer from a map, it doesn't free it.

But this->tuples is std::list<RTC::TransportTuple>, not std::map<RTC::TransportTuple*>

ybybwdwd avatar Sep 29 '22 09:09 ybybwdwd
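
The distinction raised above matters because erasing from a std::list of objects destroys the element itself, so any pointer to it that is used afterwards dangles. The following is only a minimal sketch of that general C++ behavior, not the mediasoup code and not the change made in the linked PR: Tuple, Listener, RemoveTuple and destroyedCount are hypothetical names invented for illustration. It shows one safe ordering, where the listener is notified while the element is still alive and the element is erased afterwards.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <utility>

// Hypothetical counter, incremented every time a Tuple is destroyed.
static int destroyedCount = 0;

// Hypothetical stand-in for RTC::TransportTuple (not the real class).
struct Tuple
{
	std::string id;
	explicit Tuple(std::string tupleId) : id(std::move(tupleId)) {}
	~Tuple() { ++destroyedCount; }
};

// Hypothetical listener; records the id of the tuple it was handed.
struct Listener
{
	std::string lastRemoved;
	void OnTupleRemoved(const Tuple* tuple) { lastRemoved = tuple->id; }
};

// Safe ordering: notify first, erase second.
// With std::list<Tuple>, erase() runs ~Tuple() on the stored object, so a
// pointer obtained before erase() must not be dereferenced after it.
inline void RemoveTuple(std::list<Tuple>& tuples, const std::string& id, Listener& listener)
{
	for (auto it = tuples.begin(); it != tuples.end(); ++it)
	{
		if (it->id == id)
		{
			listener.OnTupleRemoved(&*it); // element still alive here
			tuples.erase(it);              // element destroyed here
			return;
		}
	}
}
```

Swapping the two lines inside the if block (erase first, notify second) would hand the listener a pointer to an already destroyed object, which is undefined behavior and can surface as a SIGSEGV. With a container of pointers, by contrast, erase() would only drop the pointer and leave the pointee untouched.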

GOOD POINT!

ibc avatar Sep 29 '22 09:09 ibc

@ybybwdwd please take a look: https://github.com/versatica/mediasoup/pull/915

ibc avatar Sep 29 '22 09:09 ibc