mediasoup icon indicating copy to clipboard operation
mediasoup copied to clipboard

Worker hits ASSERT condition in production

Open gitamirp opened this issue 1 year ago • 14 comments

worker hits assert in

https://github.com/versatica/mediasoup/blob/v3/worker/src/RTC/RateCalculator.cpp#L37

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f8d1f851535 in __GI_abort () at abort.c:79
#2  0x000055814b9c3e7f in RTC::RateCalculator::Update(unsigned long, unsigned long) ()
#3  0x000055814ba754d1 in RTC::WebRtcTransport::SendRtpPacket(RTC::Consumer*, RTC::RtpPacket*, std::function<void (bool)> const*) ()
#4  0x000055814ba528d2 in RTC::Transport::OnConsumerSendRtpPacket(RTC::Consumer*, RTC::RtpPacket*) ()
#5  0x000055814ba116f0 in RTC::SimpleConsumer::SendRtpPacket(RTC::RtpPacket*, std::shared_ptr<RTC::RtpPacket>&) ()
#6  0x000055814b9d38c2 in RTC::Router::OnTransportProducerRtpPacketReceived(RTC::Transport*, RTC::Producer*, RTC::RtpPacket*) ()
#7  0x000055814ba510a0 in RTC::Transport::OnProducerRtpPacketReceived(RTC::Producer*, RTC::RtpPacket*) ()
#8  0x000055814b9bd28e in RTC::Producer::ReceiveRtpPacket(RTC::RtpPacket*) ()
#9  0x000055814ba3d326 in RTC::Transport::ReceiveRtpPacket(RTC::RtpPacket*) ()
#10 0x000055814ba7ee97 in RTC::WebRtcTransport::OnRtpDataReceived(RTC::TransportTuple*, unsigned char const*, unsigned long) ()
#11 0x000055814ba808b3 in RTC::WebRtcTransport::OnPacketReceived(RTC::TransportTuple*, unsigned char const*, unsigned long) ()
#12 0x000055814ba81192 in non-virtual thunk to RTC::WebRtcTransport::OnUdpSocketPacketReceived(RTC::UdpSocket*, unsigned char const*, unsigned long, sockaddr const*) ()
#13 0x000055814ba6a5d1 in RTC::UdpSocket::UserOnUdpDatagramReceived(unsigned char const*, unsigned long, sockaddr const*) ()
#14 0x000055814b9218ce in onRecv(uv_udp_s*, long, uv_buf_t const*, sockaddr const*, unsigned int) ()
#15 0x000055814bde4c5d in uv.udp_io ()
#16 0x000055814bde7fd6 in uv.io_poll ()
#17 0x000055814bddb6ae in uv_run ()
#18 0x000055814b8c410f in DepLibUV::RunLoop() ()
#19 0x000055814b8ed381 in Worker::Worker(Channel::ChannelSocket*, PayloadChannel::PayloadChannelSocket*) ()
#20 0x000055814b8be813 in mediasoup_worker_run ()
#21 0x000055814b8bc709 in main ()

gitamirp avatar Jan 17 '24 10:01 gitamirp

@gitamirp,

  • Which mediasoup version are you using?
  • Can you repro? can you provide an STR?
  • Please provide as much context info as you can.

jmillan avatar Jan 17 '24 10:01 jmillan

@jmillan thanks for looking into it. We are using version 3.12.12. Unfortunately, it happens in a production environment therefore its harder to get detailed logs or STR for it. It rarely happens but I can try to get some more info around it.

gitamirp avatar Jan 17 '24 11:01 gitamirp

You should upgrade to latest release first, 3.12.12 is very old and the issue you are hitting might have been long fixed

nazar-pc avatar Jan 17 '24 11:01 nazar-pc

Yes please, you're using quite an old version. Upgrade it as soon as you can.

Don't hesitate to reopen the issue if it comes out in the future.

jmillan avatar Jan 17 '24 11:01 jmillan

Honestly there is no change since 3.12.12 that could have fix this issue. Let me reopen to not forget. I think we need a fuzzer test for this class.

ibc avatar Jan 17 '24 11:01 ibc

thanks, fwiw there were no significant changes to RateCalculator.cpp over the last 2 years.

https://github.com/versatica/mediasoup/commits/v3/worker/src/RTC/RateCalculator.cpp

I suspect it might occur when

https://github.com/versatica/mediasoup/blob/v3/worker/src/RTC/RateCalculator.cpp#L27

nowMs - this->newestItemStartTime >= this->itemSizeMs

and

https://github.com/versatica/mediasoup/blob/v3/worker/src/RTC/RateCalculator.cpp#L32

this->newestItemIndex >= this->windowItems

and

this->oldestItemIndex == 0

probably just logging an error and resting the RateCalculator will be more appropriate error handling

gitamirp avatar Jan 17 '24 13:01 gitamirp

probably just logging an error and resting the RateCalculator will be more appropriate error handling

Defensive programming hides real bugs. We don't do that. If something should never happen then it must never happen, otherwise it's a bug that needs to be fixed.

ibc avatar Jan 17 '24 13:01 ibc

If there was a memory corruption somewhere, side-effects might be unexpected. This needs to be confirmed on latest version first.

nazar-pc avatar Jan 17 '24 13:01 nazar-pc

probably just logging an error and resting the RateCalculator will be more appropriate error handling

Defensive programming hides real bugs. We don't do that. If something should never happen then it must never happen, otherwise it's a bug that needs to be fixed.

also true

gitamirp avatar Jan 17 '24 13:01 gitamirp

If there was a memory corruption somewhere, side-effects might be unexpected. This needs to be confirmed on latest version first.

I am not sure how fast we can get these hosts on latest release. for now I will just patch it to suit our needs.

I'll let you know if I find more about this condition.

thanks

gitamirp avatar Jan 17 '24 13:01 gitamirp

@gitamirp any news? I assumed you patched version 3.12.12 so it's not a problem for you anymore but perhaps you updated to latest version (without patching it)?

ibc avatar Feb 14 '24 17:02 ibc

We patched 3.12.12 by using

MS_ASSERT( this->newestItemIndex != this->oldestItemIndex || this->oldestItemIndex == -1 || this->newestItemIndex, "newest index overlaps with the oldest one");

I will let you know if we run into this ASSERT again with this patch. If you are interested we can add some logs when

this->newestItemIndex == this->oldestItemIndex

I believe that in 3.12.12 both newest && oldest can be zero at the same time but I don't have logs to support it.

gitamirp avatar Feb 14 '24 23:02 gitamirp

Yes please. Add those logs and comment here when you get something. We will investigate this next week. Too busy these days.

ibc avatar Feb 20 '24 19:02 ibc

Sure, we'll do. thanks

gitamirp avatar Feb 20 '24 20:02 gitamirp