Worker hits ASSERT condition in production
worker hits assert in
https://github.com/versatica/mediasoup/blob/v3/worker/src/RTC/RateCalculator.cpp#L37
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f8d1f851535 in __GI_abort () at abort.c:79
#2 0x000055814b9c3e7f in RTC::RateCalculator::Update(unsigned long, unsigned long) ()
#3 0x000055814ba754d1 in RTC::WebRtcTransport::SendRtpPacket(RTC::Consumer*, RTC::RtpPacket*, std::function<void (bool)> const*) ()
#4 0x000055814ba528d2 in RTC::Transport::OnConsumerSendRtpPacket(RTC::Consumer*, RTC::RtpPacket*) ()
#5 0x000055814ba116f0 in RTC::SimpleConsumer::SendRtpPacket(RTC::RtpPacket*, std::shared_ptr<RTC::RtpPacket>&) ()
#6 0x000055814b9d38c2 in RTC::Router::OnTransportProducerRtpPacketReceived(RTC::Transport*, RTC::Producer*, RTC::RtpPacket*) ()
#7 0x000055814ba510a0 in RTC::Transport::OnProducerRtpPacketReceived(RTC::Producer*, RTC::RtpPacket*) ()
#8 0x000055814b9bd28e in RTC::Producer::ReceiveRtpPacket(RTC::RtpPacket*) ()
#9 0x000055814ba3d326 in RTC::Transport::ReceiveRtpPacket(RTC::RtpPacket*) ()
#10 0x000055814ba7ee97 in RTC::WebRtcTransport::OnRtpDataReceived(RTC::TransportTuple*, unsigned char const*, unsigned long) ()
#11 0x000055814ba808b3 in RTC::WebRtcTransport::OnPacketReceived(RTC::TransportTuple*, unsigned char const*, unsigned long) ()
#12 0x000055814ba81192 in non-virtual thunk to RTC::WebRtcTransport::OnUdpSocketPacketReceived(RTC::UdpSocket*, unsigned char const*, unsigned long, sockaddr const*) ()
#13 0x000055814ba6a5d1 in RTC::UdpSocket::UserOnUdpDatagramReceived(unsigned char const*, unsigned long, sockaddr const*) ()
#14 0x000055814b9218ce in onRecv(uv_udp_s*, long, uv_buf_t const*, sockaddr const*, unsigned int) ()
#15 0x000055814bde4c5d in uv.udp_io ()
#16 0x000055814bde7fd6 in uv.io_poll ()
#17 0x000055814bddb6ae in uv_run ()
#18 0x000055814b8c410f in DepLibUV::RunLoop() ()
#19 0x000055814b8ed381 in Worker::Worker(Channel::ChannelSocket*, PayloadChannel::PayloadChannelSocket*) ()
#20 0x000055814b8be813 in mediasoup_worker_run ()
#21 0x000055814b8bc709 in main ()
@gitamirp,
- Which mediasoup version are you using?
- Can you repro? can you provide an STR?
- Please provide as much context info as you can.
@jmillan thanks for looking into it. We are using version 3.12.12. Unfortunately, it happens in a production environment therefore its harder to get detailed logs or STR for it. It rarely happens but I can try to get some more info around it.
You should upgrade to latest release first, 3.12.12 is very old and the issue you are hitting might have been long fixed
Yes please, you're using quite an old version. Upgrade it as soon as you can.
Don't hesitate to reopen the issue if it comes out in the future.
Honestly there is no change since 3.12.12 that could have fix this issue. Let me reopen to not forget. I think we need a fuzzer test for this class.
thanks, fwiw there were no significant changes to RateCalculator.cpp over the last 2 years.
https://github.com/versatica/mediasoup/commits/v3/worker/src/RTC/RateCalculator.cpp
I suspect it might occur when
https://github.com/versatica/mediasoup/blob/v3/worker/src/RTC/RateCalculator.cpp#L27
nowMs - this->newestItemStartTime >= this->itemSizeMs
and
https://github.com/versatica/mediasoup/blob/v3/worker/src/RTC/RateCalculator.cpp#L32
this->newestItemIndex >= this->windowItems
and
this->oldestItemIndex == 0
probably just logging an error and resting the RateCalculator will be more appropriate error handling
probably just logging an error and resting the RateCalculator will be more appropriate error handling
Defensive programming hides real bugs. We don't do that. If something should never happen then it must never happen, otherwise it's a bug that needs to be fixed.
If there was a memory corruption somewhere, side-effects might be unexpected. This needs to be confirmed on latest version first.
probably just logging an error and resting the RateCalculator will be more appropriate error handling
Defensive programming hides real bugs. We don't do that. If something should never happen then it must never happen, otherwise it's a bug that needs to be fixed.
also true
If there was a memory corruption somewhere, side-effects might be unexpected. This needs to be confirmed on latest version first.
I am not sure how fast we can get these hosts on latest release. for now I will just patch it to suit our needs.
I'll let you know if I find more about this condition.
thanks
@gitamirp any news? I assumed you patched version 3.12.12 so it's not a problem for you anymore but perhaps you updated to latest version (without patching it)?
We patched 3.12.12 by using
MS_ASSERT( this->newestItemIndex != this->oldestItemIndex || this->oldestItemIndex == -1 || this->newestItemIndex, "newest index overlaps with the oldest one");
I will let you know if we run into this ASSERT again with this patch. If you are interested we can add some logs when
this->newestItemIndex == this->oldestItemIndex
I believe that in 3.12.12 both newest && oldest can be zero at the same time but I don't have logs to support it.
Yes please. Add those logs and comment here when you get something. We will investigate this next week. Too busy these days.
Sure, we'll do. thanks