UDPMux causes massive packet loss

Open jech opened this issue 4 years ago • 11 comments

Testing UDPMux in Galene, I'm seeing absolutely massive packet loss on a local network, on the order of 50-70%.

The code is here: https://github.com/jech/galene/commit/b80e515eb04a8326336524ea80ecf711a3013293

jech avatar Aug 06 '21 16:08 jech

@jech Have you tried increasing the OS UDP buffer?

OrlandoCo avatar Aug 08 '21 03:08 OrlandoCo

No. My current hypothesis is that it's the same issue as https://github.com/pion/webrtc/issues/1356, which is apparently due to having multiple local addresses on a single local socket; that's going to happen on double-stack hosts, as well as on multihomed hosts.

jech avatar Aug 08 '21 19:08 jech

@jech We are using this (and TCPMux) with LiveKit and only see packet loss when the UDP buffer gets overwhelmed (increasing it gets rid of the loss for us).

https://github.com/livekit/livekit-server/blob/master/pkg/rtc/config.go#L70
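
For readers who want to try the buffer suggestion, here is a minimal Go sketch of asking the kernel for a larger receive buffer on a UDP socket, using only the standard library. The helper name `listenWithBuffer`, the loopback address and the 4 MiB size are assumptions chosen for illustration; this is not LiveKit's or Pion's actual configuration code.

```go
package main

import (
	"fmt"
	"net"
)

// listenWithBuffer (hypothetical helper) opens a UDP4 socket on addr and
// requests a receive buffer of the given size in bytes.
func listenWithBuffer(addr string, size int) (*net.UDPConn, error) {
	udpAddr, err := net.ResolveUDPAddr("udp4", addr)
	if err != nil {
		return nil, err
	}
	conn, err := net.ListenUDP("udp4", udpAddr)
	if err != nil {
		return nil, err
	}
	// SetReadBuffer sets the OS receive buffer (SO_RCVBUF) for this socket.
	if err := conn.SetReadBuffer(size); err != nil {
		conn.Close()
		return nil, err
	}
	return conn, nil
}

func main() {
	// 4 MiB is an arbitrary example value, not a recommendation.
	conn, err := listenWithBuffer("127.0.0.1:0", 4<<20)
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer conn.Close()
	fmt.Println("listening on", conn.LocalAddr())
}
```

On Linux the kernel may silently cap the requested size at `net.core.rmem_max`, so the system-wide limit may need raising as well for the request to take full effect.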

Do you want to see if you can repro with LiveKit? I'm wondering if there's something unique to your machine's networking stack.

You can start it with Docker, using UDPMux:

docker run --rm \
  -p 7880:7880 \
  -p 7881:7881 \
  -p 7882:7882/udp \
  -e LIVEKIT_KEYS="<key>: <secret>" \
  livekit/livekit-server \
  --dev \
  --node-ip=<machine-ip>

davidzhao avatar Aug 08 '21 22:08 davidzhao

Are your machines double-stack?

jech avatar Aug 09 '21 18:08 jech

what is considered double-stack? having both ipv4/6?

davidzhao avatar Aug 09 '21 20:08 davidzhao

> what is considered double-stack? having both ipv4/6?

Yes.

jech avatar Aug 09 '21 21:08 jech

With LiveKit we are using UDP4 with the mux, and that could explain the difference. The challenge with dual-stack is ensuring that the advertised address matches the destination address that we send to. I remember seeing some oddities along the lines of:

  • pion sends packet to client (udp4):clientport
  • client sends packet to pion, which got interpreted as (udp6):clientport
  • mux ignores udp6 address since it's different from the address that it sent to.

davidzhao avatar Aug 09 '21 21:08 davidzhao

I don't know if the issue is the same as https://github.com/pion/webrtc/issues/1356 (which has higher priority for me), but that issue goes away when I disable IPv6 (see https://github.com/pion/webrtc/issues/1356#issuecomment-894376345). Disabling IPv6 is of course not an option: IPv6 is great for WebRTC, since it gives you a peer-reflexive candidate straight away, without the need to contact a STUN server, which noticeably reduces the connection establishment delay.

jech avatar Aug 10 '21 11:08 jech

> Disabling IPv6 is of course not an option

I agree having IPv6 is nice, but I would question whether it's a must-have. Is the slight connection speed improvement worth not having ICE/TCP? That is the decision today.

Of course, it'd be ideal to fix the underlying issue.

davidzhao avatar Aug 10 '21 17:08 davidzhao

The workaround is not a simple matter of disabling IPv6 for TCP-ICE; it requires disabling IPv6 globally on the host. This means that you'll run into trouble as soon as somebody runs your code on a modern server.

What's more, the issue indicates that the code is buggy. Until the bug is understood and properly fixed, there's no saying when the code will bite you. Most probably during an important demo ;-)

jech avatar Aug 10 '21 21:08 jech

It'll only cause an issue on servers that don't support IPv4. We have not gotten any feedback about this. AFAIK, all major cloud vendors run their machines with dual stack.

But I digress, let's just fix the underlying issue.

davidzhao avatar Aug 10 '21 22:08 davidzhao