
SCTP fails when using SetInterfaceFilter and pion is an answerer

Open iamoskvin opened this issue 6 months ago • 10 comments

Summary

I want to disable all host candidates and use only server-reflexive (srflx) ones. To achieve that, I used SettingEngine.SetInterfaceFilter like this:

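settingEngine := webrtc.SettingEngine{}
// Reject every network interface; the intent is to gather no host candidates.
settingEngine.SetInterfaceFilter(func(name string) bool {
    return false
})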

With this filter in place, everything works when Pion is the offerer: the ICE connection is established, and SCTP works as expected.

However, when Pion acts as the answerer, ICE succeeds but SCTP fails in ~80% of cases. The failure happens during the SCTP handshake: the incoming COOKIE-ACK never arrives, even though ICE reports "connected".

In about 20% of runs, the full connection, including SCTP, does succeed, so the behavior is nondeterministic.
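For context, the symptom can be read against SCTP association setup (RFC 9260, section 5): the initiating side sends INIT, answers INIT-ACK with COOKIE-ECHO, and only reaches ESTABLISHED once COOKIE-ACK arrives. The minimal Go model below uses hypothetical type and function names and omits the T1-cookie retransmission timer; it only shows that a lost COOKIE-ACK leaves the endpoint stuck in COOKIE-ECHOED, even while the layers underneath look healthy.

```go
package main

import "fmt"

// assocState is a subset of the SCTP association states from RFC 9260.
type assocState int

const (
	closed assocState = iota
	cookieWait
	cookieEchoed
	established
)

func (s assocState) String() string {
	return [...]string{"CLOSED", "COOKIE-WAIT", "COOKIE-ECHOED", "ESTABLISHED"}[s]
}

// initiator is a hypothetical model of the side that sends INIT and,
// later, COOKIE-ECHO. Timers and retransmissions are deliberately omitted.
type initiator struct {
	state assocState
	sent  []string // chunks sent so far, in order
}

func (a *initiator) start() {
	a.sent = append(a.sent, "INIT")
	a.state = cookieWait
}

// receive advances the handshake on the expected chunk and ignores anything else.
func (a *initiator) receive(chunk string) {
	switch {
	case a.state == cookieWait && chunk == "INIT-ACK":
		a.sent = append(a.sent, "COOKIE-ECHO")
		a.state = cookieEchoed
	case a.state == cookieEchoed && chunk == "COOKIE-ACK":
		a.state = established
	}
}

func main() {
	a := &initiator{}
	a.start()
	a.receive("INIT-ACK")
	// Without a COOKIE-ACK, the association never leaves COOKIE-ECHOED.
	fmt.Println(a.state, a.sent) // prints: COOKIE-ECHOED [INIT COOKIE-ECHO]
	a.receive("COOKIE-ACK")
	fmt.Println(a.state) // prints: ESTABLISHED
}
```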

Investigation

Using strace, I observed that:

In normal operation, Pion creates two distinct UDP sockets (separate file descriptors).

When using SetInterfaceFilter (excluding all interfaces), both srflx and host candidates share the same underlying socket (same FD).

I suspect this causes an internal conflict or race condition when routing incoming packets (especially during SCTP setup), particularly in answerer mode.
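One way to picture the suspected muxing problem: in WebRTC, STUN, DTLS, and SRTP all arrive on the same UDP socket and are told apart by the first byte of each datagram, following the ranges in RFC 7983. SCTP is not a separate class on the wire, because it is carried inside DTLS application_data records, so a DTLS record delivered to the wrong consumer would surface as a missing SCTP chunk such as COOKIE-ACK. The sketch below applies those ranges through a hypothetical `classify` helper; it is not Pion's mux code, and whether misrouting is what actually goes wrong here is unconfirmed.

```go
package main

import "fmt"

type packetClass string

const (
	classSTUN packetClass = "STUN"
	classZRTP packetClass = "ZRTP"
	classDTLS packetClass = "DTLS"
	classTURN packetClass = "TURN-channel"
	classRTP  packetClass = "RTP/RTCP"
	classDrop packetClass = "drop"
)

// classify maps a UDP payload to a protocol using the first-byte ranges of
// RFC 7983. It is a hypothetical helper, not Pion's mux code.
func classify(pkt []byte) packetClass {
	if len(pkt) == 0 {
		return classDrop
	}
	switch b := pkt[0]; {
	case b <= 3:
		return classSTUN
	case b >= 16 && b <= 19:
		return classZRTP
	case b >= 20 && b <= 63:
		return classDTLS // in WebRTC, SCTP travels inside these records
	case b >= 64 && b <= 79:
		return classTURN
	case b >= 128 && b <= 191:
		return classRTP
	default:
		return classDrop
	}
}

func main() {
	samples := [][]byte{
		{0x00, 0x01},     // STUN Binding Request (message type 0x0001)
		{23, 0xfe, 0xfd}, // DTLS 1.2 application_data record
		{0x80, 0x6f},     // RTP version 2 header
	}
	for _, pkt := range samples {
		fmt.Println(classify(pkt))
	}
}
```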

Environment

Pion version: v4.0.10

Remote peer: Google Chrome 138.0.7204.49

OS: Linux (but should be reproducible cross-platform)

Repro

Use SetInterfaceFilter to exclude all interfaces.

Set up a peer connection where Pion is an answerer.

Wait for Chrome to initiate.

Observe that ICE connects but SCTP often fails.

Expected behavior

Even with a single srflx candidate and one UDP socket, SCTP should succeed reliably when Pion is an answerer.

Additional notes

I suspect a socket muxing issue in ICE or DTLS layers when both host and srflx candidates share a socket (fd).

iamoskvin, Jul 11 '25 17:07

Could you suggest a quick-and-dirty fix or workaround while we don't have a proper final solution? Thanks.

iamoskvin, Jul 12 '25 15:07

@iamoskvin We need to investigate why it's happening first. If you can provide code that reproduces the issue, it will make this easier.

JoTurk, Jul 12 '25 15:07

@JoeTurki I created a repo for this issue (https://github.com/iamoskvin/pion_issue_3176/). I hope I managed to isolate the issue correctly. Please ask me if anything is unclear.

iamoskvin, Jul 12 '25 21:07

@JoeTurki Have you had time to look at this issue? Is it reproducible? Do you have any ideas for a workaround? Thanks.

iamoskvin, Jul 16 '25 14:07

@iamoskvin It's on my to-do list to try to fix it this weekend.

JoTurk, Jul 16 '25 14:07

@JoeTurki Any updates? Maybe some insights at least? Sorry for bothering you so often.

iamoskvin, Jul 21 '25 10:07

@iamoskvin I'm working on it, but I haven't tracked down the root cause yet. It also happens for me, though rarely (~5% of runs). It should be fixed soon.

JoTurk, Jul 21 '25 13:07

@JoeTurki were you able to determine the cause of the issue?

iamoskvin, Jul 25 '25 13:07

I think the problem is that we don't propagate the filter correctly, so we may end up with a demux race. SetInterfaceFilter wasn't intended for this use case. We need to either filter out host candidates internally or provide a simpler filter for this purpose.

For now, I suggest using SetNAT1To1IPs if you know the IPs, and we'll try to fix this issue soon.

JoTurk, Jul 27 '25 11:07

In reply to "For now, I suggest you use SetNAT1To1IPs if you know the IPs": Thanks, but it looks like this workaround does not solve the problem. I can remove the STUN server and replace it with this setting, and that works, but as soon as I filter out host candidates, I see the same issue as before.

iamoskvin, Jul 27 '25 12:07