SCTP fails when using SetInterfaceFilter and pion is an answerer
Summary
I want to disable all host candidates and use only server-reflexive (srflx) ones. To achieve that, I used SettingEngine.SetInterfaceFilter like this:
settingEngine.SetInterfaceFilter(func(name string) bool {
	return false
})
This works correctly when Pion is an offerer: the ICE connection establishes, and SCTP works as expected.
However, when Pion acts as an answerer, ICE succeeds but SCTP fails in ~80% of cases. The failure happens during SCTP handshake: the incoming COOKIE-ACK never arrives, even though ICE reports "connected".
In about 20% of runs the full connection, including SCTP, does succeed, so the failure is intermittent rather than deterministic.
Investigation
Using strace, I observed that:
In normal operation, Pion creates two distinct UDP sockets (separate file descriptors).
When using SetInterfaceFilter (excluding all interfaces), both srflx and host candidates share the same underlying socket (same FD).
I suspect this causes an internal conflict or race condition when routing incoming packets (especially during SCTP setup), particularly for answerer mode.
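For background on what "routing incoming packets" means on a shared socket: WebRTC runs STUN and DTLS over the same UDP socket, and SCTP travels inside DTLS records, so a missing COOKIE-ACK would be a DTLS packet that never reached the DTLS layer. Below is a minimal sketch of the first-byte demultiplexing ranges from RFC 7983. It is an illustration only, not Pion's actual implementation; `PacketKind` and `Classify` are hypothetical names.

```go
package main

// PacketKind names the protocol families that can share one UDP socket.
// Hypothetical type, used only for this illustration.
type PacketKind string

const (
	KindSTUN    PacketKind = "stun"
	KindDTLS    PacketKind = "dtls" // SCTP chunks travel inside DTLS records
	KindRTP     PacketKind = "rtp/rtcp"
	KindUnknown PacketKind = "unknown"
)

// Classify applies the first-byte ranges from RFC 7983 to one UDP payload.
// Ranges not relevant to this issue (ZRTP, TURN channel data) fall through
// to KindUnknown.
func Classify(pkt []byte) PacketKind {
	if len(pkt) == 0 {
		return KindUnknown
	}
	switch b := pkt[0]; {
	case b <= 3:
		return KindSTUN
	case b >= 20 && b <= 63:
		return KindDTLS
	case b >= 128 && b <= 191:
		return KindRTP
	default:
		return KindUnknown
	}
}
```

If two candidates read from the same file descriptor, each incoming datagram is delivered to only one reader. A DTLS record consumed by the wrong reader would be consistent with ICE reporting "connected" while the SCTP handshake stalls, though this is not confirmed as the cause.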
Environment
Pion version: v4.0.10
Remote peer: Google Chrome 138.0.7204.49
OS: Linux (but should be reproducible cross-platform)
Repro
Use SetInterfaceFilter to exclude all interfaces.
Set up a peer connection where Pion is an answerer.
Wait for Chrome to initiate.
Observe that ICE connects but SCTP often fails.
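When reproducing, it may help to confirm which candidate types actually appear in Pion's answer. The sketch below counts candidate types in an SDP string by reading the `typ` field of each `a=candidate:` line. It uses only the standard library; `CountCandidateTypes` is a hypothetical helper, and it assumes the SDP is read after ICE gathering has completed (with trickle ICE, candidates may be signaled separately and not appear in the SDP at all).

```go
package main

import "strings"

// CountCandidateTypes returns how many ICE candidates of each type
// (host, srflx, prflx, relay) appear in the given SDP text.
// Hypothetical helper for debugging, not part of any library.
func CountCandidateTypes(sdp string) map[string]int {
	counts := make(map[string]int)
	for _, line := range strings.Split(sdp, "\n") {
		line = strings.TrimSpace(line) // also drops a trailing "\r"
		if !strings.HasPrefix(line, "a=candidate:") {
			continue
		}
		// Candidate attribute layout:
		// foundation component transport priority address port typ <type> ...
		fields := strings.Fields(line)
		for i := 0; i+1 < len(fields); i++ {
			if fields[i] == "typ" {
				counts[fields[i+1]]++
				break
			}
		}
	}
	return counts
}
```

With the interface filter rejecting everything, the expectation from the summary is that the result contains `srflx` entries and no `host` entries.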
Expected behavior
Even with a single srflx candidate and one UDP socket, SCTP should succeed reliably when Pion is an answerer.
Additional notes
I suspect a socket muxing issue in ICE or DTLS layers when both host and srflx candidates share a socket (fd).
Could you suggest a quick-and-dirty fix or workaround until there is a proper solution? Thanks.
@iamoskvin we need to investigate why it's happening first. If you can provide code to reproduce the issue, that will make this easier.
I created a repo for this issue. I hope I was able to isolate it correctly. Please ask me if anything is unclear. https://github.com/iamoskvin/pion_issue_3176/ @JoeTurki
@JoeTurki did you have time to look at this issue? Is it reproducible? Do you have any ideas about a workaround? Thanks.
@iamoskvin it's on my to-do list. I'll try to fix it this weekend.
@JoeTurki any updates? Maybe some insights at least? Sorry for the frequent pings.
@iamoskvin I'm working on it. I haven't tracked down the root cause yet. It also happens for me, but rarely (~5% of runs). It should be fixed soon.
@JoeTurki were you able to determine the cause of the issue?
I think the problem is that we don't propagate the filter correctly, so we may end up with a demux race. The interface filter wasn't intended for this use. We need to either filter out host candidates internally or provide a simpler filter for this use case.
For now, I suggest you use SetNAT1To1IPs if you know the IPs. We'll try to fix this issue soon.
Thanks. It looks like this workaround does not help. I can remove the STUN server and use this setting instead, and that works, but as soon as I filter out host candidates I see the same issue as before.