webrtc icon indicating copy to clipboard operation
webrtc copied to clipboard

Assert that OnICEConnectionStateChange doesn't incorrectly report disconnected

Open gqgs opened this issue 5 years ago • 9 comments

Your environment.

  • Version: 20f2d1899ebbbdb3879444a03a7dc956a3b51126
  • Browser: Chrome 78.0.3904.97
  • Other Information - running Pion in a remote docker container.

What did you do?

Established a connection with a remote server.

What did you expect?

The peer to stay connected with the server until it decides to close the connection.

What happened?

On Chrome, the connection is being closed (going to a disconnected state) 30 seconds after being established.

After some debugging it seems this method is responsible for closing the connection: https://github.com/pion/ice/blob/37bfc6f4d997e5a6d9e836fd9607b9c9aefbc2b6/agent.go#L769

Some lines that appear to be relevant from the debug log:

ice INFO: 2020/02/04 13:25:33 Setting new connection state: Checking
ice DEBUG: 13:25:33.768620 agent.go:1143: adding a new peer-reflexive candidate: $remote:43116
ice TRACE: 13:25:33.816489 agent.go:639: Set selected candidate pair: prio 7998392936314175487 (local, prio 2130706431) host $local:39823 <-> prflx $remote:43116 related :0 (remote, prio 1862270975)
ice INFO: 2020/02/04 13:25:33 Setting new connection state: Connected
ice DEBUG: 13:25:34.015233 agent.go:1143: adding a new peer-reflexive candidate: $remote:1619 
ice DEBUG: 13:25:34.253331 agent.go:1143: adding a new peer-reflexive candidate: $remote:48095
ice DEBUG: 13:25:34.446345 agent.go:1143: adding a new peer-reflexive candidate: $remote:42324
....
ice INFO: 2020/02/04 13:26:05 Setting new connection state: Disconnected
ice INFO: 2020/02/04 13:26:05 Setting new connection state: Closed

In this example, after the call to adding a new peer-reflexive candidate: $remote:1619, I see no more traffic coming from $remote:43116 but I still see traffic coming from the other candidates. The selected candidate pair candidate pair doesn't change though which results in the connection timing out.

While the connection is established I can receive the audio/video tracks in the browser as expected.

For some reason Firefox doesn't seem to have the same problem. Albeit I'm seeing a single adding a new peer-reflexive candidate in that case :thinking:

Any suggestions to further debug this?

gqgs avatar Feb 04 '20 14:02 gqgs

Apparently our handling of the Disconnected state was incorrect. The connection can still be reestablished after moving to that state. There is still a disconnect the first time the client connects but I suppose that's expected for some reason?

I'm still wondering what would be the proper way of detecting a dead connection in this case since apparently we can't rely on the ICE connection state for that.

gqgs avatar Feb 04 '20 17:02 gqgs

We have just encountered the same problem and after A LOT of debugging, we found this case happens when multiple ICE candidate pairs are marked as valid. One of them goes to disconnected or failed, and because we hooked OnConnectionStateChange (webrtc.ICEConnectionStateDisconnected) to our cleanup function, the connection was torn down.

Is this a bug or just the weird/bad WebRTC API causing problems?

LanderN avatar Jun 11 '20 14:06 LanderN

Oof sorry about that @LanderN

Let me write a test that asserts that Disconnected only gets fired when ALL pairs are disconnected. When fully nominated/selected will also confirm that only that candidate pair effects the state. There is also the 'Fatal' state, when the ICE Agent is completely dead. We don't implement that yet https://github.com/pion/ice/issues/189

Disconnected could be temporary pion/webrtc will try to reconnect so maybe I wouldn't hang up right away!

Thanks for reporting the bug, and will get that fixed this weekend.

Sean-Der avatar Jun 13 '20 08:06 Sean-Der

Hey Sean, that sounds great!

Reading the linked issue and the discussion around it, would you suggest a manual call to peerConnection.Close() after the ConnectionState has gone to failed or would that be unnecessary in the proposed implementation? I do have some other state attached to each connection that I would like to clean up when possible, I assume this should be dealt with on a change to closed?

LanderN avatar Jun 13 '20 19:06 LanderN

Sorry I haven't dealt with this yet :( I just haven't had the bandwidth.

If people are interested in contributing a simple test would be great here! Using vnet establish two valid candidate pairs, but then cause one to go to disconnected.

You should assert that webrtc.ICEConnectionStateDisconnected is not fired. If that test fails we can go from there, should be easy enough to fix :)

Sean-Der avatar Oct 04 '20 04:10 Sean-Der

@rahulnakre @scorpionknifes if either of you are interested I think this would be a fun issue to look into! Even if this bug doesn't happen anymore this would be a great test to have :)

Sean-Der avatar Oct 07 '20 17:10 Sean-Der

hey, I'll take a look when I get the chance, thanks!

rahulnakre avatar Oct 08 '20 03:10 rahulnakre

Hey, I just discovered the project, is there a chance that I can try to pick this up?

wtsiamruk avatar Apr 29 '22 15:04 wtsiamruk

Go for it @wtsiamruk! Sean is super friendly and open to community participation! Given that no one else has picked this one up for over a year I think you're good to give it a go.

boushley avatar May 01 '22 03:05 boushley