ice icon indicating copy to clipboard operation
ice copied to clipboard

Panic when adding candidate

Open m1k1o opened this issue 2 years ago • 5 comments

This panic is fairly rare, but in certain (yet unknown) circumstances it is more prevalent.

panic: runtime error: invalid memory address or nil pointer dereference	
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xe1758f]	

goroutine 6592 [running]:	
github.com/pion/ice/v2.(*Agent).addCandidate.func1({0xc0011adf90?, 0xc0011adf48?}, 0xc000004300?)	
	/go/pkg/mod/github.com/pion/ice/[email protected]/agent.go:798 +0x26f	
github.com/pion/ice/v2.(*Agent).taskLoop(0xc000004300)	
	/go/pkg/mod/github.com/pion/ice/[email protected]/agent.go:240 +0xc2	
created by github.com/pion/ice/v2.NewAgent	
	/go/pkg/mod/github.com/pion/ice/[email protected]/agent.go:363 +0xd92

m1k1o avatar Jun 26 '23 19:06 m1k1o

@m1k1o It looks like candidateConn is nil? https://github.com/pion/ice/blob/master/agent.go#L817C1-L818C1

I think it would be worth adding a defensive if candidateConn != nil check

It would be nice to figure out why this is happening though! Would it be possible for you to deploy some more logging + a check around this?

  • What candidates had two duplicates? IP/Port/Type etc..

Sean-Der avatar Jun 27 '23 20:06 Sean-Der

From my findings, I see that right before panic, the code is frozen for a couple of seconds (up to 10). Probably because of some TCP connection blocking when write buffer is full. When it finally unblocks, it is flooded with all messages at once and most likely ocurrs some race condition that leads to panic.

Last message seems to be Ignore duplicate candidate: tcp4 host <ip>:0, where the IP is NAT1TO1 of the server. The candidate is announced as candidate:4225724578 1 tcp 1675624447 <ip> 0 typ host tcptype active. I saw recent changes in the code about Active ICE TCP Candidates, but this is v2.3.4 what should predate those changes. I might try to upgrade to latest version to see if it happens again.

m1k1o avatar Jun 27 '23 21:06 m1k1o

That is a strange one! I think we might be able to reproduce this though. Can you create a TCP client that doesn't disconnect properly. Maybe as simple as connecting from phone then turning on Airplane mode?

I am very confident that Active ICE TCP changes will not have fixed this. This is the passive code.

Sean-Der avatar Jun 28 '23 03:06 Sean-Der

Simply connecting using TCP on mobile and truning on Airplane Mode did not cut it:

6:09PM WRN write error: write tcp 172.17.0.3:52100->192.168.1.186:58735: use of closed network connection module=webrtc submodule=pion subsystem=ice-tcp
6:09PM INF error reading streaming packet: read tcp 172.17.0.3:52100->192.168.1.186:58735: use of closed network connection module=webrtc submodule=pion subsystem=ice-tcp
6:09PM WRN error closing connection: close tcp 172.17.0.3:52100->192.168.1.186:58735: use of closed network connection module=webrtc submodule=pion subsystem=ice-tcp
6:09PM WRN Failed to read from candidate tcp4 host 192.168.1.38:52100: io: read/write on closed pipe module=webrtc peer_id=1 session_id=user submodule=pion subsystem=ice
6:09PM WRN Failed to discover mDNS candidate 14e7f6f2-8a05-4212-8024-e43a6032d72a.local: mDNS: connection is closed module=webrtc peer_id=1 session_id=user submodule=pion subsystem=ice

image

I traced it back to this commit, that added connConfigs = append(connConfigs, connConfig{nil, 0, TCPTypeActive}), where nil was passed down to addCandidate.

image

It was since then reverted by and changed by you. Therefore I believe, upgrading to the latest should fix the issue.

Seems like it was not reproducible because under usual circumstances it would be first entry in candidates list, and therefore not marked as duplicate. Only if my client had 2 IP addresses, and the second one should be used, it panicked reproducibly.

m1k1o avatar Jul 12 '23 18:07 m1k1o

After upgrading pion/ice to v2.3.9 i can confirm, that it does not panic anymore.

But actually I am not able to connect using UDPMux or TCPMux (with NAT1TO1). Only using ephemeral UDP the connection is possible. It has something to to again with the fact, that my server has 2 interfaces (first for the Internet, other for Management) and I am connecting from Management. After disabling the other Interface (or removing NAT1TO1), it works like a charm.

This is issue on its own, so maybe it should be extracted and this one closed.

m1k1o avatar Jul 12 '23 19:07 m1k1o

Closing, no work to be done here!

Sean-Der avatar Aug 15 '24 19:08 Sean-Der