
Deadlock calling SCTPTransport Stop

Open tab1293 opened this issue 1 year ago • 2 comments

Your environment.

  • Version: v3.1.7

What did you do?

SCTPTransport's Stop method was called in ion-sfu relay peer's Close method.

What did you expect?

The underlying SCTPTransport to be stopped in a timely manner.

What happened?

The call to Stop seems to have caused a deadlock; the method never returns. Here is the relevant block trace from pprof:

108214 2 @ 0x7cb6f0 0x7cb6df 0x481102 0x7cb645 0x7cb5f5 0x7f7dc2 0x7f30c5 0x80b128 0x80aaf8 0x8ecc78 0x90cc88 0x18edbca 0x476a61
#	0x7cb6ef	sync.(*Mutex).Lock+0x6f						/usr/local/go/src/sync/mutex.go:90
#	0x7cb6de	github.com/pion/transport/connctx.(*connCtx).Close.func1+0x5e	/go/pkg/mod/github.com/pion/[email protected]/connctx/connctx.go:154
#	0x481101	sync.(*Once).doSlow+0xc1					/usr/local/go/src/sync/once.go:74
#	0x7cb644	sync.(*Once).Do+0x64						/usr/local/go/src/sync/once.go:65
#	0x7cb5f4	github.com/pion/transport/connctx.(*connCtx).Close+0x14		/go/pkg/mod/github.com/pion/[email protected]/connctx/connctx.go:152
#	0x7f7dc1	github.com/pion/dtls/v2.(*Conn).close+0x1a1			/go/pkg/mod/github.com/pion/dtls/[email protected]/conn.go:921
#	0x7f30c4	github.com/pion/dtls/v2.(*Conn).Close+0x24			/go/pkg/mod/github.com/pion/dtls/[email protected]/conn.go:341
#	0x80b127	github.com/pion/sctp.(*Association).close+0xc7			/go/pkg/mod/github.com/pion/[email protected]/association.go:463
#	0x80aaf7	github.com/pion/sctp.(*Association).Close+0xd7			/go/pkg/mod/github.com/pion/[email protected]/association.go:443
#	0x8ecc77	github.com/pion/webrtc/v3.(*SCTPTransport).Stop+0x77		/go/pkg/mod/github.com/pion/webrtc/[email protected]/sctptransport.go:130
#	0x90cc87	github.com/pion/ion-sfu/pkg/relay.(*Peer).Close+0x287		/go/pkg/mod/github.com/playback-sports/[email protected]/pkg/relay/relay.go:336
#	0x18edbc9	github.com/playback-sports/relayer/pkg.(*Relayer).Start+0x9e9	/build/pkg/relayer.go:256

This happens very rarely but figured I'd file a report anyway.
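
Until the root cause is found, one possible stopgap is to bound how long the caller waits for Stop. The sketch below is only an illustration: `stopWithTimeout` and `ErrStopTimeout` are hypothetical names, not part of pion/webrtc, and the `hang` function stands in for a Stop call that never returns. It does not fix the deadlock, and the goroutine running the stuck call stays blocked; it only keeps the calling code from hanging with it.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrStopTimeout reports that the wrapped call did not return in time.
// Hypothetical name, not part of any pion package.
var ErrStopTimeout = errors.New("stop did not return before the deadline")

// stopWithTimeout (hypothetical helper) runs stop in its own goroutine
// and waits at most d for it to finish.
func stopWithTimeout(stop func() error, d time.Duration) error {
	// Buffered so that a late-returning stop can still send and exit.
	done := make(chan error, 1)
	go func() { done <- stop() }()

	timer := time.NewTimer(d)
	defer timer.Stop()

	select {
	case err := <-done:
		return err
	case <-timer.C:
		// The goroutine above is still blocked; it is leaked on purpose.
		return ErrStopTimeout
	}
}

func main() {
	// Stand-in for a call such as sctpTransport.Stop that never returns.
	hang := func() error { select {} }
	fmt.Println(stopWithTimeout(hang, 100*time.Millisecond))
}
```

In a Close path like the one in the trace, the Stop method value could be passed as the `stop` argument, and a timeout error could be logged before continuing with the rest of the teardown.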

tab1293 • Dec 06 '22 23:12

I am experiencing a similar problem, which might be related to this. Although the issue existed before (very rarely), it has been happening more often after recent changes. I tracked it down to tcp-mux and this commit. There was also an earlier commit that attempted to mitigate a known problem with a blocking channel, though it did not fix the behavior.

m1k1o • Dec 07 '22 18:12

@tab1293 I was able to reproduce my issue in tcp-mux by turning on airplane mode: the connection is left open and the Write function blocks until it is available again. I was able to fix this by adding WriteBufferSize to `NewTCPMuxDefault`. Could you try that with your issue (if those steps apply, and if you use TCP mux in the first place) to see whether it's the same problem?
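
To illustrate why a write buffer can help here, below is a rough, stdlib-only sketch. It is not the pion/ice implementation, and `queuedWriter`, `newQueuedWriter`, `ErrBufferFull`, and `demo` are hypothetical names. The assumption is that the caller hands packets to a bounded queue and returns immediately, dropping packets when the queue is full, while a background goroutine does the write that may stall. A `net.Pipe` whose remote end never reads stands in for the stalled TCP connection.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
)

// ErrBufferFull is returned when the queue has no room for another packet.
var ErrBufferFull = errors.New("write buffer full, packet dropped")

// queuedWriter (hypothetical) decouples callers from a writer that may stall.
type queuedWriter struct {
	queue chan []byte
}

// newQueuedWriter starts a goroutine that drains up to `packets` queued
// packets into w. Only that goroutine can block on w.Write.
func newQueuedWriter(w io.Writer, packets int) *queuedWriter {
	q := &queuedWriter{queue: make(chan []byte, packets)}
	go func() {
		for p := range q.queue {
			if _, err := w.Write(p); err != nil {
				return // underlying writer failed or was closed
			}
		}
	}()
	return q
}

// Write never blocks: it enqueues a copy of p, or drops it if the queue is full.
func (q *queuedWriter) Write(p []byte) (int, error) {
	buf := append([]byte(nil), p...)
	select {
	case q.queue <- buf:
		return len(p), nil
	default:
		return 0, ErrBufferFull
	}
}

// demo writes to a pipe whose remote end never reads and returns how many
// packets were accepted before the queue filled up.
func demo(writes, capacity int) int {
	local, remote := net.Pipe()
	defer local.Close() // also unblocks the drain goroutine
	defer remote.Close()

	w := newQueuedWriter(local, capacity)
	accepted := 0
	for i := 0; i < writes; i++ {
		if _, err := w.Write([]byte("ping")); err == nil {
			accepted++
		}
	}
	return accepted
}

func main() {
	fmt.Println("accepted:", demo(5, 2))
}
```

A direct `local.Write` in `demo` would block forever, which is the behavior described above; with the queue, the loop always finishes, at the cost of dropped packets.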

m1k1o • Dec 13 '22 00:12

Nothing actionable on this one, unfortunately. Sorry this didn't get addressed when you filed it originally, @tab1293.

I assume that LiveKit picked this up in all their TCP Mux work and that it hasn't been an issue for a while. If I see any patterns/reports I will come back to this though!

Sean-Der • Apr 01 '24 02:04