ice
ice copied to clipboard
Close 'closed channel' cause panic
Your environment.
- Version: pion/ice/[email protected]
- Browser: Chrome newest version
- Other Information - using simulcast feature
What did you do?
I'm trying to re-handshake peerconnection between pion and chrome continuously
What happened?
I think the root cause is gatherCandidateDone channel was closed before but still to do it again.
Interesting. Race here? https://github.com/pion/ice/blob/master/gather.go#L71
Interesting. Race here? https://github.com/pion/ice/blob/master/gather.go#L71
I think race here.
My hot fix:

Bug happen when i try to peer with ~200 clients, i got this error 2-3 times
@jech May i do this, i'm testing allday and no panic happen.
I don't think that's correct. If gatherCandidates is being called twice, your code would only close the channel once, leading to a hung goroutine somewhere. We need to understand the cause, not try to paper over a symptom.
Could you please try this? (It's not wholly correct, but it should fix your immediate issue. We really should clean up the locking in gather.go, it's way too fine-grained, and I'm fairly sure it deadlocks at times.)
DO NOT COMMIT This is not correct.
diff --git a/agent.go b/agent.go
index 54728a2..73d9f10 100644
--- a/agent.go
+++ b/agent.go
@@ -113,6 +113,7 @@ type Agent struct {
taskLoopDone chan struct{}
err atomicError
+ muGatherCandidate sync.Mutex
gatherCandidateCancel func()
gatherCandidateDone chan struct{}
diff --git a/gather.go b/gather.go
index 15536ce..6119160 100644
--- a/gather.go
+++ b/gather.go
@@ -60,20 +60,25 @@ func (a *Agent) GatherCandidates() error {
var gatherErr error
if runErr := a.run(a.context(), func(ctx context.Context, agent *Agent) {
+ a.muGatherCandidate.Lock()
if a.gatheringState != GatheringStateNew {
gatherErr = ErrMultipleGatherAttempted
+ a.muGatherCandidate.Unlock()
return
} else if a.onCandidateHdlr.Load() == nil {
gatherErr = ErrNoOnCandidateHandler
+ a.muGatherCandidate.Unlock()
return
}
- a.gatherCandidateCancel() // Cancel previous gathering routine
- ctx, cancel := context.WithCancel(ctx)
- a.gatherCandidateCancel = cancel
- a.gatherCandidateDone = make(chan struct{})
-
- go a.gatherCandidates(ctx)
+ go func() {
+ defer a.muGatherCandidate.Unlock()
+ a.gatherCandidateCancel()
+ ctx, cancel := context.WithCancel(ctx)
+ a.gatherCandidateCancel = cancel
+ a.gatherCandidateDone = make(chan struct{})
+ a.gatherCandidates(ctx)
+ }()
}); runErr != nil {
return runErr
}
``
I am facing this very rare panic. Going to put the text here so that it's indexable for search:
panic: close of closed channel
goroutine 47164 [running]:
github.com/pion/ice/v2.(*Agent).gatherCandidates(0xc00153cf00, {0x14740a8?, 0xc002e8d180})
/go/pkg/mod/github.com/pion/ice/[email protected]/gather.go:112 +0x58e
created by github.com/pion/ice/v2.(*Agent).GatherCandidates.func1
/go/pkg/mod/github.com/pion/ice/[email protected]/gather.go:56 +0x1db
My assumption is, that canceling previous gathering does not happen synchronously. https://github.com/pion/ice/blob/be72bc709bb290aba9bdc842988d2f893300d967/gather.go#L51-L54
If it takes a little bit longer, it can close newly created gatherCandidateDone channel.
https://github.com/pion/ice/blob/be72bc709bb290aba9bdc842988d2f893300d967/gather.go#L64
We need to make sure previous gathering completed before starting new.