Allow cancelling TURN Allocate phase, which can block for ~7.8s if TURN server is not responding.
Description
Although Agent.GatherCandidates() have a cancellable context, the turn.Client.Allocate() processing does not use it and can hang for about 7.8 seconds if the TURN server is unreachable. This happens because Allocate() completes only after the ALLOCATE request retry limit is reached. Ultimately, PeerConnection.Close() will block while waiting for unnecessary allocation to complete.
PR fixes this behavior by cancelling ALLOCATE phase if context was cancelled.
Caveat: If cancelled, Client.Allocate will nevertheless return error ("transaction closed") which is logged as warning here.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 78.61%. Comparing base (
753c2a0) to head (d9f9d1d).
Additional details and impacted files
@@ Coverage Diff @@
## master #781 +/- ##
==========================================
+ Coverage 78.38% 78.61% +0.22%
==========================================
Files 41 41
Lines 5441 5448 +7
==========================================
+ Hits 4265 4283 +18
+ Misses 945 934 -11
Partials 231 231
| Flag | Coverage Δ | |
|---|---|---|
| go | 78.61% <100.00%> (+0.22%) |
:arrow_up: |
| wasm | 27.05% <0.00%> (-0.04%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
@JoeTurki Hi! Would you mind having a look at this PR?
@smb-z Hello, Thank you so much for making this change, it makes sense. I only have a concern about a potential race condition. I'm not sure, I'll test it in a day, If it doesn't have issues, We'll merge it right away, Otherwise, I'll update your PR.
Thank you so much.
Thanks, @JoeTurki! It seems you're right, and cancelling client.Allocate() is more tricky than I first thought. Here are some race issues I've found:
- If the context is cancelled just after client.Allocate() successfully completes but before it returns, we'll end up in a state when err here is nil, but the client is closed by the goroutine. I wasn't able to find any significant problems with this, but it looks unreliable, as the code following the error check is expecting Allocate to complete successfully. Thanks, @aalekseevx for pointing this out!
- If the context is cancelled before client.Allocate(), it wouldn't have any effect, and we'll still have client.Allocate() blocked for 7.8s.
It looks like a better option is to pass the context to Allocate() and process context cancelling inside the TURN client itself.