ice icon indicating copy to clipboard operation
ice copied to clipboard

Allow cancelling TURN Allocate phase, which can block for ~7.8s if TURN server is not responding.

Open smb-z opened this issue 7 months ago • 4 comments

Description

Although Agent.GatherCandidates() have a cancellable context, the turn.Client.Allocate() processing does not use it and can hang for about 7.8 seconds if the TURN server is unreachable. This happens because Allocate() completes only after the ALLOCATE request retry limit is reached. Ultimately, PeerConnection.Close() will block while waiting for unnecessary allocation to complete.

PR fixes this behavior by cancelling ALLOCATE phase if context was cancelled.

Caveat: If cancelled, Client.Allocate will nevertheless return error ("transaction closed") which is logged as warning here.

smb-z avatar Jun 02 '25 13:06 smb-z

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 78.61%. Comparing base (753c2a0) to head (d9f9d1d).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #781      +/-   ##
==========================================
+ Coverage   78.38%   78.61%   +0.22%     
==========================================
  Files          41       41              
  Lines        5441     5448       +7     
==========================================
+ Hits         4265     4283      +18     
+ Misses        945      934      -11     
  Partials      231      231              
Flag Coverage Δ
go 78.61% <100.00%> (+0.22%) :arrow_up:
wasm 27.05% <0.00%> (-0.04%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Jun 02 '25 13:06 codecov[bot]

@JoeTurki Hi! Would you mind having a look at this PR?

smb-z avatar Jun 02 '25 14:06 smb-z

@smb-z Hello, Thank you so much for making this change, it makes sense. I only have a concern about a potential race condition. I'm not sure, I'll test it in a day, If it doesn't have issues, We'll merge it right away, Otherwise, I'll update your PR.

Thank you so much.

JoTurk avatar Jun 03 '25 00:06 JoTurk

Thanks, @JoeTurki! It seems you're right, and cancelling client.Allocate() is more tricky than I first thought. Here are some race issues I've found:

  1. If the context is cancelled just after client.Allocate() successfully completes but before it returns, we'll end up in a state when err here is nil, but the client is closed by the goroutine. I wasn't able to find any significant problems with this, but it looks unreliable, as the code following the error check is expecting Allocate to complete successfully. Thanks, @aalekseevx for pointing this out!
  2. If the context is cancelled before client.Allocate(), it wouldn't have any effect, and we'll still have client.Allocate() blocked for 7.8s.

It looks like a better option is to pass the context to Allocate() and process context cancelling inside the TURN client itself.

smb-z avatar Jun 03 '25 14:06 smb-z