turn icon indicating copy to clipboard operation
turn copied to clipboard

Add callback API to track allocation lifeycle

Open rg0now opened this issue 1 year ago • 8 comments

Fixes https://github.com/pion/turn/issues/324, https://github.com/pion/turn/pull/325, and https://github.com/pion/turn/pull/402

This PR adds a set of callbacks that can be used to track the lifecycle of TURN allocations in the server. This allows user code to be called whenever an authentication request has been processed (this can be used to mitigate DoS attacks against a server, see https://github.com/pion/turn/pull/402), when an allocation has been created/removed (e.g., to implement user tracking for adding quota support, see https://github.com/pion/turn/issues/324), when a permission has been created/removed (e.g., to log the peers a user tries to reach via the server), a channel has been created/removed (this will simplify integrating external TURN accelerators, see https://github.com/pion/turn/pull/360), or when an allocation ends with an error.

The implementation adds a new EventHandlers struct to the ServerConfig, defined as follows:

type EventHandlers struct {
  OnAuth func(srcAddr, dstAddr net.Addr, protocol, username, realm string, method string, verdict bool)
  OnAllocationCreated func(srcAddr, dstAddr net.Addr, protocol, username, realm string, requestedPort int)
  OnAllocationDeleted func(srcAddr, dstAddr net.Addr, protocol, username, realm string)
  OnAllocationError func(srcAddr, dstAddr net.Addr, protocol, message string)
  OnPermissionCreated func(srcAddr, dstAddr net.Addr, protocol, username, realm string, peer net.IP)
  OnPermissionDeleted func(srcAddr, dstAddr net.Addr, protocol, username, realm string, peer net.IP)
  OnChannelCreated func(srcAddr, dstAddr net.Addr, protocol, username, realm string, peer net.Addr, channelNumber uint16)
  OnChannelDeleted func(srcAddr, dstAddr net.Addr, protocol, username, realm string, peer net.Addr, channelNumber uint16)
}

It is OK to handle only a subset of the events.

Caveats:

  • Event handlers are called synchronously, so long-running callbacks will block the server.
  • Theoretically, allocations are allowed to change credentials during their lifetime, in that any Refresh, CreatePermission or ChannelBind message may use a different credential. Since this does not seem to be easy to implement with the PeerConnection API, credential swaps are currently no tracked.

rg0now avatar Nov 07 '24 16:11 rg0now

CC @renandincer

rg0now avatar Nov 07 '24 16:11 rg0now

Codecov Report

:x: Patch coverage is 90.65421% with 10 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 68.64%. Comparing base (a4a8111) to head (af7dfe3). :warning: Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
internal/server/util.go 73.91% 4 Missing and 2 partials :warning:
internal/allocation/five_tuple.go 50.00% 4 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #419      +/-   ##
==========================================
+ Coverage   67.43%   68.64%   +1.21%     
==========================================
  Files          43       43              
  Lines        3083     3183     +100     
==========================================
+ Hits         2079     2185     +106     
+ Misses        843      837       -6     
  Partials      161      161              
Flag Coverage Δ
go 68.64% <90.65%> (+1.21%) :arrow_up:
wasm 25.60% <7.47%> (-0.80%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Nov 07 '24 16:11 codecov[bot]

I just looked at it briefly @rg0now! Thank you for PRing this in.

I think it might be useful to get a handle to the PacketConn from OnAllocatedCreated as well. This might be useful for rate limiting. What do you think?

renandincer avatar Nov 12 '24 04:11 renandincer

I think a better way to handle rate limiting would be to provide a RelayAddressGenerator.AllocatePacketConn to the server that will emit rate-limited packetconns. This works even today. The only issue with that approach is if we need user context for applying rate limiting (i.e., when different users may get different rate limits), since currently we miss the user context in the RelayAddressGenerator interface handlers. But then again, it would somehow be more adequate to add the user context there. Wdyt?

rg0now avatar Nov 12 '24 11:11 rg0now

Agreed. Only issue is that would be a breaking API change.

renandincer avatar Nov 12 '24 13:11 renandincer

While working with the lifecycle event handler API it came to our attention that we never report the allocation's relay address in the callbacks. This makes it difficult to track server port allocation status, open firewalls, etc.

The last patch updates the relevant callbacks to also pass the relay address to the caller. The changed event handler signatures are as follows:

type EventHandlers struct {
        ...
	OnAllocationCreated func(srcAddr, dstAddr net.Addr, protocol, username, realm string, relayAddr net.Addr, requestedPort int)
        ...
	OnChannelCreated func(srcAddr, dstAddr net.Addr, protocol, username, realm string, relayAddr, peer net.Addr, channelNumber uint16)
	OnChannelDeleted func(srcAddr, dstAddr net.Addr, protocol, username, realm string, relayAddr, peer net.Addr, channelNumber uint16)
}

rg0now avatar Nov 27 '24 17:11 rg0now

Further evolving the PR based on operational experience: It seems the PermissionCreated/PermissionDeleted callbacks could also use the relay-address so we added these to the API. The changes:

type EventHandlers struct {
        ...
	OnPermissionCreated func(srcAddr, dstAddr net.Addr, protocol, username, realm string, relayAddr net.Addr, peer net.IP)
	OnPermissionDeleted func(srcAddr, dstAddr net.Addr, protocol, username, realm string, relayAddr net.Addr, peer net.IP)
        ...
}

rg0now avatar Dec 05 '24 14:12 rg0now

We have been deploying this PR into production for a while now, so far all goes as expected. If there are no objections, I'd like to merge this ASAP. Please speak up now if you have concerns.

rg0now avatar Dec 16 '24 13:12 rg0now

@JoeTurki Any chance for this to get merged? The code has been merged into STUNner for quite some time now and it has been tested by a sizeable user base, so far without any issues reported. Also there was not much opposition here, except the event dispatcher design and some minor stylistic issues (see comments in #423) that I'm happy to fix.

rg0now avatar Jul 28 '25 13:07 rg0now

@JoeTurki I've rebased the commit, removed the dispatch logic (it was a stupid idea to begin with), and fixed all failing tests. pls merge if you think this is OK.

rg0now avatar Aug 05 '25 07:08 rg0now