Add callback API to track allocation lifeycle
Fixes https://github.com/pion/turn/issues/324, https://github.com/pion/turn/pull/325, and https://github.com/pion/turn/pull/402
This PR adds a set of callbacks that can be used to track the lifecycle of TURN allocations in the server. This allows user code to be called whenever an authentication request has been processed (this can be used to mitigate DoS attacks against a server, see https://github.com/pion/turn/pull/402), when an allocation has been created/removed (e.g., to implement user tracking for adding quota support, see https://github.com/pion/turn/issues/324), when a permission has been created/removed (e.g., to log the peers a user tries to reach via the server), a channel has been created/removed (this will simplify integrating external TURN accelerators, see https://github.com/pion/turn/pull/360), or when an allocation ends with an error.
The implementation adds a new EventHandlers struct to the ServerConfig, defined as follows:
type EventHandlers struct {
OnAuth func(srcAddr, dstAddr net.Addr, protocol, username, realm string, method string, verdict bool)
OnAllocationCreated func(srcAddr, dstAddr net.Addr, protocol, username, realm string, requestedPort int)
OnAllocationDeleted func(srcAddr, dstAddr net.Addr, protocol, username, realm string)
OnAllocationError func(srcAddr, dstAddr net.Addr, protocol, message string)
OnPermissionCreated func(srcAddr, dstAddr net.Addr, protocol, username, realm string, peer net.IP)
OnPermissionDeleted func(srcAddr, dstAddr net.Addr, protocol, username, realm string, peer net.IP)
OnChannelCreated func(srcAddr, dstAddr net.Addr, protocol, username, realm string, peer net.Addr, channelNumber uint16)
OnChannelDeleted func(srcAddr, dstAddr net.Addr, protocol, username, realm string, peer net.Addr, channelNumber uint16)
}
It is OK to handle only a subset of the events.
Caveats:
- Event handlers are called synchronously, so long-running callbacks will block the server.
- Theoretically, allocations are allowed to change credentials during their lifetime, in that any
Refresh,CreatePermissionorChannelBindmessage may use a different credential. Since this does not seem to be easy to implement with thePeerConnectionAPI, credential swaps are currently no tracked.
CC @renandincer
Codecov Report
:x: Patch coverage is 90.65421% with 10 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 68.64%. Comparing base (a4a8111) to head (af7dfe3).
:warning: Report is 1 commits behind head on master.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/server/util.go | 73.91% | 4 Missing and 2 partials :warning: |
| internal/allocation/five_tuple.go | 50.00% | 4 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #419 +/- ##
==========================================
+ Coverage 67.43% 68.64% +1.21%
==========================================
Files 43 43
Lines 3083 3183 +100
==========================================
+ Hits 2079 2185 +106
+ Misses 843 837 -6
Partials 161 161
| Flag | Coverage Δ | |
|---|---|---|
| go | 68.64% <90.65%> (+1.21%) |
:arrow_up: |
| wasm | 25.60% <7.47%> (-0.80%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
I just looked at it briefly @rg0now! Thank you for PRing this in.
I think it might be useful to get a handle to the PacketConn from OnAllocatedCreated as well. This might be useful for rate limiting. What do you think?
I think a better way to handle rate limiting would be to provide a RelayAddressGenerator.AllocatePacketConn to the server that will emit rate-limited packetconns. This works even today. The only issue with that approach is if we need user context for applying rate limiting (i.e., when different users may get different rate limits), since currently we miss the user context in the RelayAddressGenerator interface handlers. But then again, it would somehow be more adequate to add the user context there. Wdyt?
Agreed. Only issue is that would be a breaking API change.
While working with the lifecycle event handler API it came to our attention that we never report the allocation's relay address in the callbacks. This makes it difficult to track server port allocation status, open firewalls, etc.
The last patch updates the relevant callbacks to also pass the relay address to the caller. The changed event handler signatures are as follows:
type EventHandlers struct {
...
OnAllocationCreated func(srcAddr, dstAddr net.Addr, protocol, username, realm string, relayAddr net.Addr, requestedPort int)
...
OnChannelCreated func(srcAddr, dstAddr net.Addr, protocol, username, realm string, relayAddr, peer net.Addr, channelNumber uint16)
OnChannelDeleted func(srcAddr, dstAddr net.Addr, protocol, username, realm string, relayAddr, peer net.Addr, channelNumber uint16)
}
Further evolving the PR based on operational experience: It seems the PermissionCreated/PermissionDeleted callbacks could also use the relay-address so we added these to the API. The changes:
type EventHandlers struct {
...
OnPermissionCreated func(srcAddr, dstAddr net.Addr, protocol, username, realm string, relayAddr net.Addr, peer net.IP)
OnPermissionDeleted func(srcAddr, dstAddr net.Addr, protocol, username, realm string, relayAddr net.Addr, peer net.IP)
...
}
We have been deploying this PR into production for a while now, so far all goes as expected. If there are no objections, I'd like to merge this ASAP. Please speak up now if you have concerns.
@JoeTurki Any chance for this to get merged? The code has been merged into STUNner for quite some time now and it has been tested by a sizeable user base, so far without any issues reported. Also there was not much opposition here, except the event dispatcher design and some minor stylistic issues (see comments in #423) that I'm happy to fix.
@JoeTurki I've rebased the commit, removed the dispatch logic (it was a stupid idea to begin with), and fixed all failing tests. pls merge if you think this is OK.