Rate-limit 401 Unauthorized responses to prevent abuse/reflection-based attacks

Open e-lisa opened this issue 1 year ago • 23 comments

Problem: Attackers are using Coturn's 401 Unauthorized responses with spoofed UDP packets to create an amplification/reflection attack of roughly 2.4:1. A 62-byte request is met with Coturn’s 401 Unauthorized response, which is 150 bytes, a factor of ~2.42.
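To make the arithmetic behind that factor explicit, here is a small sketch. The two packet sizes are the ones quoted above; the function names are only illustrative and do not come from coturn.

```python
REQUEST_BYTES = 62    # size of the spoofed request, as quoted above
RESPONSE_BYTES = 150  # size of the 401 Unauthorized response, as quoted above


def amplification_factor(request_bytes: int, response_bytes: int) -> float:
    """Bytes reflected at the victim for every byte the attacker sends."""
    return response_bytes / request_bytes


def reflected_rate(attacker_rate: float, factor: float) -> float:
    """Traffic rate arriving at the victim for a given attacker send rate."""
    return attacker_rate * factor


factor = amplification_factor(REQUEST_BYTES, RESPONSE_BYTES)
print(f"amplification factor: {factor:.2f}")
```

In other words, every 1 Mbit/s of spoofed requests turns into roughly 2.4 Mbit/s at the victim, and the attacker's own address never appears in the traffic.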

These attacks hurt the performance of Coturn servers as well as their reputation.

Tickets reporting bulk 401 responses in their logs:

  • https://github.com/coturn/coturn/issues/1470
  • https://github.com/coturn/coturn/issues/603
  • https://github.com/coturn/coturn/issues/626#issuecomment-1227819805
  • https://github.com/coturn/coturn/issues/737 # Not 401 in this case, but would be if authentication was required
  • https://github.com/coturn/coturn/issues/728 # Related to spoofing/traffic generation

Reports of potential 401 response based reflection attacks in the wild:

  • https://help.nextcloud.com/t/prevent-reflection-abuse-against-coturn/148699
  • https://groups.google.com/g/kurento/c/II0O0g8VplE?pli=1

Related issue:

  • https://github.com/coturn/coturn/issues/871

Steps to reproduce:

  1. For ease of testing (with non-spoofed packets), you can use bin/turnutils_uclient to generate 401 errors against a Coturn server running with the following configuration:

verbose
listening-port=3478
realm=example.com
lt-cred-mech
user=testuser:testpassword
fingerprint

  2. Capture the requests that trigger 401 Unauthorized responses in a .pcap file with tcpdump, Wireshark, or the packet capture tool of your choice.
  3. Edit the source address of the client request to the address you wish to spoof.
  4. Replay the traffic with UDPReplay or the tool of your choice.
  5. Watch the traffic as the Coturn server replies to the spoofed address.

Solution:

Added rate-limiting to 401 Unauthorized responses to prevent abuse of the server for use in DDoS attacks via traffic reflection and amplification. This should be the default behavior of Coturn to prevent abuse. An option has been added to disable this feature for debugging.

This patch works by counting the number of requests that result in a 401 Unauthorized response and limiting them by IP address if they occurred within a specified window of time.

~~ur_addr_map* functions were extended and wrapped to enable ioa_addr objects with and without port numbers. In our use-case we need to be able use the *no_port variants when working with ioa_addr types.~~

Incoming port numbers are ignored by setting the port to 0 before storing a copy of the ioa_addr object.

Added new command-line options:

--ratelimit-401-requests - Sets the number of requests that result in a 401 response allowed per rate-limit window. If set to 0, disables 401 rate limiting.
--ratelimit-401-window - Sets the size in seconds of the rate-limit window.
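For readers who want the idea without reading the C patch, the behavior described above can be sketched roughly as follows. This is a minimal illustration, not the patch itself: it assumes a simple fixed window per source IP, and the class and method names are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Window:
    start: float  # time at which the current window opened
    count: int    # 401-producing requests seen in this window


class Unauthorized401Limiter:
    """Hypothetical fixed-window limiter keyed by source IP only."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._windows: dict[str, Window] = {}

    def allow_401(self, source_ip: str, source_port: int, now: float) -> bool:
        """Return True if a 401 response may still be sent to this source."""
        if self.max_requests == 0:
            return True  # a limit of 0 disables rate limiting
        # The port is deliberately ignored: only the IP address is the key.
        window = self._windows.get(source_ip)
        if window is None or now - window.start >= self.window_seconds:
            self._windows[source_ip] = Window(start=now, count=1)
            return True
        window.count += 1
        return window.count <= self.max_requests


limiter = Unauthorized401Limiter(max_requests=100, window_seconds=60)
# 101 requests from one spoofed IP, each with a different source port
answers = [limiter.allow_401("198.51.100.7", 40000 + i, now=0.0) for i in range(101)]
print(answers.count(True), answers.count(False))
```

Because the key is the address alone, changing the source port on every packet does not give the sender a fresh budget.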

e-lisa avatar Oct 22 '24 16:10 e-lisa

If I understood the patch correctly, I doubt that it is able to prevent the mentioned DDoS - who says, that a "smart" attacker will always use the same port? Furthermore if there is a Distributed attack, IMHO it is even counterproductive. Also a whole organization/company may sit behind a single firewall, so defaults are probably not reasonable. Last but not least option names are far too long, should be shortened and the request number limit (0 | > 0) should be used to decide, whether to apply a limit or not (the intended option --no-ratelimit-401 is redundant) - not, i.e. req-limit=0 sounds reasonable to me.

Let's remember that in this situation the diagram of the attack would look as follows:

Attacker's Spoofed UDP Packet -> Coturn -> Victim

This patch is meant to mitigate coturn's ability to be used to launch DDoS reflection/amplification attacks (not protect the server from a DDoS). By doing this it allows those running coturn to be good netizens by preventing their servers from being used in a malicious way.

Let me try to clarify things a little bit and elaborate on what we're trying to do here:

If I understood the patch correctly, I doubt that it is able to prevent the mentioned DDoS

In my opinion it will because:

a) The UDP packets are spoofed on behalf of the victim, so the 401 response is reflected at the victim.
b) By rate limiting the number of packets (i.e. 401 responses in this case) we are mitigating this type of abuse by preventing the attacker from continuing to reflect unlimited traffic at their targets.

who says, that a "smart" attacker will always use the same port?

This patch does not use port numbers, but rather the IP address of the source of the UDP packet. Also note the *_no_port versions of the map functions added to facilitate these comparisons. Additionally, source ports for UDP are randomized, so using them for this would be pointless.

Furthermore if there is a Distributed attack, IMHO it is even counterproductive

I assume you have made the D bold here to suggest that the source addresses of the UDP packets would be distributed, however in this case the UDP packet source is spoofed. As far as the coturn server sees, there is only one source address (even if the attack is distributed from multiple sources).

This attack would be distributed by using multiple coturn servers to attack a single target (not multiple attackers attacking the coturn service, but rather using it to reflect the 401 responses at a victim).

Please also consider that we are not rate limiting all traffic, only 401 Unauthorized responses. By doing this we should mitigate any unintentional abuse (for example, spoofing a target to get them banned/rate limited).

Also a whole organization/company may sit behind a single firewall, so defaults are probably not reasonable

Understood. What do you think better defaults would be? Remember this code only runs when a 401 response is sent. Do we believe there would be more than 100 unauthorized responses in a 60-second period? That would be one 401 Unauthorized response every 0.6 seconds, sustained for a full 60 seconds.
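To put numbers on those defaults, here is a quick back-of-the-envelope sketch. The 100 and 60 are the figures from this comment and the 150 bytes is the response size from the issue description; the helper names are made up, and the bandwidth figure assumes suppressed responses are simply not sent.

```python
def seconds_between_responses(limit: int, window_seconds: float) -> float:
    """Average spacing of 401 responses needed to exhaust the limit."""
    return window_seconds / limit


def max_reflected_bytes_per_second(limit: int, window_seconds: float,
                                   response_bytes: int) -> float:
    """Upper bound on 401 traffic one server sends to one source address."""
    return limit * response_bytes / window_seconds


print(seconds_between_responses(100, 60))            # 0.6
print(max_reflected_bytes_per_second(100, 60, 150))  # 250.0
```

So with these values a single server would reflect at most about 250 bytes per second of 401 responses at any one victim address, however fast the spoofed requests arrive.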

It should also be noted that we're currently working on an allowlist that would exempt IP addresses from this rate limit. However in my opinion this should be broken up into a second patch (which I will submit when completed).

Last but not least option names are far too long, should be shortened and the request number limit (0 | > 0) should be used to decide, whether to apply a limit or not (the intended option --no-ratelimit-401 is redundant) - not, i.e. req-limit=0 sounds reasonable to me.

This makes sense to me. I see no problem making these changes! As far as shortened, what do you think of: --401-window and --401-req-limit?

Finally for now I think, using tools like fail2ban might be the better option to apply rate limits if one is convinced, that this changes anything or helps somehow.

A few things here:

a) fail2ban is not really designed to mitigate (D)DoS style attacks, as it periodically reads the logs.
b) In an attack situation where the logs are filling up (as detailed in the tickets linked above), fail2ban will cause more stress to the system as it processes tens of thousands of additional log lines.
c) Using fail2ban would ban ALL traffic from a source IP address. This could be used to DoS attack a victim by banning them from the coturn server. This patch only rate limits 401 Unauthorized responses, not all traffic.
d) It would be best practice for coturn to deny this tool to DDoS-for-hire groups/tools/etc. As we've seen with NTPd, these types of abuses are only taken away from malicious actors (i.e. DDoS for hire) if the software shuts these abused features down by default.

In closing I will try to address your code review as soon as possible. Thank you for taking the time to properly review the code!

e-lisa avatar Oct 31 '24 22:10 e-lisa

For an immediate small fix I suggest using --no-software-attribute when running coturn which will reduce the size of the packets (non-data). It will not solve the problem but will make the exploit less useful for the attacker (reduce the amplification factor).

For some reason this config is an opt-in and not enabled by default....

eakraly avatar Nov 13 '24 22:11 eakraly

Any news on this?

Today several hundred coturn servers were used for an amplification attack on hosts of a French hosting company. One of our correctly configured servers was used for this as well. Our German hosting provider acknowledged that hundreds of servers in their IP space were used for this attack and that they had to enable a manual mitigation.

CoTURN is behaving correctly in rejecting unauthorized requests. But this enables amplification attacks if spoofed UDP packets are used...

lordwebbie avatar May 28 '25 12:05 lordwebbie

Hello everyone.... My coturn server was also part of this attack mentioned by @lordwebbie . Please prioritize this PR as the whole world would benefit from this. Thank you.

alnagar avatar May 28 '25 14:05 alnagar

Same here on hetzner, after years of running quietly.

Right now mitigating by blocking various malicious netblocks and with

no-rfc5780
no-stun-backward-compatibility
response-origin-only-with-rfc5780
no-software-attribute

parameters set. But yeah, this PR would definitely help.

xadhoom avatar May 28 '25 14:05 xadhoom

Same here. Thanks for this. 🫶 Subscribed to see progress on this.

ronilaukkarinen avatar May 28 '25 15:05 ronilaukkarinen

@e-lisa Thank you very much for working on this. A couple of questions/clarifications before we try to make more changes:

Could this attack also occur with STUN messages (Binding Requests) that don’t require authentication and therefore don’t trigger 401 responses?

It seems the attack might work without needing to spoof IP addresses—simply by exploiting STUN backward compatibility. Specifically, there's an attribute in the messages that allows the response to be sent to an arbitrary address: https://datatracker.ietf.org/doc/html/rfc3489#section-11.2.2

The recommendations to disable certain options (which, as far as I know, are disabled by default in the config file) seem like an effective countermeasure 👏

ggarber avatar May 28 '25 17:05 ggarber

This PR came from Wire's attempt to mitigate abuses last year (we have also been seeing them for quite a while). We are running a fork of coturn with this PR included in production on Hetzner, and it successfully mitigated this specific attack on the 27th. But we are not 100% satisfied with the approach either. On top of this PR, an allow-list was added after seeing the impact the rate-limiting had in production. But the allow-list is also not good enough for our use case, at least while it is static. So we see this approach more as a band-aid fix that also comes with some downsides.

On your questions @ggarber - since we authenticate the binding requests, and these were specifically targeted by the attacks, we worked on those. But probably unauthenticated ones are also affected.

We have set the following options on our coturns, but the amplification factor of roughly 2.5x is still worth it for the attackers.

no-stun-backward-compatibility
secure-stun
no-rfc5780

Also, there is another kind of attack that is not caught by the rate limiting. Attackers are also targeting whole ranges of IP addresses (and not all ports on a given IP address, like in the latest Hetzner incident).

mastaab avatar May 29 '25 08:05 mastaab

As the developer of this patch, I am still looking at this and am open to suggestions. I will try to merge this branch with the latest code next week so this fix does not bitrot too much, as I do think many people can still benefit from running it.

At the end of the day people are abusing coturn, and anything we can do to stop it is a win for all of us.

@mastaab - I think rate limiting entire netblocks would be a separate feature. I don't think it is an unreasonable idea, and it would probably build on this work (but not block it). I also think this feature would need its own settings, as you would not want to apply the same rules for a single IP to an entire netblock (or vice versa).
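To illustrate what "its own settings" could mean, here is a rough sketch of counting per address and per netblock with separate limits. Nothing here exists in the patch: the /24 and /64 prefix lengths, the class, and the parameter names are all assumptions picked for the example.

```python
import ipaddress


def netblock_key(source_ip: str, v4_prefix: int = 24, v6_prefix: int = 64) -> str:
    """Map an address to the netblock it belongs to (prefix lengths are assumptions)."""
    addr = ipaddress.ip_address(source_ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    return str(ipaddress.ip_network(f"{source_ip}/{prefix}", strict=False))


class NetblockLimiter:
    """Hypothetical limiter with independent per-IP and per-netblock limits."""

    def __init__(self, per_ip_limit: int, per_block_limit: int, window_seconds: float):
        self.per_ip_limit = per_ip_limit
        self.per_block_limit = per_block_limit
        self.window_seconds = window_seconds
        self._counts: dict[str, tuple[float, int]] = {}  # key -> (window start, count)

    def _hit(self, key: str, limit: int, now: float) -> bool:
        start, count = self._counts.get(key, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # window expired, start a new one
        count += 1
        self._counts[key] = (start, count)
        return count <= limit

    def allow_401(self, source_ip: str, now: float) -> bool:
        # Both counters are always updated so neither limit can be bypassed.
        ip_ok = self._hit("ip:" + source_ip, self.per_ip_limit, now)
        block_ok = self._hit("net:" + netblock_key(source_ip), self.per_block_limit, now)
        return ip_ok and block_ok


print(netblock_key("203.0.113.57"))
```

With a per-block limit lower than the per-IP limit multiplied by the block size, a sender rotating through spoofed addresses in one range runs out of budget long before it has used every address.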

e-lisa avatar May 29 '25 18:05 e-lisa

Sorry if I am being absolutely naive: would it help using a non standard port and in an obscure subdomain?

stun.mydomain.com:3478 vs notudp.mydomain.com:4546

n3storm avatar May 31 '25 12:05 n3storm

What about Crowdsec to prevent abuse?

What are the patterns that should be looked for at syslog?

n3storm avatar May 31 '25 12:05 n3storm

Sorry if I am being absolutely naive: would it help using a non standard port and in an obscure subdomain?

stun.mydomain.com:3478 vs notudp.mydomain.com:4546

Using a non-standard port will not change the fact that your server can still be used as part of a larger reflection-based attack. Many people run port scans in the year 2025, so this type of "security through obscurity" is more likely to delay abuse than to stop it.

What about Crowdsec to prevent abuse?

What are the patterns that should be looked for at syslog?

Third-party tools like fail2ban and Crowdsec may help fight this type of abuse, but again are not the solution.

The problem is this: the default setup of coturn allows abuse, thus people are abusing it. Any fix to stop this must be merged into coturn as a default setting to stop people from abusing it at large. Using 3rd party tools to work around coturn's failure to address this will not stop the widespread abuse. It may, however, stop your specific instance from being part of the abuse.

Like any software that can be abused, if it ships with the settings enabling abuse: Nobody is going to change it. Secure/sane settings must be shipped as default.

@n3storm If you are looking to using a 3rd party tool instead of this patch, look for this message in the logs: https://github.com/coturn/coturn/blob/f6004a1c185d666ebef024e54501266bfbd333ed/src/server/ns_turn_server.c#L3475

Consider blocking an address that produces a high volume of those responses over a period of time.

Alternatively, you can just apply this patch :shrug:

e-lisa avatar Jun 01 '25 19:06 e-lisa

It seems I've lost the thread a bit now. In which branch are the mentioned patches now included? I tried getting the current master branch, compiled it, replaced my existing files from the Ubuntu apt package, and tried to include the additional parameters in the config file, but got a "bad format" message in the logs:


0: (24581): WARNING: Bad configuration format: ratelimit-401-requests-per-window
0: (24581): WARNING: Bad configuration format: ratelimit-401-window-seconds

Same with the "ratelimit-401" branch. Or is the parameter meant to be put in the ExecStart line of the systemd unit? I tried this too, but also got a message about unknown parameters / bad configuration format. I put this as an additional parameter block into the config file:

ratelimit-401-requests-per-window=10
ratelimit-401-window-seconds=5
no-rfc5780
no-stun-backward-compatibility
response-origin-only-with-rfc5780
no-software-attribute

and tried this in the systemd startupscript:

ExecStart=/usr/bin/turnserver --ratelimit-401-requests-per-window=10 --ratelimit-401-window-seconds=5 --daemon -c /etc/turnserver.conf --pidfile /run/turnserver/turnserver.pid

Neither attempt worked, so how do I get this working for now until it's fixed in the apt package? Thanks in advance, Tom

Edit: It seems I found the problem in my config. Slightly confused by this thread, I took the wrong parameter names, as different names were mentioned here. For the ratelimit-401 branch, the proper ones were:

401-req-limit=10
401-window=5
no-rfc5780
no-stun-backward-compatibility
response-origin-only-with-rfc5780
no-software-attribute

Now it seems to work when i look to syslog:

Jun  3 16:18:22 myhost turnserver: 0: (15438): INFO: Setting 401 ratelimit requests per window to: 10
Jun  3 16:18:22 myhost turnserver: 0: (15438): INFO: Setting 401 ratelimit window to: 5 seconds

TomcatMJ avatar Jun 02 '25 13:06 TomcatMJ

So, any chances at getting this merged, or is there another approach to solving the "this RFC was written with a hole in it allowing for abuse" problem?

julialongtin avatar Jun 10 '25 13:06 julialongtin

Sidenote: We've observed some interesting interactions between this patch and nextcloud-talk-recording. Our req-limit is set to 20, and without setting 401-allowlist for the recording container it would fail if more than one or two calls were placed within the time set by 401-window. Our current working theory is that talk-recording tries to authenticate in several different ways that are unsupported by coturn, which results in 401's being generated, and finally getting limited.

This behavior should be documented somewhere, as it's extremely hard to debug (very few logs, the application itself not giving helpful output, etc).

Also, IMO 401-allowlist should be in a similar format to denied-peer-ip / allowed-peer-ip: IP ranges specified directly in the .conf file, not a separate file with IPs one by one. (I started working on refactoring this, but got confused by some abstractions around the IP lists, with hardcoded allowed/denied values.)
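As a rough sketch of what that could look like, assuming the same dash-separated range form that denied-peer-ip / allowed-peer-ip accept as far as I understand them (a single address, or first-last). The repeated 401-allowlist= lines and all function names below are hypothetical, not what the branch currently implements.

```python
import ipaddress


def parse_range(value: str):
    """Parse 'a.b.c.d' or 'a.b.c.d-e.f.g.h' into an inclusive (low, high) pair."""
    first, _, last = value.strip().partition("-")
    low = ipaddress.ip_address(first.strip())
    high = ipaddress.ip_address(last.strip()) if last else low
    if low.version != high.version or low > high:
        raise ValueError(f"invalid range: {value!r}")
    return low, high


def load_allowlist(conf_text: str, key: str = "401-allowlist"):
    """Collect every 'key=range' line from config text, ignoring # comments."""
    ranges = []
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            ranges.append(parse_range(value))
    return ranges


def is_allowlisted(source_ip: str, ranges) -> bool:
    """True if the address falls inside any configured range."""
    addr = ipaddress.ip_address(source_ip)
    return any(low.version == addr.version and low <= addr <= high
               for low, high in ranges)


EXAMPLE_CONF = """
realm=example.com
401-allowlist=10.0.0.0-10.0.0.255
401-allowlist=192.0.2.7   # e.g. a recording container
"""
allowlist = load_allowlist(EXAMPLE_CONF)
print(is_allowlisted("10.0.0.42", allowlist), is_allowlisted("192.0.2.8", allowlist))
```

Keeping the entries in the main .conf file would also mean the recording container's exemption sits next to the limits it is exempt from, which might make the failure mode above easier to spot.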

sdomi avatar Aug 19 '25 09:08 sdomi

@eakraly @ggarber this PR looks mergeable, and the functionality has already been tested by @mastaab's Wireapp and some other people. Could you please merge the PR before the next attack wave?

wilkis3 avatar Sep 01 '25 10:09 wilkis3

@e-lisa can you please add prometheus metrics? I am thinking of 2:

  • Packets rate limited (counter)
  • Size of rate-limit map (IPs rate limited)
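Roughly what I have in mind, sketched in plain Python so it is clear what the two series would carry. The metric names are placeholders I made up, and the real thing would go through coturn's existing prometheus support rather than hand-rendered text.

```python
class RateLimitMetrics:
    """Hypothetical holder for the two proposed metrics."""

    def __init__(self):
        self.packets_rate_limited = 0  # counter: only ever increases
        self.map_size = 0              # gauge: current number of tracked IPs

    def on_rate_limited(self) -> None:
        """Call once for every 401 response that was suppressed."""
        self.packets_rate_limited += 1

    def set_map_size(self, tracked_ips: int) -> None:
        """Call whenever entries are added to or expired from the map."""
        self.map_size = tracked_ips

    def render(self) -> str:
        """Render both metrics in the Prometheus text exposition format."""
        lines = [
            "# HELP turn_401_ratelimited_total 401 responses suppressed by the rate limiter.",
            "# TYPE turn_401_ratelimited_total counter",
            f"turn_401_ratelimited_total {self.packets_rate_limited}",
            "# HELP turn_401_ratelimit_map_size Source IPs currently tracked by the rate limiter.",
            "# TYPE turn_401_ratelimit_map_size gauge",
            f"turn_401_ratelimit_map_size {self.map_size}",
        ]
        return "\n".join(lines) + "\n"


metrics = RateLimitMetrics()
metrics.on_rate_limited()
metrics.on_rate_limited()
metrics.set_map_size(5)
print(metrics.render())
```

The counter shows whether an attack is in progress, and the gauge shows whether the map is growing without bound.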

eakraly avatar Sep 07 '25 01:09 eakraly

Happy Anniversary! This critical security PR is now over a year old, and has been in the field, for over a year, defending coturn's more adventurous users. Could we merge this soon, please?

julialongtin avatar Nov 03 '25 17:11 julialongtin

@e-lisa can you please add prometheus metrics? I am thinking of 2:

* Packets rate limited (counter)

* Size of rate-limit map (IPs rate limited)

I can see if I can't fit this in this weekend, should be very useful.

e-lisa avatar Nov 04 '25 22:11 e-lisa

Could this attack also occur with STUN messages (Binding Requests) that don’t require authentication and therefore don’t trigger 401 responses?

I honestly have not looked at this. In the wild all I've seen is reports of using 401 responses for amplification. Someone should look into this.

CC @ggarber

e-lisa avatar Nov 05 '25 00:11 e-lisa

It seems I've lost the thread a bit now. In which branch are the mentioned patches now included? I tried getting the current master branch, compiled it, replaced my existing files from the Ubuntu apt package, and tried to include the additional parameters in the config file, but got a "bad format" message in the logs:

FYI, this branch is now updated to the latest master

e-lisa avatar Nov 05 '25 00:11 e-lisa