
DosHandler

Open gregw opened this issue 1 year ago • 20 comments

Fix #10478. This is a simple DosHandler that delays and then rejects all requests over a limit.

gregw avatar Jul 22 '24 06:07 gregw

@sbordet @lorban Can you give this a quick initial review before I commit time to tests, configuration and documentation?

gregw avatar Jul 22 '24 06:07 gregw

Speaking about the design, I see two problems if you configure a moderately large throughput (say 10K req/s):

  • You need to allocate an array with one 64-bit entry per request per second. For 10K req/s that's ~80 KB of memory that's constantly scanned and updated. That alone will totally trash all CPU caches.
  • There's a single lock protecting that array. I'm not sure this could even reach 10K req/s.

lorban avatar Jul 23 '24 09:07 lorban

@lorban how about this version that uses an exponential moving average?

gregw avatar Jul 24 '24 00:07 gregw

@sbordet Can you take this one on?

gregw avatar Jul 26 '24 22:07 gregw

What I'd like:

  • Pluggable algorithm for rate exceeded -- this will reduce the number of parameters to the constructor, by separating the parameters for the algorithm from those just related to the DoSHandler like maxTrackers.
  • Now each tracker is a CyclicTimeout, but DosHandler should handle all the Trackers via CyclicTimeouts.
  • usePort seems weird, as using the ephemeral port from the client seems going against the purpose of the DoS defense: the client will use many ephemeral ports.
  • Not sure I understand the current algorithm: if a client sends 11 requests and the 11th exceeds the rate, it is queued, but I'd say it's simpler to reject it immediately. Right now there is a hard-coded 2 s timeout that, when it expires, rejects the queued request. Rejecting immediately would simplify things (no queue, no queue parameters), and I see no point in waiting 2 seconds to reject.

I would keep this Handler as simple as possible: if rate is exceeded, reject.

sbordet avatar Jul 29 '24 14:07 sbordet

@sbordet

What I'd like:

  • Pluggable algorithm for rate exceeded -- this will reduce the number of parameters to the constructor, by separating the parameters for the algorithm from those just related to the DoSHandler like maxTrackers.

OK, but I don't see replacing two algorithm args (samplePeriodMs and alpha) with one new ExponentialMovingAverage(samplePeriodMs, alpha) as much of a saving in parameters. But I'm OK with making the algorithm pluggable.
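
For illustration, roughly what I mean by pluggable, as a self-contained sketch (the interface and class names here are hypothetical, not the actual code in this PR):

```java
// Illustrative sketch only -- not the classes in this PR.
// The handler depends on the small interface and compares the returned rate
// against its own maxRequestsPerSecond; the algorithm-specific parameters
// (samplePeriodMs, alpha) live in the implementation's constructor.
public interface RateControl
{
    /** Record a request at the given nanoTime; return the current estimated rate in requests/second. */
    double onRequest(long nanoTime);
}

class ExponentialMovingAverageRateControl implements RateControl
{
    private final long samplePeriodNanos;
    private final double alpha;          // smoothing factor, 0 < alpha <= 1

    private double rate;                 // smoothed requests/second estimate
    private long sampleStartNanos = Long.MIN_VALUE;
    private long sampleCount;

    ExponentialMovingAverageRateControl(long samplePeriodMs, double alpha)
    {
        this.samplePeriodNanos = samplePeriodMs * 1_000_000L;
        this.alpha = alpha;
    }

    @Override
    public synchronized double onRequest(long nanoTime)
    {
        if (sampleStartNanos == Long.MIN_VALUE)
            sampleStartNanos = nanoTime;
        sampleCount++;

        long elapsed = nanoTime - sampleStartNanos;
        if (elapsed >= samplePeriodNanos)
        {
            // Fold the completed sample period into the moving average.
            double sampleRate = sampleCount * 1_000_000_000.0 / elapsed;
            rate = alpha * sampleRate + (1.0 - alpha) * rate;
            sampleStartNanos = nanoTime;
            sampleCount = 0;
        }
        return rate;
    }
}
```

The handler side then reduces to comparing the returned rate against its configured maximum.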

But then perhaps we should do the same for the ID extraction, which is currently a protected method and two constructor arguments. I'll have a play and see...

  • Now each tracker is a CyclicTimeout, but DosHandler should handle all the Trackers via CyclicTimeouts.

OK.

  • usePort seems weird, as using the ephemeral port from the client seems going against the purpose of the DoS defense: the client will use many ephemeral ports.

It is for the use case where the server is behind some intermediary, so the remote IP is the intermediary and not the client. Sometimes you cannot trust the Forwarded-For headers because not all intermediaries are smart enough to police that they are not sent from the client itself. So using the source port on the intermediary is a proxy for identifying the connection and thus the client. This is (was?) commonly used by Cisco smart routers. But if we make the ID algorithm pluggable, then this can be done at no cost.
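
If the ID extraction is pluggable, the port case becomes just another function rather than a special usePort flag. A rough sketch, assuming Jetty 12's core Request/ConnectionMetaData API (names here are illustrative, not this PR's code):

```java
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.util.function.Function;

import org.eclipse.jetty.server.Request;

// Illustrative only: two possible client-ID functions for the handler.
public class ClientIds
{
    /** Default: identify the client by its remote IP address. */
    public static final Function<Request, String> BY_REMOTE_ADDRESS = request ->
    {
        SocketAddress remote = request.getConnectionMetaData().getRemoteSocketAddress();
        return remote instanceof InetSocketAddress inet
            ? inet.getAddress().getHostAddress()
            : String.valueOf(remote);
    };

    /** Behind an intermediary: the remote IP is the intermediary's, so include the
     *  source port as a proxy for the individual connection (and thus the client). */
    public static final Function<Request, String> BY_REMOTE_ADDRESS_AND_PORT = request ->
    {
        SocketAddress remote = request.getConnectionMetaData().getRemoteSocketAddress();
        return remote instanceof InetSocketAddress inet
            ? inet.getAddress().getHostAddress() + ":" + inet.getPort()
            : String.valueOf(remote);
    };
}
```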

  • Not sure I understand the current algorithm: if a client sends 11 requests and the 11th exceeds the rate, it is queued, but I'd say it's simpler to reject it immediately. Right now there is a hard-coded 2 s timeout that, when it expires, rejects the queued request. Rejecting immediately would simplify things (no queue, no queue parameters), and I see no point in waiting 2 seconds to reject.

The idea of delaying rather than rejecting is to delay additional requests on the same connection. An attacker can pipeline many HTTP/1 requests in a single TCP segment that is buffered. If you just reject the request, then the next one will be there and you will need to do work to reject that too. Closing the connection can avoid that, but then that tells the attacker that they need to open another connection to continue the attack. By delaying, the attacker does not know whether they are causing a DoS or not, and they have to hold open resources to keep the connection alive.

I would keep this Handler as simple as possible: if rate is exceeded, reject.

Reject is not good enough. We'd have to close the connection to avoid the issue of many pipelined h1 requests. But then we don't have that semantic for h2 and h3 connections, i.e. we cannot send a response to a single request that also closes all the other streams on the same h2 connection.

Delay is expensive for the server, so perhaps we should come up with a way of closing h2 and h3 connections?

gregw avatar Jul 31 '24 07:07 gregw

@sbordet I've pushed a change to make the ID a pluggable function rather than fixed. It is OK, but it might take a bit of effort to make it configurable from a module ini file.

I'll try the same approach for the rate algorithm...

gregw avatar Jul 31 '24 07:07 gregw

@gregw are you planning on integrating request handling with connection acceptance? I.e. stop accepting connections from the suspicious IP?

sbordet avatar Jul 31 '24 07:07 sbordet

@gregw are you planning on integrating request handling with connection acceptance? I.e. stop accepting connections from the suspicious IP?

I wasn't... but that could be a good idea. It would need new semantics in our connectors, but we need to add some new semantics anyway if we are to be able to close a connection for h2/h3.

gregw avatar Jul 31 '24 08:07 gregw

@sbordet I've made the Rate pluggable now as well. It is all a bit messy and lacks javadoc, but give me some feedback on the direction before I commit any more effort. I might work a little bit tomorrow as well.

gregw avatar Jul 31 '24 08:07 gregw

@sbordet A wise programmer once said:

I would keep this Handler as simple as possible

By making this handler entirely pluggable, I think perhaps we are adding complexity where none is needed. Writing an entirely new custom handler is not that difficult and is better documented than writing three implementations of unknown interfaces to plug into this Handler. This pluggability will also make XML configuration really difficult.

I suggest we consider going back to a simple handler, with the algorithms in protected methods where possible, and keep it simple. If anybody wants something different, they can extend or replace. I'd prefer reject+block as the default algorithm, but that needs wider changes and would be less simple. So I think the delay+reject approach is fine for a simple DosHandler.

gregw avatar Jul 31 '24 20:07 gregw

@sbordet I've added in pluggable rejection. It is not too ugly. I'll try the XML configuration soon. If you have time for some more feedback/review, it would be appreciated before I commit too much to the XML.

gregw avatar Aug 01 '24 00:08 gregw

@sbordet I've added XmlConfiguration and filled out the DosHandler a bit more. It's a little more complex than I'd like, but it is not too bad.

gregw avatar Aug 01 '24 07:08 gregw

@sbordet @lachlan-roberts @lorban nudge!!! 3 weeks is too long to wait for feedback!

gregw avatar Aug 23 '24 01:08 gregw

@sbordet nudge... no shove.... no POKE!!!!

gregw avatar Sep 16 '24 07:09 gregw

Let's discuss this face to face.

Count me in.

lorban avatar Sep 25 '24 15:09 lorban

@sbordet I've looked at having the rate tracker calculate when it will next go idle, but it involves multiple Math.log calculations, so I'm not sure it is worthwhile. I've delegated the default idleCheckPeriod to the rateTrackerFactory class instead. It can set a period based on the alpha, but currently I'm not doing that, as it will probably need to know the max rate as well.
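
For reference, the calculation looks roughly like this, assuming the rate decays by a factor of (1 - alpha) per idle sample period with 0 < alpha < 1 (a sketch, not code from this PR):

```java
// Sketch only. With no further requests, each elapsed sample period updates
//   rate = alpha * 0 + (1 - alpha) * rate
// so after n idle periods: rate_n = rate * (1 - alpha)^n.
// Solving rate * (1 - alpha)^n <= idleThreshold for n gives the logs below.
static long nanosUntilIdle(double rate, double idleThreshold, double alpha, long samplePeriodNanos)
{
    if (rate <= idleThreshold)
        return 0;
    double periods = Math.log(idleThreshold / rate) / Math.log(1.0 - alpha);
    return (long)Math.ceil(periods) * samplePeriodNanos;
}
```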

Why don't you take over this PR?

gregw avatar Sep 27 '24 06:09 gregw

@sbordet

I still have concerns about using an exponential moving average for rate control (it does not seem the right algorithm), and about RateControl.isIdle(), which I think is unnecessary.

FYI, I described the handler to an AI and it proposed three possible algorithms:

  1. Tokens in a leaky bucket, which is space efficient, but requires a regular visit of every tracker to refill the bucket.
  2. Sliding window, which is very accurate, but is not space efficient for large request rates.
  3. Exponential moving average, which is space and time efficient.

In conclusion, it thought the EMA was the best choice.

HOWEVER, we are already visiting each tracker on a regular basis anyway to check for idleness, so perhaps the leaky bucket would be OK.

So at the very least, we should implement the leaky bucket algorithm as an alternative to test the API abstraction, or to see if it allows a simpler API.
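
To make the comparison concrete, the bucket variant is roughly the following, with the refill piggy-backing on the sweep that already visits every tracker (a sketch with hypothetical names, not a concrete API proposal):

```java
// Sketch of a token-bucket tracker (hypothetical, not the PR code).
// Requests consume tokens; the periodic sweep that already visits every
// tracker for the idle check also refills the bucket.
public class TokenBucketTracker
{
    private final int maxTokens;        // bucket capacity == burst allowance
    private final int tokensPerSweep;   // refill amount per periodic visit
    private int tokens;

    public TokenBucketTracker(int maxTokens, int tokensPerSweep)
    {
        this.maxTokens = maxTokens;
        this.tokensPerSweep = tokensPerSweep;
        this.tokens = maxTokens;
    }

    /** Called per request; returns true if the request exceeds the allowed rate. */
    public synchronized boolean onRequest()
    {
        if (tokens <= 0)
            return true;
        tokens--;
        return false;
    }

    /** Called by the existing periodic sweep; returns true if the bucket is full again,
     *  i.e. the client has gone quiet and the tracker can be discarded. */
    public synchronized boolean refillAndCheckIdle()
    {
        boolean idle = tokens == maxTokens;
        tokens = Math.min(maxTokens, tokens + tokensPerSweep);
        return idle;
    }
}
```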

gregw avatar Oct 08 '24 05:10 gregw

@lorban @sbordet I've pushed a new version that:

  • merges the Tracker and RateLimit concepts for simplicity
  • removes the lock and uses an AtomicBiInteger instead
  • replaces the EMA algorithm with a simpler leaky bucket (EMA with alpha == 0.0)
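
The lock-free part is roughly the following idea, sketched here as a simple per-period counter with a plain AtomicLong packing two ints (the real change uses Jetty's AtomicBiInteger; names and details here are illustrative):

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch only (not the PR code): a tracker that packs the index of the current
// sample period (high 32 bits) and the number of requests seen in that period
// (low 32 bits) into one long, updated with compareAndSet.
public class LockFreeTracker
{
    private final int maxRequestsPerPeriod;
    private final long periodNanos;
    private final AtomicLong state = new AtomicLong();

    public LockFreeTracker(int maxRequestsPerPeriod, long periodNanos)
    {
        this.maxRequestsPerPeriod = maxRequestsPerPeriod;
        this.periodNanos = periodNanos;
    }

    /** Returns true if this request exceeds the per-period limit. */
    public boolean onRequest(long nanoTime)
    {
        int period = (int)(nanoTime / periodNanos);
        while (true)
        {
            long encoded = state.get();
            int lastPeriod = (int)(encoded >>> 32);
            int count = (int)encoded;

            // A new period starts counting from zero again.
            if (period != lastPeriod)
                count = 0;

            if (count >= maxRequestsPerPeriod)
                return true;

            long updated = ((long)period << 32) | ((count + 1) & 0xFFFF_FFFFL);
            if (state.compareAndSet(encoded, updated))
                return false;
            // Lost a race with another thread: retry with the fresh state.
        }
    }
}
```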

Thoughts before I fix up the documentation and modules?

gregw avatar Oct 09 '24 07:10 gregw

@sbordet @lachlan-roberts SHOVE!

gregw avatar Oct 14 '24 22:10 gregw

@lachlan-roberts nudge

gregw avatar Oct 21 '24 19:10 gregw