tide
Add body length limiting middleware
I was browsing the warp docs today and spotted the content_length filter. This allows setting limits on body payloads, which can be a DDoS vector. This would be neat to add to Tide as middleware.
Example
I was thinking this could look something like this:
use tide::limit;

let mut app = tide::new();
app.at("/json").middleware(limit::content_len(1024 * 2)).post(|_| async {
    Ok("req body len was less than 2kb")
});
Additionally a hierarchy of limits could be applied. For example we could say that for the whole app no body should ever be over 100mb. But then for specific routes we can set even lower limits. This seems like a practical solution for something that often tends to be an afterthought.
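As a rough sketch of how such a hierarchy could resolve (the helper below is hypothetical, not an existing Tide API), the effective limit for a route would be the minimum of the app-wide ceiling and any route-specific override:

```rust
/// Hypothetical helper: resolve the effective body limit for a route.
/// A route-specific limit can tighten the app-wide ceiling but never exceed it.
fn effective_limit(app_limit: usize, route_limit: Option<usize>) -> usize {
    route_limit.map_or(app_limit, |route| route.min(app_limit))
}

fn main() {
    let app_limit = 100 * 1024 * 1024; // app-wide ceiling: 100 MB
    assert_eq!(effective_limit(app_limit, None), app_limit);
    assert_eq!(effective_limit(app_limit, Some(2 * 1024)), 2 * 1024); // stricter route limit wins
    assert_eq!(effective_limit(app_limit, Some(200 * 1024 * 1024)), app_limit); // cannot raise it
}
```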
Hello! I would like to take up this issue. But I wanted to know what should be done with requests that cross the limit? Return error to the client and close the connection?
Exceeding the limit could be treated as a "hack", thus closing the connection is the way to go.
Thanks. I have been experimenting with a few things. (I am new to async-std and Tide, so I may be missing obvious ways to achieve the task. Please feel free to correct me where the code can be improved ) :)
use std::future::Future;
use std::pin::Pin;

use tide::{Middleware, Next, Request, Result, StatusCode};

#[derive(Debug)]
struct PostLimiter(usize);

impl PostLimiter {
    fn new(max_limit: usize) -> Self {
        Self(max_limit)
    }

    async fn check_for_size<'a, State: Send + Sync + 'static>(
        &self,
        cx: Request<State>,
        next: Next<'a, State>,
    ) -> Result {
        // `len()` returns an Option, so this unwrap panics when no length is set
        if cx.len().unwrap() > self.0 {
            Err(tide::Error::from_str(StatusCode::Forbidden, "too much"))?;
        }
        let res = next.run(cx).await?;
        Ok(res)
    }
}

impl<State: Send + Sync + 'static> Middleware<State> for PostLimiter {
    fn handle<'a>(
        &'a self,
        cx: Request<State>,
        next: Next<'a, State>,
    ) -> Pin<Box<dyn Future<Output = Result> + Send + 'a>> {
        Box::pin(self.check_for_size(cx, next))
    }
}
But, here I observed that the len may not be set. So, I tried to do something like this:
async fn check_for_size<'a, State: Send + Sync + 'static>(
    &self,
    mut cx: Request<State>,
    next: Next<'a, State>,
) -> Result {
    // try to read one byte more than the limit allows; hitting EOF first
    // means the body was within bounds
    let s: &mut [u8] = &mut vec![0; self.0 + 1];
    match cx.read_exact(s).await {
        Err(err) if err.kind() == ErrorKind::UnexpectedEof => {
            let res = next.run(cx).await?;
            Ok(res)
        }
        _ => Err(tide::Error::from_str(StatusCode::Forbidden, "too much"))?,
    }
}
Then I realized that the body of the Request is a stream and once read by the middleware, it's kinda lost permanently. So, this is where I am currently.
I have also got a couple of doubts.
1. In the issue description:
Additionally a hierarchy of limits could be applied. For example, we could say that for the whole app no body should ever be over 100mb. But then for specific routes, we can set even lower limits.
I was thinking of ways to implement this, and I came across the idea of local state (here). So I was wondering whether it's possible to maintain the lowest limit seen so far as local state. That is, can I modify the request's local state inside the middleware?
2. And then, is it possible for a client to keep sending data (through a POST request) even after the middleware has run? So basically, are the middleware functions run as soon as the request is received (even before the complete body is received)?
Thanks.
Well, for this plugin to be useful we must not rely on headers (Content-Length). A hacker can send invalid headers, so you have to sum the lengths of the received chunks and close the connection when the rule is violated.
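That chunk-counting idea can be sketched with a std::io::Read wrapper as a synchronous stand-in for the async body stream (all names here are hypothetical):

```rust
use std::io::{self, Read};

/// Wraps a reader and errors once more than `limit` bytes have passed through.
struct LimitedReader<R> {
    inner: R,
    limit: usize,
    read: usize,
}

impl<R: Read> Read for LimitedReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        self.read += n;
        if self.read > self.limit {
            // in the real middleware, this is where the connection would be closed
            return Err(io::Error::new(io::ErrorKind::InvalidData, "body too large"));
        }
        Ok(n)
    }
}

fn main() {
    let body = &b"this body is way over the limit"[..];
    let mut reader = LimitedReader { inner: body, limit: 8, read: 0 };
    let mut out = Vec::new();
    let err = reader.read_to_end(&mut out).unwrap_err();
    assert_eq!(err.kind(), io::ErrorKind::InvalidData);
}
```

Nothing here depends on the headers; the count is taken from the bytes actually received.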
Okay, thanks. Is there a way to find the length of the chunks without actually reading them? I kinda did this in my second version, but I consumed (I hope that makes sense) the data from the body, and then wasn't able to put it back for the next middleware or endpoint to use.
While returning something like status 416 may technically be more correct, in case of attack mitigation returning error 400 is probably the most sensible. 403 doesn't seem correct.
As for the stream question -- now I'm not quite sure if this is possible atm -- but I'm not sure that running the middleware is sensible if there is no body consumption in the app to begin with.
I'm a little unfamiliar with tokio(?) streams but I think ideally the middleware would have sort of a "pass through" stream between the request body stream and any other client code which wants to consume the body. That way it is only even counted if the client tries to consume the data.
(Making an assumption that the rest of the stack doesn't do substantial preprocessing / buffering of body data, that is.)
I realized that the way to go here is to use stream.read(), read a body of the size set in the Content-Length header, and then set a connection timeout. It appears that the body stream reader will simply wait when the body size is bigger.
Thanks @xpepermint . I will look into it. I wasn't sure how to proceed. :)
As an extra comment, would it be worth extending this to other areas like headers to limit the size of headers that are received too?
Another design consideration is that it may not just be per-route limits, but per-route-per-user. There may be an authenticated user via an auth bearer token who could have a higher limit on POST than an unauthenticated user (e.g. an admin needs to upload a backup to be restored to the server). So I think there would need to be a way to get the limit per-request.
Thanks!
Any progress on this? This would be nice to have for a production server
@danieleades it seems this hasn't seen progress in a while. I definitely agree this would be useful, and would like to see this picked up. Probably the best way to approach this would be to author a middleware in a separate crate so we can try it out, and if we like it we can merge that approach into Tide proper.
@danieleades would this be something you'd like to try authoring?
That's some black belt level delegation right there. You son of a bitch, I'm in.
What's the behaviour we're looking for here?
- check the header and return an error if it's greater than the configured maximum
- sniff the stream and return an error if the size doesn't match the header
(Do there exist any valid use cases where the body size doesn't match the header?)
Hah, glad you're keen!
Correctness concerns should be handled by the transport protocol; in this case that's the async-h1 crate. What I was thinking for this middleware would indeed be the first behavior: count the bytes in the stream as they pass through, and return an error when the threshold is reached. Probably a 413: Payload Too Large error.
There's going to be a bit of trickery involved to make this work in a streaming fashion though. The middleware will need to wrap the incoming body with a counter, and then error when the threshold is reached. It will however not be able to also set the status code because AsyncRead returns io::Result rather than http_types::Result which can contain a status code.
I wonder if the right solution would be for the stream wrapper to set an AtomicBool signaling whether the limit has been reached; if that's the case, the middleware can rewrite the error code to 413? It's not particularly elegant though; I'd be thrilled if we could find a good solution for this!
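A std-only synchronous sketch of that idea (all types and names here are hypothetical, not Tide API): the wrapper can only surface a plain io::Error, so it records the reason through a shared flag that the middleware can inspect afterwards to decide whether to rewrite the response to 413.

```rust
use std::io::{self, Read};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Counts bytes passing through; on exceeding the limit it errors with a
/// plain io::Error and records the reason via the shared flag.
struct CountingReader<R> {
    inner: R,
    limit: usize,
    read: usize,
    exceeded: Arc<AtomicBool>,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        self.read += n;
        if self.read > self.limit {
            self.exceeded.store(true, Ordering::Relaxed);
            return Err(io::Error::new(io::ErrorKind::Other, "limit reached"));
        }
        Ok(n)
    }
}

fn main() {
    let exceeded = Arc::new(AtomicBool::new(false));
    let mut reader = CountingReader {
        inner: &b"0123456789"[..],
        limit: 4,
        read: 0,
        exceeded: exceeded.clone(),
    };
    let _ = reader.read_to_end(&mut Vec::new());
    // the middleware would check this flag and rewrite the response to 413
    assert!(exceeded.load(Ordering::Relaxed));
}
```

In the async version the same flag would be shared between the AsyncRead wrapper installed on the request body and the middleware's response path.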
@yoshuawuyts it's not the cleanest is it? I guess the advantage that Warp has here is that at this point you're filtering a TryStream rather than an AsyncRead and have a little more control over the error.
How does this work in this ecosystem? Is the byte stream buffered somewhere to give you an AsyncWrite interface?
I'm hoping to take a look at this sometime this weekend
@yoshuawuyts here's a sketch - https://github.com/danieleades/tide-upload-limit/pull/3
I wonder if the right solution would be for the stream wrapper to set an AtomicBool signaling whether the limit has been reached; if that's the case, the middleware can rewrite the error code to 413? It's not particularly elegant though; I'd be thrilled if we could find a good solution for this!
yeah. this part isn't very pretty.
I originally implemented a 'sniffer' that could also check that the content length of the body matches the content-length header, and return an error if it doesn't. It's a trivial extension to the current 'sniffer', but possibly a separate concern.
@danieleades that's really cool work! -- thanks for doing that! Could you perhaps file a PR on the Tide README to link to your project? I think people will find it useful to reference!
Implementation-wise, I guess there really wasn't any escaping the atomic bool approach, was there? I don't think there's much left to innovate on in this middleware. If you like, I'd also be open to accepting a PR on tide to add this, possibly under the limit sub-module name.
It's a trivial extension to the current 'sniffer', but possibly a separate concern.
Yeah, this is a correctness concern at the h1 layer; an opt-in middleware would not be the right place to enforce this.
@yoshuawuyts will hopefully look at this in the next few days.
just to get some clarification on the division of labour: I think there are 3 separate but related concerns here
- reject the request if the header indicates the body is too big
- reject the request if the body is larger than the maximum
- reject the request if the body is smaller than the header indicates
- cases 2 and 3 can only be reached if the header is incorrect.
- my middleware is currently enforcing 1 and 2
- you indicate that correctness (i.e. the header matches the body) is a concern of h1, i.e. cases 2 and 3. That would let me remove 90% of my library... also, for h1 to enforce this kind of correctness it would have to 'sniff' the body in the same way my crate does, since the body is read lazily. Is that the plan?
It's hard to see a way to untangle "correctness" at the protocol level from the "practical" elements of networking in this case. We have a part of the protocol indicating the behaviour of the "on the wire" bytestream, and that allows a mismatch to occur. To check and enforce this requires both elements to be inspected.
So if h1 were to enforce this, then it would have to effectively support 2 and 3 as you say, but the moment you support 3, you have to count bytes, meaning it is then trivial to support case 1.
but in terms of this feature, I think that cases 1 and 2 are what matter: if the request was shorter than indicated, that means the client's communication was interrupted, or something else has occurred, and that "something else" will go wrong anyway.
But most "malicious" entities want to abuse large requests rather than sending mismatched small ones, so it's more important to focus on 1 and 2 IMO.
At the end of the day, it's not the "header" that we should focus on, because this is just a "size hint". It will always come down to counting the bytes that we received.
Hope that helps,
I'm struggling to write an integration test which checks that this body length limiting middleware works properly in the case that the reported length of the body is less than the actual body length.
The reason I'm struggling is that when I call read_to_end on the tide::Request, it only ever returns the number of bytes that was set when the tide::http::Body was created.
That is, it is already limited by the implementation.
note that I'm using the `len` property of the `Body` rather than the 'content-length' header, on the assumption that `h1` or similar will have correctly set this property on an async body based on that header (does this assumption hold?)
this makes it hard to recognise that the limit has been reached. Right now I'm trying to catch it by intercepting calls to poll_read on the Request, but my interception never fires because the underlying reader seems to be doing the same thing already.
here we go, this is from http-types -
impl AsyncRead for Body {
    #[allow(missing_doc_code_examples)]
    fn poll_read(
        mut self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut [u8],
    ) -> Poll<io::Result<usize>> {
        let mut buf = match self.length {
            None => buf,
            Some(length) if length == self.bytes_read => return Poll::Ready(Ok(0)),
            Some(length) => {
                let max_len = (length - self.bytes_read).min(buf.len());
                &mut buf[0..max_len]
            }
        };
        let bytes = ready!(Pin::new(&mut self.reader).poll_read(cx, &mut buf))?;
        self.bytes_read += bytes;
        Poll::Ready(Ok(bytes))
    }
}
this is already doing body length limiting (in the case that the actual body length is greater than the 'reported' length). This means my check never fires, and I never get an opportunity to set the response code. It also means most of my implementation is redundant!
thoughts?
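That clamping can be reproduced in isolation with a small std-only model (hypothetical names): once bytes_read reaches the declared length, reads return 0 even though the underlying reader has more data, which is why a downstream sniffer never sees the excess bytes.

```rust
use std::io::{self, Read};

/// Minimal model of the Body behaviour above: clamp reads to a declared length.
struct ClampedBody<R> {
    reader: R,
    length: Option<usize>,
    bytes_read: usize,
}

impl<R: Read> Read for ClampedBody<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let buf = match self.length {
            None => buf,
            // EOF as soon as the declared length is consumed
            Some(length) if length == self.bytes_read => return Ok(0),
            Some(length) => {
                let max_len = (length - self.bytes_read).min(buf.len());
                &mut buf[0..max_len]
            }
        };
        let n = self.reader.read(buf)?;
        self.bytes_read += n;
        Ok(n)
    }
}

fn main() {
    // 23-byte payload, but the declared length ("header") says 9
    let mut body = ClampedBody {
        reader: &b"this string is 23 bytes"[..],
        length: Some(9),
        bytes_read: 0,
    };
    let mut out = Vec::new();
    body.read_to_end(&mut out).unwrap();
    assert_eq!(out, b"this stri"); // only the first 9 bytes ever come through
}
```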
I think this middleware should indeed only enforce a maximum limit on the body stream and not worry about the lower limit since that's already handled.
@danieleades since http-types is already checking this (partially) would it be worth extending this to allow configuration of the maximum allowed request size, and this middleware just exposes that functionality? Or am I misunderstanding how this goes together?
Something also to keep in mind could be the desire for per route size limits ...
Thanks!
sorry, perhaps my explanation was a little unclear.
I've created a test case where I create a request from a string, boxed as a dyn AsyncRead. I set a maximum body size in the middleware which is less than the length of the string, and set the size of the payload "maliciously" to less than the maximum.
I would expect that since the body size exceeds the maximum, my middleware would fire and return an error. In actual fact what happens is the body only ever returns the number of bytes set as the 'length' and no more, which means my middleware never fires.
Perhaps there's an issue with the way I'm setting this up:
use futures_util::io::AsyncReadExt;
use tide::{Request, Server};
use tide_upload_limit::UploadLimit;

/// This tests that the behaviour is correct when the content-length header is
/// incorrectly (or maliciously) set.
#[async_std::test]
async fn payload_over_limit() {
    let payload = "this string is 23 bytes";
    let upload_limit = 10;
    let header_length = 9;

    let response = get_response(payload, upload_limit, header_length).await;
    assert_eq!(response.status(), tide::StatusCode::PayloadTooLarge);
}

async fn get_response(
    payload: &'static str,
    upload_limit: usize,
    header_size: usize,
) -> tide::Response {
    let mut app = app();

    // set a global upload limit
    app.with(UploadLimit::new(upload_limit));

    let request = request(payload, Some(header_size));

    // get response
    app.respond(request).await.unwrap()
}

pub fn app() -> Server<()> {
    let mut app = tide::new();
    app.at("/").post(handle);
    app
}

async fn handle(mut request: Request<()>) -> Result<String, tide::Error> {
    let mut buf = Vec::new();
    request.read_to_end(&mut buf).await?;
    let s = String::from_utf8(buf)?;
    println!("read string: {}", s);
    Ok(s)
}

pub fn request(payload: &'static str, payload_length: Option<usize>) -> Request<()> {
    let length = payload_length.or_else(|| Some(payload.len()));
    let reader = futures_util::io::BufReader::new(payload.as_bytes());
    let body = tide::http::Body::from_reader(reader, length);

    let mut request = tide::http::Request::new(
        tide::http::Method::Post,
        tide::http::Url::parse("http://example.com").unwrap(),
    );
    request.set_body(body);

    request.into()
}
the output of this request handler is the first 9 bytes of the input string.
Could this be that h1 or other http layers lower in the stack are actually blocking the request once you hit the limit set in length? If you do the same test but dont send the length header does your middleware fire?
exactly.
I've added integrations tests here for the three cases
- size correctly set
- size "maliciously" set
- size not set
the middleware fires correctly when the size is correctly set, and when the size isn't set at all. It fails if the size is set to a lower number (and that lower number is the maximum number of bytes that are ever returned)
the assumption is that layers lower in the stack are doing length limiting already. If we also take the above comment from @yoshuawuyts ("Correctness concerns should be handled by the transport protocol") then this middleware should not concern itself with the case that the body length isn't set (this should be enforced lower in the stack).
which only leaves checking that the header is below a configured threshold. ie, the byte-sniffing is redundant.
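Under that assumption, the remaining middleware logic reduces to a single comparison; a std-only sketch (the function name and error representation are hypothetical):

```rust
/// Hypothetical reduced middleware logic: with the lower layers clamping the
/// body to the declared length, only the declared length needs checking.
fn check_declared_length(content_length: Option<usize>, max: usize) -> Result<(), u16> {
    match content_length {
        // 413 Payload Too Large when the declared size exceeds the threshold
        Some(len) if len > max => Err(413),
        // within bounds, or no declared length (left to the lower layers here)
        _ => Ok(()),
    }
}

fn main() {
    assert_eq!(check_declared_length(Some(2048), 1024), Err(413));
    assert_eq!(check_declared_length(Some(512), 1024), Ok(()));
    assert_eq!(check_declared_length(None, 1024), Ok(()));
}
```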
so now what? Some of these lower layers might need a slight rethink.
I would also add that whether the middleware is global or per-route is really a question for the architecture of tide itself, rather than for any single middleware. Should the hook be available for applying this middleware on a single route, it would work exactly the same way.
please jump in if my interpretation here is amiss! I'm making a lot of assumptions here about how the lower layers behave with very little knowledge of their actual behaviour...
Well, if the lower layers are already doing this work, there seems to be little sense in double accounting. It seems like the lower levels should be configurable to do the limiting and accounting, and this middleware then wouldn't be needed?
And of course, I still think that per-route makes sense. For example, for a REST endpoint you may not want large data being posted and filling buffers, so you would want a small limit; but for a file upload you want a larger limit, etc.
But I think it would be up to @yoshuawuyts or someone more involved in tide/async-h1 to really make these calls :)
Seems that Warp's approach is to not handle this at the lower layers in favour of handling it explicitly at the top layer.
In a way I think this may be a good idea because then it does give a lot of flexibility for per-route/app limits, but again, I think this is a decision that is for the actual tide developers :)