tide
Add body length limiting middleware
I was browsing the warp docs today and spotted the content_length filter. This allows setting limits on body payloads, which can be a DDoS vector. This would be neat to add to Tide as middleware.
Example
I was thinking this could look something like this:
use tide::limit;

let mut app = tide::new();
app.at("/json").middleware(limit::content_len(1024 * 2)).post(|_| async {
    Ok("req body len was less than 2kb")
});
Additionally a hierarchy of limits could be applied. For example we could say that for the whole app no body should ever be over 100mb. But then for specific routes we can set even lower limits. This seems like a practical solution for something that often tends to be an afterthought.
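As a rough sketch of how such a hierarchy could resolve (the helper below is hypothetical, not an existing Tide API), the effective limit for a route would be the minimum of the app-wide ceiling and any route-specific override:

```rust
/// Hypothetical helper: resolve the effective body limit for a route.
/// A route-specific limit can tighten the app-wide ceiling but never exceed it.
fn effective_limit(app_limit: usize, route_limit: Option<usize>) -> usize {
    route_limit.map_or(app_limit, |route| route.min(app_limit))
}

fn main() {
    let app_limit = 100 * 1024 * 1024; // app-wide ceiling: 100 MB
    assert_eq!(effective_limit(app_limit, None), app_limit);
    assert_eq!(effective_limit(app_limit, Some(2 * 1024)), 2 * 1024); // stricter route limit wins
    assert_eq!(effective_limit(app_limit, Some(200 * 1024 * 1024)), app_limit); // cannot raise it
}
```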
Hello! I would like to take up this issue. But I wanted to know what should be done with requests that cross the limit? Return error to the client and close the connection?
Exceeding the limit could be treated as a "hack", thus closing the connection is the way to go.
Thanks. I have been experimenting with a few things. (I am new to async-std and Tide, so I may be missing obvious ways to achieve the task. Please feel free to correct me where the code can be improved ) :)
use std::future::Future;
use std::pin::Pin;

use tide::{Middleware, Next, Request, Result, StatusCode};

#[derive(Debug)]
struct PostLimiter(usize);

impl PostLimiter {
    fn new(max_limit: usize) -> Self {
        Self(max_limit)
    }

    async fn check_for_size<'a, State: Send + Sync + 'static>(
        &self,
        cx: Request<State>,
        next: Next<'a, State>,
    ) -> Result {
        // `len()` returns an Option, so this unwrap panics when no length is set
        if cx.len().unwrap() > self.0 {
            Err(tide::Error::from_str(StatusCode::Forbidden, "too much"))?;
        }
        let res = next.run(cx).await?;
        Ok(res)
    }
}

impl<State: Send + Sync + 'static> Middleware<State> for PostLimiter {
    fn handle<'a>(
        &'a self,
        cx: Request<State>,
        next: Next<'a, State>,
    ) -> Pin<Box<dyn Future<Output = Result> + Send + 'a>> {
        Box::pin(self.check_for_size(cx, next))
    }
}
But, here I observed that the len may not be set. So, I tried to do something like this:
async fn check_for_size<'a, State: Send + Sync + 'static>(
    &self,
    mut cx: Request<State>,
    next: Next<'a, State>,
) -> Result {
    // try to read one byte more than the limit allows; hitting EOF first
    // means the body was within bounds
    let s: &mut [u8] = &mut vec![0; self.0 + 1];
    match cx.read_exact(s).await {
        Err(err) if err.kind() == ErrorKind::UnexpectedEof => {
            let res = next.run(cx).await?;
            Ok(res)
        }
        _ => Err(tide::Error::from_str(StatusCode::Forbidden, "too much"))?,
    }
}
Then I realized that the body of the Request is a stream and once read by the middleware, it's kinda lost permanently. So, this is where I am currently.
I have also got a couple of doubts.
1. In the issue description:
Additionally a hierarchy of limits could be applied. For example, we could say that for the whole app no body should ever be over 100mb. But then for specific routes, we can set even lower limits.
I was thinking of ways to implement this, and I came across the idea of local state (here). So I was wondering whether it's possible to maintain the lowest limit seen so far as local state. That is, can I modify the request's local state inside the middleware?
2. And then, is it possible for a client to keep sending data (through a POST request) even after the middleware has run? So basically, are the middleware functions run as soon as the request is received (even before the complete body is received)?
Thanks.
Well, for this plugin to be useful we must not rely on headers (Content-Length). A hacker can send invalid headers, so you have to sum the lengths of the received chunks and close the connection when the rule is violated.
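That chunk-counting idea can be sketched with a std::io::Read wrapper as a synchronous stand-in for the async body stream (all names here are hypothetical):

```rust
use std::io::{self, Read};

/// Wraps a reader and errors once more than `limit` bytes have passed through.
struct LimitedReader<R> {
    inner: R,
    limit: usize,
    read: usize,
}

impl<R: Read> Read for LimitedReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        self.read += n;
        if self.read > self.limit {
            // in the real middleware, this is where the connection would be closed
            return Err(io::Error::new(io::ErrorKind::InvalidData, "body too large"));
        }
        Ok(n)
    }
}

fn main() {
    let body = &b"this body is way over the limit"[..];
    let mut reader = LimitedReader { inner: body, limit: 8, read: 0 };
    let mut out = Vec::new();
    let err = reader.read_to_end(&mut out).unwrap_err();
    assert_eq!(err.kind(), io::ErrorKind::InvalidData);
}
```

Nothing here depends on the headers; the count is taken from the bytes actually received.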
Okay, thanks. Is there a way to find the length of the chunks without actually reading them? I kinda did this in my second version, but I consumed (I hope that makes sense) the data from the body, and then wasn't able to put it back for the next middleware or endpoint to use.
While returning something like status 416 may technically be more correct, in case of attack mitigation returning error 400 is probably the most sensible. 403 doesn't seem correct.
As for the stream question -- now I'm not quite sure if this is possible atm -- but I'm not sure that running the middleware is sensible if there is no body consumption in the app to begin with.
I'm a little unfamiliar with tokio(?) streams but I think ideally the middleware would have sort of a "pass through" stream between the request body stream and any other client code which wants to consume the body. That way it is only even counted if the client tries to consume the data.
(Making an assumption that the rest of the stack doesn't do substantial preprocessing / buffering of body data, that is.)
I realized that the way to go here is to use stream.read(), read a body of the size set in the Content-Length header, and then set a connection timeout. It appears that the body stream reader will simply wait when the body size is bigger.
Thanks @xpepermint . I will look into it. I wasn't sure how to proceed. :)
As an extra comment, would it be worth extending this to other areas like headers to limit the size of headers that are received too?
Another design consideration is that it may not just be per-route limits, but per-route-per-user. There may be an authenticated user via an auth bearer token who could have a higher limit on POST than an unauthenticated user (e.g. an admin needs to upload a backup to be restored to the server). So I think there would need to be a way to get the limit per-request.
Thanks!
Any progress on this? This would be nice to have for a production server
@danieleades it seems this hasn't seen progress in a while. I definitely agree this would be useful, and would like to see this picked up. Probably the best way to approach this would be to author a middleware in a separate crate so we can try it out, and if we like it we can merge that approach into Tide proper.
@danieleades would this be something you'd like to try authoring?
That's some black belt level delegation right there. You son of a bitch, I'm in.
What's the behaviour we're looking for here?
- check the header and return an error if it's greater than the configured maximum
- sniff the stream and return an error if the size doesn't match the header
(Do there exist any valid use cases where the body size doesn't match the header?)
Hah, glad you're keen!
Correctness concerns should be handled by the transport protocol; in this case that's the async-h1 crate. What I was thinking for this middleware would indeed be the first behavior: count the bytes in the stream as they pass through, and return an error when the threshold is reached. Probably a 413: Payload Too Large error.
There's going to be a bit of trickery involved to make this work in a streaming fashion though. The middleware will need to wrap the incoming body with a counter, and then error when the threshold is reached. It will however not be able to also set the status code because AsyncRead returns io::Result rather than http_types::Result which can contain a status code.
I wonder if the right solution would be for the stream wrapper to set an AtomicBool signaling whether the limit has been reached; if that's the case, the middleware can rewrite the error code to 413? It's not particularly elegant though; I'd be thrilled if we could find a good solution for this!
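A std-only synchronous sketch of that idea (all types and names here are hypothetical, not Tide API): the wrapper can only surface a plain io::Error, so it records the reason through a shared flag that the middleware can inspect afterwards to decide whether to rewrite the response to 413.

```rust
use std::io::{self, Read};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Counts bytes passing through; on exceeding the limit it errors with a
/// plain io::Error and records the reason via the shared flag.
struct CountingReader<R> {
    inner: R,
    limit: usize,
    read: usize,
    exceeded: Arc<AtomicBool>,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        self.read += n;
        if self.read > self.limit {
            self.exceeded.store(true, Ordering::Relaxed);
            return Err(io::Error::new(io::ErrorKind::Other, "limit reached"));
        }
        Ok(n)
    }
}

fn main() {
    let exceeded = Arc::new(AtomicBool::new(false));
    let mut reader = CountingReader {
        inner: &b"0123456789"[..],
        limit: 4,
        read: 0,
        exceeded: exceeded.clone(),
    };
    let _ = reader.read_to_end(&mut Vec::new());
    // the middleware would check this flag and rewrite the response to 413
    assert!(exceeded.load(Ordering::Relaxed));
}
```

In the async version the same flag would be shared between the AsyncRead wrapper installed on the request body and the middleware's response path.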
@yoshuawuyts it's not the cleanest is it? I guess the advantage that Warp has here is that at this point you're filtering a TryStream rather than an AsyncRead and have a little more control over the error.
How does this work in this ecosystem? Is the byte stream buffered somewhere to give you an AsyncWrite interface?
I'm hoping to take a look at this sometime this weekend
@yoshuawuyts here's a sketch - https://github.com/danieleades/tide-upload-limit/pull/3
I wonder if the right solution would be for the stream wrapper to set an AtomicBool signaling whether the limit has been reached; if that's the case, the middleware can rewrite the error code to 413? It's not particularly elegant though; I'd be thrilled if we could find a good solution for this!
yeah. this part isn't very pretty.
I originally implemented a 'sniffer' that could also check that the content length of the body matches the content-length header, and return an error if it doesn't. It's a trivial extension to the current 'sniffer', but possibly a separate concern.
@danieleades that's really cool work! -- thanks for doing that! Could you perhaps file a PR on the Tide README to link to your project? I think people will find it useful to reference!
Implementation-wise, I guess there really wasn't any escaping the atomic bool approach, was there? I don't think there's much left to innovate on in this middleware. If you like, I'd also be open to accepting a PR on tide to add this, possibly under the limit sub-module name.
It's a trivial extension to the current 'sniffer', but possibly a separate concern.
Yeah, this is a correctness concern at the h1 layer; an opt-in middleware would not be the right place to enforce this.
@yoshuawuyts will hopefully look at this in the next few days.
just to get some clarification on the division of labour: I think there are 3 separate but related concerns here
- reject the request if the header indicates the body is too big
- reject the request if the body is larger than the maximum
- reject the request if the body is smaller than the header indicates
- cases 2 and 3 can only be reached if the header is incorrect.
- my middleware is currently enforcing 1 and 2
- you indicate that correctness (i.e. the header matches the body) is a concern of h1, i.e. cases 2 and 3. That would let me remove 90% of my library... also, for h1 to enforce this kind of correctness it would have to 'sniff' the body in the same way my crate does, since the body is read lazily. Is that the plan?
It's hard to see a way to untangle "correctness" at the protocol level from the "practical" elements of networking in this case. We have a part of the protocol indicating the behaviour of the "on the wire" bytestream, and that allows a mismatch to occur. To check and enforce this requires both elements to be inspected.
So if h1 were to enforce this, then it would have to effectively support 2 and 3 as you say, but the moment you support 3, you have to count bytes, meaning it is then trivial to support case 1.
but in terms of this feature, I think that cases 1 and 2 are what matter: if the request was shorter than indicated, that means the client's communication was interrupted, or something else has occurred, and that "something else" will go wrong anyway.
But most "malicious" entities want to abuse large requests rather than sending mismatched small ones, so it's more important to focus on 1 and 2 IMO.
At the end of the day, it's not the "header" that we should focus on, because this is just a "size hint". It will always come down to counting the bytes that we received.
Hope that helps,
I'm struggling to write an integration test which checks that this body length limiting middleware works properly in the case that the reported length of the body is less than the actual body length.
The reason I'm struggling is that when I call read_to_end on the tide::Request, it only ever returns the number of bytes that was set when the tide::http::Body was created.
That is, it is already limited by the implementation.
note that I'm using the `len` property of the `Body` rather than the 'content-length' header, on the assumption that `h1` or similar will have correctly set this property on an async body based on that header (does this assumption hold?)
this makes it hard to recognise that the limit has been reached. Right now I'm trying to catch it by intercepting calls to poll_read on the Request, but my interception never fires because the underlying reader seems to be doing the same thing already.
here we go, this is from http-types -
impl AsyncRead for Body {
    #[allow(missing_doc_code_examples)]
    fn poll_read(
        mut self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut [u8],
    ) -> Poll<io::Result<usize>> {
        let mut buf = match self.length {
            None => buf,
            Some(length) if length == self.bytes_read => return Poll::Ready(Ok(0)),
            Some(length) => {
                let max_len = (length - self.bytes_read).min(buf.len());
                &mut buf[0..max_len]
            }
        };
        let bytes = ready!(Pin::new(&mut self.reader).poll_read(cx, &mut buf))?;
        self.bytes_read += bytes;
        Poll::Ready(Ok(bytes))
    }
}
this is already doing body length limiting (in the case that the actual body length is greater than the 'reported' length). This means my check never fires, and I never get an opportunity to set the response code. It also means most of my implementation is redundant!
thoughts?
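That clamping can be reproduced in isolation with a small std-only model (hypothetical names): once bytes_read reaches the declared length, reads return 0 even though the underlying reader has more data, which is why a downstream sniffer never sees the excess bytes.

```rust
use std::io::{self, Read};

/// Minimal model of the Body behaviour above: clamp reads to a declared length.
struct ClampedBody<R> {
    reader: R,
    length: Option<usize>,
    bytes_read: usize,
}

impl<R: Read> Read for ClampedBody<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let buf = match self.length {
            None => buf,
            // EOF as soon as the declared length is consumed
            Some(length) if length == self.bytes_read => return Ok(0),
            Some(length) => {
                let max_len = (length - self.bytes_read).min(buf.len());
                &mut buf[0..max_len]
            }
        };
        let n = self.reader.read(buf)?;
        self.bytes_read += n;
        Ok(n)
    }
}

fn main() {
    // 23-byte payload, but the declared length ("header") says 9
    let mut body = ClampedBody {
        reader: &b"this string is 23 bytes"[..],
        length: Some(9),
        bytes_read: 0,
    };
    let mut out = Vec::new();
    body.read_to_end(&mut out).unwrap();
    assert_eq!(out, b"this stri"); // only the first 9 bytes ever come through
}
```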
I think this middleware should indeed only enforce a maximum limit on the body stream and not worry about the lower limit since that's already handled.
@danieleades since http-types is already checking this (partially) would it be worth extending this to allow configuration of the maximum allowed request size, and this middleware just exposes that functionality? Or am I misunderstanding how this goes together?
Something also to keep in mind could be the desire for per route size limits ...
Thanks!
sorry, perhaps my explanation was a little unclear.
I've created a test case where I create a request from a string, boxed as a dyn AsyncRead. I set a maximum body size in the middleware which is less than the length of the string, and set the size of the payload "maliciously" to less than the maximum.
I would expect that since the body size exceeds the maximum, my middleware would fire and return an error. In actual fact what happens is the body only ever returns the number of bytes set as the 'length' and no more, which means my middleware never fires.
Perhaps there's an issue with the way I'm setting this up:
use futures_util::io::AsyncReadExt;
use tide::{Request, Server};
use tide_upload_limit::UploadLimit;

/// This tests that the behaviour is correct when the content-length header is
/// incorrectly (or maliciously) set.
#[async_std::test]
async fn payload_over_limit() {
    let payload = "this string is 23 bytes";
    let upload_limit = 10;
    let header_length = 9;

    let response = get_response(payload, upload_limit, header_length).await;
    assert_eq!(response.status(), tide::StatusCode::PayloadTooLarge);
}

async fn get_response(
    payload: &'static str,
    upload_limit: usize,
    header_size: usize,
) -> tide::Response {
    let mut app = app();

    // set a global upload limit
    app.with(UploadLimit::new(upload_limit));

    let request = request(payload, Some(header_size));

    // get response
    app.respond(request).await.unwrap()
}

pub fn app() -> Server<()> {
    let mut app = tide::new();
    app.at("/").post(handle);
    app
}

async fn handle(mut request: Request<()>) -> Result<String, tide::Error> {
    let mut buf = Vec::new();
    request.read_to_end(&mut buf).await?;
    let s = String::from_utf8(buf)?;
    println!("read string: {}", s);
    Ok(s)
}

pub fn request(payload: &'static str, payload_length: Option<usize>) -> Request<()> {
    let length = payload_length.or_else(|| Some(payload.len()));
    let reader = futures_util::io::BufReader::new(payload.as_bytes());
    let body = tide::http::Body::from_reader(reader, length);

    let mut request = tide::http::Request::new(
        tide::http::Method::Post,
        tide::http::Url::parse("http://example.com").unwrap(),
    );
    request.set_body(body);

    request.into()
}
the output of this request handler is the first 9 bytes of the input string.
Could this be that h1 or other http layers lower in the stack are actually blocking the request once you hit the limit set in length? If you do the same test but dont send the length header does your middleware fire?
exactly.
I've added integrations tests here for the three cases
- size correctly set
- size "maliciously" set
- size not set
the middleware fires correctly when the size is correctly set, and when the size isn't set at all. It fails if the size is set to a lower number (and that lower number is the maximum number of bytes that are ever returned)
the assumption is that layers lower in the stack are doing length limiting already. If we also take the above comment from @yoshuawuyts ("Correctness concerns should be handled by the transport protocol") then this middleware should not concern itself with the case that the body length isn't set (this should be enforced lower in the stack).
which only leaves checking that the header is below a configured threshold. ie, the byte-sniffing is redundant.
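Under that assumption, the remaining middleware logic reduces to a single comparison; a std-only sketch (the function name and error representation are hypothetical):

```rust
/// Hypothetical reduced middleware logic: with the lower layers clamping the
/// body to the declared length, only the declared length needs checking.
fn check_declared_length(content_length: Option<usize>, max: usize) -> Result<(), u16> {
    match content_length {
        // 413 Payload Too Large when the declared size exceeds the threshold
        Some(len) if len > max => Err(413),
        // within bounds, or no declared length (left to the lower layers here)
        _ => Ok(()),
    }
}

fn main() {
    assert_eq!(check_declared_length(Some(2048), 1024), Err(413));
    assert_eq!(check_declared_length(Some(512), 1024), Ok(()));
    assert_eq!(check_declared_length(None, 1024), Ok(()));
}
```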
so now what? Some of these lower layers might need a slight rethink.
I would also add that whether the middleware is global or per-route is really a question for the architecture of tide itself, rather than for any single middleware. Should the hook be available for applying this middleware on a single route, it would work exactly the same way.
please jump in if my interpretation here is amiss! I'm making a lot of assumptions here about how the lower layers behave with very little knowledge of their actual behaviour...
Well, if the lower layers are already doing this work, there seems to be little sense in double accounting. It seems like the lower levels should be configurable to do the limiting and accounting, and this middleware then wouldn't be needed?
And of course, I still think that per-route makes sense. For example, for a REST endpoint you may not want large data being posted and filling buffers, so you would want a small limit; but for a file upload you want a larger limit, etc.
But I think it would be up to @yoshuawuyts or someone more involved in tide/async-h1 to really make these calls :)
Seems that Warp's approach is to not handle this at the lower layers in favour of handling it explicitly at the top layer.
In a way I think this may be a good idea because then it does give a lot of flexibility for per-route/app limits, but again, I think this is a decision that is for the actual tide developers :)