Transparently handle gzipped request bodies

Open zyro opened this issue 8 years ago • 15 comments

I have a server that regularly receives relatively large requests, and it would be very helpful if Cowboy accepted compressed requests.

Perhaps look for a Content-Encoding: gzip request header, and call zlib:gunzip on the request body after it's been fully read from the client. Is this possible?

zyro avatar Feb 08 '16 19:02 zyro

Too dangerous because of zip bombs. Now that Erlang/OTP 18 has an interface to stream-unzip though, it probably isn't an impossible task anymore.

essen avatar Feb 08 '16 20:02 essen

I see how that can be a problem. Can you point me in the right direction for docs about the stream unzip you mentioned? I see zlib seems to handle streaming inflate/deflate but I can't find anything for gzip.

zyro avatar Feb 09 '16 11:02 zyro

Gzip is a header followed by deflated data, see https://tools.ietf.org/html/rfc1952
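To make that layout concrete, here is a minimal sketch of splitting a gzip member into header, raw deflate payload, and trailer. It uses Python's zlib bindings purely for illustration (this is not Erlang or Cowboy code), the function names `make_gzip` and `split_gzip` are made up for the example, and it assumes the simplest header form with no optional fields (FLG = 0).

```python
import struct
import zlib


def make_gzip(data: bytes) -> bytes:
    """Build a gzip member with zlib's default 10-byte header (FLG = 0)."""
    deflater = zlib.compressobj(wbits=31)  # 16 + 15 selects gzip framing
    return deflater.compress(data) + deflater.flush()


def split_gzip(blob: bytes):
    """Return (payload, crc32, isize) from a minimal gzip member.

    Hypothetical helper: optional header fields (FEXTRA, FNAME, ...) are
    deliberately not handled in this sketch.
    """
    if blob[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    method, flags = blob[2], blob[3]
    if method != 8:
        raise ValueError("unsupported compression method")
    if flags != 0:
        raise ValueError("optional header fields are not handled here")
    # Negative wbits means raw deflate: no zlib or gzip framing expected.
    inflater = zlib.decompressobj(wbits=-15)
    payload = inflater.decompress(blob[10:]) + inflater.flush()
    trailer = inflater.unused_data  # bytes left after the deflate stream
    if len(trailer) < 8:
        raise ValueError("missing gzip trailer")
    # Trailer: CRC-32 and length (mod 2**32) of the uncompressed data.
    crc32, isize = struct.unpack("<II", trailer[:8])
    return payload, crc32, isize
```

So the streaming inflate functions are enough for gzip once the header has been skipped; checking the CRC-32 and length in the trailer is then up to the caller.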

essen avatar Feb 09 '16 11:02 essen

Thanks for the pointers! I was able to get a decent solution rolling using streaming decompression and limiting both the buffer size and max chunks allowed. It's hooked in as a content_decode function passed through the opts in cowboy_req:body/2. This will probably work for me until more polished support is officially built in.

zyro avatar Feb 09 '16 18:02 zyro

Do post the code here or somewhere where it won't get lost. :-)

essen avatar Feb 09 '16 18:02 essen

Sorry for waking this thread up, but isn't a zip bomb only dangerous if it is nested and the server doesn't enforce a limit? I'm wondering whether it affects us, since our requests carry gzipped JSON.

seivan avatar Mar 23 '17 23:03 seivan

The issue is not so much zip bombs as such, but that a small amount of compressed data can expand into a very large uncompressed body. If the server uncompresses automatically, it does not take many simultaneous requests to make it run out of memory.

The defense against that is to limit the uncompressed amount you accept for each chunk, rather than only limiting the amount of data you will attempt to uncompress. This was not possible in Erlang until OTP 18.0, as the required functions were not exposed. So it's something that may be added in the future, but it is not a priority right now.
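As a rough illustration of that defense, the sketch below inflates a gzip stream in bounded steps and aborts once a total cap is exceeded. It uses Python's zlib bindings only to show the idea; it is not Erlang or Cowboy code. The names `bounded_gunzip` and `BodyTooLarge` and both default limits are assumptions made for the example.

```python
import zlib


class BodyTooLarge(Exception):
    """Raised when the uncompressed body exceeds the configured cap."""


def bounded_gunzip(chunks, max_step=64 * 1024, max_total=8 * 1024 * 1024):
    """Yield decompressed pieces of at most max_step bytes each.

    chunks is any iterable of compressed byte strings, for example the
    successive reads of a request body.
    """
    inflater = zlib.decompressobj(wbits=31)  # 16 + 15: expect gzip framing
    total = 0
    for chunk in chunks:
        pending = chunk
        while True:
            # Never produce more than max_step bytes per call.
            piece = inflater.decompress(pending, max_step)
            total += len(piece)
            if total > max_total:
                raise BodyTooLarge("uncompressed body exceeds max_total")
            if piece:
                yield piece
            # Input that was not consumed because the output cap was hit.
            pending = inflater.unconsumed_tail
            if inflater.eof or (not pending and len(piece) < max_step):
                break
        if inflater.eof:
            break
    if not inflater.eof:
        raise ValueError("truncated gzip stream")
```

Memory use per request is then bounded by `max_step` plus whatever the caller keeps, regardless of the compression ratio. In Erlang the equivalent bounded call is `zlib:safeInflate/2`, which comes up later in this thread.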

essen avatar Mar 23 '17 23:03 essen

I didn't quite understand. Are you saying I can limit the uncompressed amount for each chunk of compressed data? Does that mean you can unzip as the data streams up to the server, check the size, and then either allow or abort?

Is that what stream-unzip does, as you mentioned above?

seivan avatar Dec 03 '17 02:12 seivan

Check zlib:safeInflate/2.

essen avatar Dec 03 '17 07:12 essen

Hi,

I would like to use :zlib to support Content-Encoding: gzip and Content-Encoding: deflate. I have a working implementation in Plug but as described in that PR (https://github.com/elixir-plug/plug/pull/888) it probably makes more sense to have support for that in cowboy.

I could use some pointers to implement this :)

Reading through https://ninenines.eu/docs/en/cowboy/2.6/guide/req_body/, perhaps changing cowboy_req:read_body/1 to automatically inflate the body as it is being read might work?

Thanks!

gmalkas avatar Dec 14 '19 16:12 gmalkas

Where to implement this depends on the intent. If you need it everywhere then it makes sense to have it at a stream handler level. If you need it selectively depending on the resource, then I guess it would make sense out of read_body. But to be honest I'm not sure in the latter case that it's worth having inside Cowboy, since it's fairly simple to call zlib yourself and decompress everything at once or via the streaming interface.
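For the selective, per-resource case, a one-shot decode keyed on the Content-Encoding header might look like the sketch below. Python's zlib bindings are used only to illustrate the shape of it; this is not Cowboy or Plug code. `decode_body`, its `limit` parameter, and the 1 MiB default are hypothetical, and `deflate` is assumed to mean zlib-wrapped data, as the HTTP specification defines it.

```python
import zlib

# wbits values understood by zlib: 31 = gzip framing, 15 = zlib framing.
WBITS = {"gzip": 31, "deflate": 15}


def decode_body(body: bytes, content_encoding, limit=1024 * 1024) -> bytes:
    """Decompress a fully read request body, refusing oversized output."""
    if not content_encoding or content_encoding.strip().lower() == "identity":
        return body
    wbits = WBITS.get(content_encoding.strip().lower())
    if wbits is None:
        raise ValueError("unsupported Content-Encoding: %r" % content_encoding)
    inflater = zlib.decompressobj(wbits=wbits)
    # Ask for one byte more than the limit so an overflow is detectable.
    out = inflater.decompress(body, limit + 1)
    if len(out) > limit:
        raise ValueError("decompressed body exceeds limit")
    if not inflater.eof:
        raise ValueError("truncated or corrupt compressed body")
    return out
```

A handler that wants to store the compressed bytes as-is simply skips this call, which is the flexibility a transparent, always-on approach would give up.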

essen avatar Dec 14 '19 20:12 essen

Hi @essen, thanks for the quick answer.

The intent would be to have it everywhere I guess, pretty much like cowboy can parse multipart requests out-of-the-box. Just like Content-Type can be used to automatically parse the content, Content-Encoding can be used to apply decompression before handing the content over to the application.

I can imagine this could cause problems for applications that rely on broken implementations of Content-Encoding, so there might be a need for an option to disable the automated decompression in cowboy so that they can implement their own.

There would also need to be an option to limit the length of the uncompressed content, instead of just the length of the compressed content.

I apologize if this does not make any sense, I'm not familiar with cowboy as I come from the Elixir world and I'm used to interfacing with cowboy through Plug, not cowboy directly. As you can see in the PR I linked, my intent was first to add it to Plug but @josevalim rightfully pointed out that this could be useful in cowboy.

I understand there might be some hesitation to add more automatic behavior to cowboy/plug as it's more risk for bugs, lower performance for people who don't rely on the behavior, etc.

On principle, is that a feature you'd be happy to have in cowboy, or do you think it's best left for end-users to implement?

I'm surprised this is not available already to be honest, it's not in nginx either. I would have thought compressed request bodies would be more common but it looks to be an edge-case.

Thank you.

gmalkas avatar Dec 14 '19 20:12 gmalkas

I mean that if you want all request bodies to be decompressed then it should be implemented as a stream handler (perhaps in cowboy_compress_h if enabled via an option, but maybe a separate stream handler would work best?).

But it's not always the case that you want to decompress, sometimes you want to store the compressed file directly, and the stream handler implementation would not help those use cases. In those cases calling zlib directly makes better sense.

So if your use case fits the stream handler case then sure, a patch is welcome. This is what this ticket is about. But there have to be a few options to control the behavior (whether to decompress at all, perhaps options related to buffering). No need for limiting the uncompressed content since safeInflate does it for us and it doesn't look like it's configurable, plus on the read_body side the flow control will quickly kick in.

Anyway I'd recommend writing this stream handler in your code base and then submitting to Cowboy once everything is worked out, this doesn't require any changes to Cowboy directly I think, if done as a new stream handler. We can see about merging with cowboy_compress_h later on.

essen avatar Dec 14 '19 21:12 essen

@essen just checking if this was ever done? Are there best practices for handling gzip requests in the latest version?

romanr321 avatar May 09 '22 20:05 romanr321

Not that I am aware of. You have to do it manually for now. Patches are welcome as long as they're in line with what was discussed above.

essen avatar May 09 '22 21:05 essen

A cowboy_decompress_h stream handler has been merged. This will be part of the next release, 2.11. It will not be enabled by default, but once enabled it lets you transparently decode gzipped request bodies (and only gzip, for now). Closing, thanks!

essen avatar Jan 04 '24 16:01 essen