
There is no way to specify an upper bound for the chunk size.

Open sumerman opened this issue 8 years ago • 4 comments

When hackney consumes a chunked response, it always waits until an entire chunk is available before sending it to the client process, even when used asynchronously. I do understand why this might be desirable in many cases, yet sometimes it is absurd at best and a security threat at worst.
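For reference, this is the asynchronous usage I mean (a minimal sketch; the module name and URL are placeholders). In async mode hackney delivers each chunk as one {hackney_response, Ref, Bin} message, and that message only arrives once the whole chunk has been buffered:

-module(chunk_demo).
-export([fetch/1]).

%% Stream a URL asynchronously. Each binary message from hackney
%% carries one complete chunk, however large the server made it.
fetch(Url) ->
    {ok, Ref} = hackney:get(Url, [], <<>>, [async]),
    loop(Ref).

loop(Ref) ->
    receive
        {hackney_response, Ref, {status, Status, _Reason}} ->
            io:format("status ~p~n", [Status]),
            loop(Ref);
        {hackney_response, Ref, {headers, _Headers}} ->
            loop(Ref);
        {hackney_response, Ref, done} ->
            ok;
        {hackney_response, Ref, Bin} when is_binary(Bin) ->
            io:format("received a ~p byte chunk~n", [byte_size(Bin)]),
            loop(Ref)
    end.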

Consider a service that produces large chunked responses behind Nginx configured as in the following snippet:

worker_processes  1;

events {
    worker_connections  1024;
}

http {
    default_type  application/octet-stream;

    sendfile        on;
    keepalive_timeout  65;
    proxy_cache_path /tmp/nginx/mycache levels=1:2 keys_zone=mycache:100m inactive=10m max_size=3g use_temp_path=off;

    server {
        listen       8080;
        server_name  localhost;

        proxy_set_header Host $http_host;
        proxy_read_timeout 900s;

        proxy_cache_lock on;
        proxy_cache_lock_timeout 300s;

        proxy_cache_valid any 10s;

        add_header X-Cached $upstream_cache_status;

        location / {
            root   html;
            index  index.html index.htm;
        }

        location /test {
            proxy_cache  mycache;
            proxy_pass   http://127.0.0.1:8090/;
        }
    }
}

For cache misses, chunks in a response might get merged, but for cache hits Nginx responds with a single mega-chunk containing the entire response (I checked this with nginx/1.10.2).
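One way to see the chunk framing on the wire is curl's --raw flag, which disables transfer decoding so the hex chunk-size lines stay visible (assuming the setup above):

# Cache miss: the body streams as a series of chunks.
# Repeat within 10s (cache hit, X-Cached: HIT): a single hex size
# line followed by one mega-chunk carrying the entire body.
curl --raw -sS -D - http://localhost:8080/test | head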

The described behaviour makes no sense for proxy-like workloads: why would anyone want to accumulate 1 GB of chunked data before proxying a single byte? To make matters worse, it could be exploited by a malicious upstream service to run an app that uses hackney out of memory. Finally, together with #378 it makes for a deadly duo.

I suggest introducing an option that sets an upper bound on the chunk size. The available part of a chunk should be sent to the calling process as soon as it is bigger than the upper bound allows, effectively splitting the chunk in two. I also suggest picking a reasonable default which is not infinity.
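A minimal sketch of what I have in mind (the helper is illustrative, not hackney API):

%% Hypothetical helper illustrating the proposal: deliver any buffered
%% data that exceeds MaxSize instead of waiting for the end of the
%% chunk. Returns {PiecesToDeliverNow, RemainingBuffer}.
flush_buffer(Buf, MaxSize) when byte_size(Buf) >= MaxSize ->
    <<Piece:MaxSize/binary, Rest/binary>> = Buf,
    {Pieces, Leftover} = flush_buffer(Rest, MaxSize),
    {[Piece | Pieces], Leftover};
flush_buffer(Buf, _MaxSize) ->
    {[], Buf}.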

For now, we solved the issue by switching to ibrowse, which allows specifying which behaviour is desired.
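Roughly, the workaround looks like this; the option names here are from memory, so check them against the ibrowse docs:

%% Assumed ibrowse usage: stream the response to self(), delivering
%% at most 64 KiB per message regardless of the wire chunk size.
{ibrowse_req_id, ReqId} =
    ibrowse:send_req("http://localhost:8080/test", [], get, [],
                     [{stream_to, self()},
                      {response_format, binary},
                      {stream_chunk_size, 64 * 1024}]),
receive
    {ibrowse_async_response, ReqId, Data} ->
        io:format("got ~p bytes~n", [byte_size(Data)])
end.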

sumerman · Jan 11 '17 21:01

Do you mean receiving a partial chunk immediately? Normally it is expected to receive the whole chunk, not really for security reasons but because some APIs just use chunking as a way to stream messages (with metadata in headers). (I would also not have expected to get a chunk that large ;)

Anyway, we could have an option for it. The default would be to wait for the whole chunk and raise an error if it's over a limit, so we wouldn't break existing expectations. Thoughts?
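From the caller's side that could look something like this (the option and error shape are purely illustrative, not existing hackney API):

%% Hypothetical: a max_chunk_size option whose default behaviour is
%% to abort with an error rather than split the oversized chunk.
fetch_capped(Url) ->
    {ok, _Status, _Headers, Ref} =
        hackney:get(Url, [], <<>>, [{max_chunk_size, 8 * 1024 * 1024}]),
    case hackney:body(Ref) of
        {ok, Body} ->
            {ok, Body};
        {error, {chunk_too_large, _Size}} = Err ->
            %% hypothetical error shape for a chunk over the limit
            Err
    end.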

benoitc · Jan 11 '17 22:01

I suggest sending a chunk as soon as it is either complete or bigger than a certain threshold. The threshold value should be passed as an option. The default might be either infinity (in Erlang's term order, any integer is smaller than an atom) or a sufficiently large integer value.
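The infinity default needs no special-casing thanks to Erlang's term order, where any number compares smaller than any atom:

1> 100000000 < infinity.
true
2> byte_size(<<"abc">>) >= infinity.
false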

sumerman · Jan 13 '17 21:01

In the case of sending an incomplete chunk, the same logic applies to its leftovers: either they get split again, or the remainder is smaller than the threshold and gets sent.

sumerman · Jan 13 '17 21:01

I'm suffering from this right now. For whatever reason, the server I'm trying to download a file from sends a single chunk of 80 MB. It takes hackney about 10 minutes and 1.0 GB of RAM to build the entire chunk, forcing the Elixir app to restart all the time because all available memory is eaten. The "10 minutes" part is obviously my fault (I'm in Cuba, which has a bad connection), but 1.5 GB of RAM used to build an 80 MB chunk!!! I think this (specifying an upper limit on the chunk size) is a great idea to avoid that, but it's not happening, right?

JoeZ99 · Apr 22 '22 22:04