Cannot return a response with Transfer-Encoding:gzip

Open uedvt359 opened this issue 2 years ago • 6 comments

Gunicorn removes the Transfer-Encoding header set in start_response(). Unless a Content-Length header is given, it then re-adds its own Transfer-Encoding header with value chunked.

I would expect this header to stay intact. The use case is to stream a compressed file directly from disk. I know I can use Content-Encoding, but would like to avoid that for silly client compatibility reasons.

def app(environ, start_response):
    # case 1: TE:gzip is removed, and TE:chunked added
    start_response("200 OK", [('Transfer-Encoding', "gzip")])

    # case 2: TE:gzip is removed, no TE header in final response
    #start_response("200 OK", [('Transfer-Encoding', "gzip"), ("content-length", "21852")])

    # case 3: avoid use of TE:gzip, this works as expected.
    #start_response("200 OK", [('Content-Encoding', "gzip")])


    # create sample file with e.g `dmesg | gzip > dmesg.gz`
    with open("dmesg.gz", "rb") as f:
        yield f.read()

uedvt359 avatar Oct 04 '22 09:10 uedvt359

I am not aware of gunicorn removing headers. What does your start_response function do?

benoitc avatar Oct 14 '22 07:10 benoitc

It's the one from gunicorn, so I presume this: https://github.com/benoitc/gunicorn/blob/0b953b803786997d633d66c0f7c7b290df75e07c/gunicorn/http/wsgi.py#L223

uedvt359 avatar Oct 14 '22 07:10 uedvt359

Can you provide the full request trace?

benoitc avatar Oct 14 '22 07:10 benoitc

You mean like this?

Case 1:

11:04 xxx-linux /tmp/tmp.vrT8SxTCdh % curl -v http://localhost:8000         
* Uses proxy env variable no_proxy == 'localhost,127.0.0.0/8,10.0.0.0/8,127.0.0.1, xxx'
*   Trying 127.0.0.1:8000...
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET / HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.79.1
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: gunicorn
< Date: Fri, 14 Oct 2022 09:04:50 GMT
< Connection: close
< Transfer-Encoding: chunked
< 

Case 2:

11:04 xxx-linux /tmp/tmp.vrT8SxTCdh % curl -v http://localhost:8000    
* Uses proxy env variable no_proxy == 'localhost,127.0.0.0/8,10.0.0.0/8,127.0.0.1, xxx'
*   Trying 127.0.0.1:8000...
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET / HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.79.1
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: gunicorn
< Date: Fri, 14 Oct 2022 09:05:31 GMT
< Connection: close
< content-length: 21852
< 

Case 3:

11:05 xxx-linux /tmp/tmp.vrT8SxTCdh % curl -v http://localhost:8000    
* Uses proxy env variable no_proxy == 'localhost,127.0.0.0/8,10.0.0.0/8,127.0.0.1, xxx'
*   Trying 127.0.0.1:8000...
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET / HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.79.1
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: gunicorn
< Date: Fri, 14 Oct 2022 09:05:56 GMT
< Connection: close
< Transfer-Encoding: chunked
< Content-Encoding: gzip
< 

uedvt359 avatar Oct 14 '22 09:10 uedvt359

Thanks for the traces.

benoitc avatar Oct 18 '22 13:10 benoitc

I had another look at the source of start_response. It calls process_headers, which eventually drops all hop-by-hop headers (Transfer-Encoding is one of them).
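To illustrate the effect, here is a minimal sketch of hop-by-hop filtering. It is not gunicorn's actual code: it uses the standard library's wsgiref.util.is_hop_by_hop, and drop_hop_by_hop is a hypothetical helper name chosen for this example.

```python
from wsgiref.util import is_hop_by_hop


def drop_hop_by_hop(headers):
    """Return only the end-to-end headers from a list of (name, value) pairs.

    Hypothetical helper for illustration; gunicorn's process_headers is
    structured differently.
    """
    return [(name, value) for name, value in headers if not is_hop_by_hop(name)]


# Transfer-Encoding is hop-by-hop and gets dropped; Content-Encoding survives.
app_headers = [("Transfer-Encoding", "gzip"), ("Content-Encoding", "gzip")]
filtered = drop_hop_by_hop(app_headers)
print(filtered)
```

This matches the traces: the Transfer-Encoding value set by the application disappears, while Content-Encoding passes through.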

Just dropping this header yields the wrong result: the client receives a gzip-compressed body with no indication that it is compressed. That said, I'm not sure what the HTTP spec wants us to do here. The RFC talks about how Transfer-Encoding must be handled by proxies, but I don't think gunicorn counts as a proxy.

What can gunicorn do here, practically? There are two uses of this header (when used as a response header). First, it signals an indefinite-length response with chunked. Second, it signals the application of compression, e.g. with gzip or deflate. Gunicorn already adds chunked when it cannot determine the response size; this is correct.

Let's look at the compression case. I see two ways to implement it: either look at an incoming Transfer-Encoding header and decode the message (possibly re-encoding it when sending it onwards), or just pass the message through unmodified and merge our Transfer-Encoding: chunked with the incoming one. I would prefer the latter, since it is less work (both computationally and in implementation), and I would volunteer to provide a patch if that idea is acceptable.

In summary: right now gunicorn neither decodes the message body when it encounters a Transfer-Encoding header, nor passes the header on to the client. Instead, it explicitly drops it. I would suggest forwarding the header, possibly applying chunked encoding on top of it.


Note to self: if this is to be implemented, add a self.compressed or self.transfer_encoding member variable (like the existing self.chunked). When processing headers, check for Transfer-Encoding and store it. Already-chunked messages might need special handling. Finally, when adding default headers, check not only self.chunked but also the new member variable, and add a Transfer-Encoding header with all necessary codings in the right order.
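A rough sketch of the merge step, assuming the application's Transfer-Encoding value has been stored during header processing. merge_transfer_codings and its parameters are hypothetical names, not existing gunicorn API. It keeps chunked as the last coding and never applies it twice; transfer-coding parameters are not handled.

```python
def merge_transfer_codings(app_value, add_chunked):
    """Combine the application's transfer codings with the server's own.

    app_value:   Transfer-Encoding value set by the application, or None.
    add_chunked: True when the server wants to chunk the response body.
    Returns the merged header value, or None if no header should be sent.
    """
    codings = [c.strip().lower() for c in (app_value or "").split(",") if c.strip()]
    # chunked goes last and must not be listed more than once.
    if add_chunked and "chunked" not in codings:
        codings.append("chunked")
    return ", ".join(codings) or None


case1 = merge_transfer_codings("gzip", add_chunked=True)
case2 = merge_transfer_codings("gzip", add_chunked=False)
print(case1)
print(case2)
```

If the application already listed chunked, the value is passed through unchanged; whether the body is then really chunked is the special case mentioned above and is left open here.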

uedvt359 avatar Oct 27 '22 07:10 uedvt359