gunicorn Cannot return a response with Transfer-Encoding:gzip

Gunicorn removes the Transfer-Encoding header set in start_response(). Unless a Content-Length header is given, it then re-adds its own Transfer-Encoding header with value chunked.

I would expect this header to stay intact. The use case is to stream a compressed file directly from disk. I know I can use Content-Encoding, but would like to avoid that for silly client compatibility reasons.

def app(environ, start_response):
    # case 1: TE:gzip is removed, and TE:chunked added
    start_response("200 OK", [('Transfer-Encoding', "gzip")])

    # case 2: TE:gzip is removed, no TE header in final response
    #start_response("200 OK", [('Transfer-Encoding', "gzip"), ("content-length", "21852")])

    # case 3: avoid use of TE:gzip, this works as expected.
    #start_response("200 OK", [('Content-Encoding', "gzip")])


    # create sample file with e.g `dmesg | gzip > dmesg.gz`
    with open("dmesg.gz", "rb") as f:
        yield f.read()

Oct 04 '22 09:10 uedvt359

I am not aware of gunicorn removing headers. What does your start_response function do?

Oct 14 '22 07:10 benoitc

It's the one from gunicorn, so I presume this: https://github.com/benoitc/gunicorn/blob/0b953b803786997d633d66c0f7c7b290df75e07c/gunicorn/http/wsgi.py#L223

Oct 14 '22 07:10 uedvt359

Can you provide the full request trace?

Oct 14 '22 07:10 benoitc

You mean like this?

Case 1:

11:04 xxx-linux /tmp/tmp.vrT8SxTCdh % curl -v http://localhost:8000         
* Uses proxy env variable no_proxy == 'localhost,127.0.0.0/8,10.0.0.0/8,127.0.0.1, xxx'
*   Trying 127.0.0.1:8000...
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET / HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.79.1
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: gunicorn
< Date: Fri, 14 Oct 2022 09:04:50 GMT
< Connection: close
< Transfer-Encoding: chunked
<

Case 2:

11:04 xxx-linux /tmp/tmp.vrT8SxTCdh % curl -v http://localhost:8000    
* Uses proxy env variable no_proxy == 'localhost,127.0.0.0/8,10.0.0.0/8,127.0.0.1, xxx'
*   Trying 127.0.0.1:8000...
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET / HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.79.1
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: gunicorn
< Date: Fri, 14 Oct 2022 09:05:31 GMT
< Connection: close
< content-length: 21852
<

Case 3:

11:05 xxx-linux /tmp/tmp.vrT8SxTCdh % curl -v http://localhost:8000    
* Uses proxy env variable no_proxy == 'localhost,127.0.0.0/8,10.0.0.0/8,127.0.0.1, xxx'
*   Trying 127.0.0.1:8000...
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET / HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.79.1
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: gunicorn
< Date: Fri, 14 Oct 2022 09:05:56 GMT
< Connection: close
< Transfer-Encoding: chunked
< Content-Encoding: gzip
<

Oct 14 '22 09:10 uedvt359

Thanks for the traces.

Oct 18 '22 13:10 benoitc

I had another look at the source of start_response. It calls process_headers, which eventually drops all hop-by-hop headers (Transfer-Encoding is one of them).
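To illustrate the effect, here is a minimal sketch. It is not gunicorn's actual code: it uses the standard library's wsgiref.util.is_hop_by_hop, and the helper name strip_hop_by_hop is made up for this example.

```python
from wsgiref.util import is_hop_by_hop


def strip_hop_by_hop(headers):
    """Hypothetical helper: drop hop-by-hop headers from a WSGI header list."""
    return [(name, value) for name, value in headers if not is_hop_by_hop(name)]


# Headers as an application might pass them to start_response().
app_headers = [
    ("Transfer-Encoding", "gzip"),
    ("Content-Type", "application/octet-stream"),
]

# Transfer-Encoding is classified as hop-by-hop, so it is filtered out,
# while an end-to-end header such as Content-Type survives.
remaining = strip_hop_by_hop(app_headers)
```

With filtering like this, the gzip coding set by the application never reaches the client, which matches what the traces for cases 1 and 2 show.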

Just dropping this header yields the wrong result: the body is still gzip-compressed, but the client is no longer told so. That said, I'm not sure what the HTTP spec wants us to do here. The RFC talks about how transfer-encoding must be handled by proxies, but I don't think gunicorn counts as a proxy.

What can gunicorn do here, practically? There are two uses of this header (when used as a response header): first, it signals an indefinite-length response with chunked. Second, it signals the application of compression, e.g. with gzip or deflate. gunicorn already adds chunked when it cannot determine the response size; this is correct.

Let's look at the compression case. I see two ways to implement it: either look at the incoming transfer-encoding header and decode the message (possibly re-encoding it when sending it onwards), or just pass the message through unmodified and merge our transfer-encoding:chunked with the incoming one. I would prefer the latter, since it is less work (both computationally and in terms of implementation), and I would volunteer to provide a patch if that idea is acceptable.
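A rough sketch of what "pass through unmodified, with chunked on top" could look like on the wire. The names encode_chunk and chunked_body are made up for illustration and are not gunicorn functions; the framing follows the usual chunked format (hex length, CRLF, data, CRLF, then a zero-length last chunk).

```python
def encode_chunk(data: bytes) -> bytes:
    """Frame one piece of the (already gzip-compressed) body as an HTTP chunk."""
    return b"%x\r\n%s\r\n" % (len(data), data)


# A zero-length chunk marks the end of a chunked body.
LAST_CHUNK = b"0\r\n\r\n"


def chunked_body(parts):
    """Wrap an iterable of byte strings in chunked framing without touching the payload."""
    for part in parts:
        if part:  # an empty chunk would end the body early, so skip empty parts
            yield encode_chunk(part)
    yield LAST_CHUNK
```

The compressed bytes are never inspected or decoded here; only the framing is added, which is why this option is the cheaper one.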

In summary: right now gunicorn neither decodes the message body when it encounters a transfer-encoding header, nor passes the header on to the client. Instead, it explicitly drops it. I would suggest forwarding the header, possibly applying chunked encoding on top of it.

Note to self: if this is to be implemented, add a self.compressed or self.transfer_encoding member variable (like the existing self.chunked). When processing headers, check for transfer-encoding and store it. Already-chunked messages might need special handling. Finally, when adding default headers, check not only self.chunked but also the new member variable, and add a transfer-encoding header with all necessary codings in the right order.
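The header-merging part of that note could be sketched roughly as follows. This is an assumption about how it might look, not a patch against gunicorn: merge_transfer_codings and its parameters are hypothetical, and it only computes the header value (it does not frame the body). It relies on the rule that chunked, when present, appears once and is the last coding listed.

```python
def merge_transfer_codings(app_value, use_chunked):
    """Return the Transfer-Encoding value to send, or None to omit the header.

    app_value   -- value the application passed to start_response(), or None
    use_chunked -- True if the server decided to apply chunked framing itself
    """
    codings = []
    if app_value:
        codings = [c.strip().lower() for c in app_value.split(",") if c.strip()]

    # chunked may be listed only once and has to come last, so remove any
    # occurrence from the application's list and re-add it at the end.
    already_chunked = "chunked" in codings
    codings = [c for c in codings if c != "chunked"]
    if use_chunked or already_chunked:
        codings.append("chunked")

    return ", ".join(codings) if codings else None
```

For case 1 above this would give "gzip, chunked" instead of a bare "chunked". An application that already chunked the body itself would still need separate handling so that the framing is not applied twice.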

Oct 27 '22 07:10 uedvt359
