
[Question] Behaviour of un-chunked request and CONTENT_LENGTH header

Open space88man opened this issue 2 years ago • 8 comments

With gunicorn 20.1.0 - when it receives a chunked request - the body is available to the application from wsgi.input but the headers are untouched: the application sees no CONTENT_LENGTH header.

Is this the intended behaviour? Django for example will return an empty body in this case.

OTOH, waitress converts the request so that it appears “to the client to be an entirely non-chunked HTTP” (waitress/parser.py#194). This lets Django/rest_framework work without knowing about the original request's transfer-encoding.

This was discovered in testing: a client POSTs with transfer-encoding: chunked; waitress + Django works, but gunicorn + Django sees an empty request body.

With a gunicorn deployment, is it expected that some frontend proxy will perform the unchunking and set Content-Length?

space88man avatar Feb 15 '23 23:02 space88man

Yes, this is intentional. A request with Transfer-Encoding: chunked must not set a content length. This is what the wsgi.input_terminated extension to the spec is for: Gunicorn and other WSGI servers set it to indicate that they handle terminating the stream and that it's safe to read. Django does not use this; it does not support streaming request bodies. Waitress always buffers the request body, so it can unset the chunked encoding and set a length, at the cost of more memory.
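As an illustration of how an application could honour that flag, here is a minimal sketch of a body reader. The helper name `read_request_body` is hypothetical, and it assumes the server sets `wsgi.input_terminated` in the environ when it guarantees end-of-file at the end of the body.

```python
import io


def read_request_body(environ):
    """Return the request body as bytes (hypothetical helper, not a Gunicorn or Django API)."""
    stream = environ["wsgi.input"]

    if environ.get("wsgi.input_terminated"):
        # The server signals EOF at the end of the body, so reading until
        # EOF is safe even without CONTENT_LENGTH (e.g. chunked requests).
        return stream.read()

    # Without the flag, only CONTENT_LENGTH tells us how much may be read.
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    return stream.read(length) if length > 0 else b""
```

A framework that only follows the second branch sees an empty body for a chunked request, which matches the symptom described in the question.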

davidism avatar Mar 10 '23 23:03 davidism

so...

I have an old application written in Python 2 (Gunicorn + Django) that was receiving chunked requests successfully.

My previous setup worked because I had NGINX before Gunicorn. Now with Traefik, Gunicorn "alone" is receiving empty bodies.

I'm going to try Waitress instead of Gunicorn (for Python 2) or put NGINX in between. 😞

@space88man Did you have any chance to make it work somehow by changing params here and there? (I've been struggling for 2 days.)

marcomilone avatar Dec 21 '23 13:12 marcomilone

@marcomilone this might be a good test case to validate some outstanding patches in that area against. Please go ahead and share a full reproducer, or at least a full request that you expected to work but which failed, and how the same request looks when received through the proxy.
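For anyone building such a reproducer, a minimal sketch follows. It assumes Python 3 on the client side; the function names, host, port, and path are hypothetical placeholders. `encode_chunked` shows what a chunked body looks like on the wire, and `post_chunked` relies on `http.client` sending `Transfer-Encoding: chunked` on its own when the body is an iterable and no Content-Length header is given.

```python
import http.client


def encode_chunked(chunks):
    """Encode byte chunks in HTTP/1.1 chunked transfer coding (illustration only)."""
    out = []
    for chunk in chunks:
        if chunk:  # a zero-length chunk would end the body early
            out.append(b"%x\r\n%s\r\n" % (len(chunk), chunk))
    out.append(b"0\r\n\r\n")  # last-chunk marker, no trailers
    return b"".join(out)


def post_chunked(host, port, path, chunks):
    """POST the chunks with chunked encoding; returns (status, response body)."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        # An iterable body without Content-Length makes http.client
        # chunk-encode the request body automatically.
        conn.request("POST", path, body=iter(chunks))
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```

Calling something like `post_chunked("127.0.0.1", 8000, "/upload", [b"hello", b" world"])` once directly against Gunicorn and once through the proxy would show whether the two paths behave differently.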

pajod avatar Dec 21 '23 14:12 pajod

@pajod

8 years ago I developed an Android Cordova WebApp that runs as an APK on some devices. It still runs successfully. I updated it in the first years; now I prefer to leave it as is.

This app talks with a backend using NGINX + Gunicorn + Django.

I don't know how to replicate it because it is a kind of edge case. For file uploads I used this: https://www.npmjs.com/package/cordova-plugin-file-transfer/v/1.7.1?activeTab=versions

Maybe that request is somehow bugged. I cannot see anything in the Network pane, because those requests are managed by the plugin and not by the webview, and the plugin talks back to the app through callbacks. I know, using this plugin was a big mistake.

In view of some upgrades, I want to change the backend, introducing Traefik instead of NGINX.

I'm not so good at debugging all this stack, even when printing the whole request (the requests are chunked).

I'm stuck with Python 2, using Gunicorn 19.10.

Wanted setup - Traefik

Traefik error is: httputil: ReverseProxy read error during body copy: read tcp 192.168.208.1:56872->192.168.208.4:8000: use of closed network connection

I'm pretty sure that the chunked request (probably only the first chunk) is forwarded as is to Gunicorn, which forwards it to the Django view. The request arrives completely empty.

It is not an option for Traefik to do that work (I dug through the docs for a whole day).

Previous working setup - NGINX

I think that NGINX collected the chunks into a complete request and sent it with a content-length to Gunicorn.

I wrote this wall of text to describe my situation and give some more information for future readers.

I tried Waitress v1.4.4, with no success. I'm going to put NGINX back, between Traefik and Gunicorn (hoping it will work).

Thanks for reading,

marcomilone avatar Dec 22 '23 09:12 marcomilone

The behaviour of gunicorn hasn't changed since its creation. We stream the body and do not keep it in memory. NGINX doesn't collect the chunks, that's for sure. The nice addition of wsgi.input_terminated to the spec made it more relaxed. Can you share a pcap captured with tcpdump?

As a side note, Traefik may have an issue with chunked encoding: https://github.com/traefik/traefik/issues/9775

If you want to collect the whole body in your application, you can add an intermediate WSGI middleware that does it, or modify the application to iterate over the body.
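A minimal sketch of such a middleware, under stated assumptions: the class name `BufferChunkedBody` and the `max_body_size` parameter are hypothetical, and the server is assumed to set `wsgi.input_terminated` and leave `CONTENT_LENGTH` unset for chunked requests. It buffers the body in memory, so the size cap matters.

```python
import io


class BufferChunkedBody:
    """WSGI middleware that buffers bodies arriving without CONTENT_LENGTH."""

    def __init__(self, app, max_body_size=10 * 1024 * 1024):
        self.app = app
        self.max_body_size = max_body_size  # assumed cap, tune per deployment

    def __call__(self, environ, start_response):
        terminated = environ.get("wsgi.input_terminated")
        if terminated and not environ.get("CONTENT_LENGTH"):
            # Read one byte past the cap to detect oversized bodies.
            body = environ["wsgi.input"].read(self.max_body_size + 1)
            if len(body) > self.max_body_size:
                start_response(
                    "413 Payload Too Large",
                    [("Content-Type", "text/plain"), ("Content-Length", "0")],
                )
                return [b""]
            # Present the request to the application as a plain,
            # fixed-length one.
            environ["wsgi.input"] = io.BytesIO(body)
            environ["CONTENT_LENGTH"] = str(len(body))
            environ.pop("HTTP_TRANSFER_ENCODING", None)
        return self.app(environ, start_response)
```

With Django, this could wrap the object returned by `get_wsgi_application()` in the project's `wsgi.py`, so that views see a regular Content-Length request.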

benoitc avatar Dec 22 '23 10:12 benoitc

@benoitc don't kill me, how do I do it?

I log into the container running Gunicorn, I start the dump, I stop the dump.

What command should I run ?

tcpdump -i <interface> -s 65535 -w <file>   ?

Do you need the incoming traffic from Traefik or the outgoing traffic to Django?

Thanks,

marcomilone avatar Dec 22 '23 21:12 marcomilone

Just a dump of your interface, and maybe a filter. But yes, that is mostly the command line :)

benoitc avatar Dec 23 '23 01:12 benoitc

Don't kill me, again. Mysteriously the Traefik+Gunicorn option started to work, while I was starting the tcpdump, right before hitting Enter.

I admit that I'm somewhat scared and ashamed, because I had already tried several times in the past month (it's a side project) with the same error, and because I don't have any clue why it wasn't working.

edit: I checked with some new data and Traefik+Gunicorn are still working flawlessly, even with the chunked requests. I'm so sorry to have disturbed you.

In case this problem pops up again, I will come back with a good tcpdump. Thanks (and sorry) for your time.

marcomilone avatar Dec 31 '23 21:12 marcomilone