Incorrect Handling of Absolute-Form Request-Target Authority
Version: bacbf8a
Platform: Ubuntu 11.4.0-1ubuntu1~22.04
Description: Hello, I may have found a bug in gunicorn's parsing of absolute-form request-target authority. RFC 9112 says this:
When an origin server receives a request with an absolute-form of request-target, the origin server MUST ignore the received Host header field (if any) and instead use the host information of the request-target.
However, I noticed that when an HTTP request is sent with an absolute-form request-target where the authority differs from the Host header, gunicorn appears to prioritize the Host header over the request-target’s authority.
For example:
GET http://evil.com/page HTTP/1.1\r\n
Host: victim.com\r\n
\r\n
Gunicorn's response:
$ echo -ne "GET http://evil.com/page HTTP/1.1\r\nHost: victim.com\r\n\r\n" | nc 172.18.0.7 80
HTTP/1.1 200 OK
Server: gunicorn
Date: Wed, 26 Mar 2025 09:01:53 GMT
Connection: keep-alive
Content-type: application/json
Content-Length: 113
{"headers":[["SE9TVA==","dmljdGltLmNvbQ=="]],"body":"","version":"SFRUUC8xLjE=","uri":"L3BhZ2U=","method":"R0VU"}
From the response body, specifically the URI, it appears that Gunicorn parses the above request as:
GET /page HTTP/1.1\r\n
Host: victim.com\r\n
\r\n
This shows that gunicorn did not ignore the received Host header field, which may be a violation of the protocol specifications.
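To check that reading of the response, the base64 fields in the echoed body can be decoded with a short script. This is only a sketch: it assumes the test application encodes every string field as base64, and the helper names `b64` and `decode_echo` are made up for illustration.

```python
import base64
import json

# Response body copied verbatim from the gunicorn output above.
body = (
    '{"headers":[["SE9TVA==","dmljdGltLmNvbQ=="]],"body":"",'
    '"version":"SFRUUC8xLjE=","uri":"L3BhZ2U=","method":"R0VU"}'
)


def b64(value: str) -> str:
    """Decode one base64 field to text (assumes ASCII content)."""
    return base64.b64decode(value).decode("ascii")


def decode_echo(raw: str) -> dict:
    """Decode every field of the echoed request (hypothetical helper)."""
    data = json.loads(raw)
    return {
        "method": b64(data["method"]),
        "uri": b64(data["uri"]),
        "version": b64(data["version"]),
        "headers": [(b64(name), b64(value)) for name, value in data["headers"]],
        "body": b64(data["body"]),
    }


decoded = decode_echo(body)
print(decoded)
```

The decoded `uri` is `/page` and the only header is `HOST: victim.com`, so the authority `evil.com` from the request-target is no longer visible to the application.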
Does this mean we also have to start caring about things like GET https:// HTTP/1.0\r\nHost: evil.example\r\n\r\n?
Side note: flask until recently did not appreciate putting colons in SERVER_NAME without wrapping in [v6addr] - that section needs review anyway.
Relevant code:
https://github.com/benoitc/gunicorn/blob/a86ea1e4e6c271d1cd1823c7e14490123f9238fe/gunicorn/http/message.py#L443
https://github.com/benoitc/gunicorn/blob/a86ea1e4e6c271d1cd1823c7e14490123f9238fe/gunicorn/http/wsgi.py#L126-L127
https://github.com/benoitc/gunicorn/blob/a86ea1e4e6c271d1cd1823c7e14490123f9238fe/gunicorn/http/wsgi.py#L161-L162
Stdlib docs: https://docs.python.org/3/library/urllib.parse.html#url-parsing-security
The urlsplit() and urlparse() APIs do not perform validation of inputs
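A quick illustration of what "no validation" means for the two request-targets discussed in this thread. It only shows stdlib behaviour and says nothing about how gunicorn calls these functions.

```python
from urllib.parse import urlsplit

# The two request-targets mentioned in this thread.
targets = ("http://evil.com/page", "https://")

for target in targets:
    parts = urlsplit(target)
    # urlsplit() accepts both inputs without raising; for "https://" the
    # authority is simply empty, so netloc is "" and hostname is None.
    print(repr(target), repr(parts.scheme), repr(parts.netloc),
          repr(parts.hostname), repr(parts.path))
```

Any code that takes the host from the request-target therefore has to handle the empty-authority case itself.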
Internet Standard section: https://datatracker.ietf.org/doc/html/rfc9112#section-3.2.2
When an origin server receives a request with an absolute-form of request-target, the origin server MUST ignore the received Host header field (if any) and instead use the host information of the request-target. Note that if the request-target does not have an authority component, an empty Host header field will be sent in this case.
WSGI: https://peps.python.org/pep-3333/#environ-variables
SERVER_NAME and SERVER_PORT are required strings and must never be empty.
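Taken together, the RFC rule and the WSGI constraint could be sketched roughly as follows. This is not gunicorn's code: `effective_host` is a hypothetical helper, it handles only origin-form and absolute-form targets (authority-form, as used by CONNECT, is left out), and it returns `None` where the caller would have to reject the request or fall back to a configured name so that `SERVER_NAME` is never empty.

```python
from typing import Optional
from urllib.parse import urlsplit


def effective_host(request_target: str, host_header: Optional[str]) -> Optional[str]:
    """Pick the host per RFC 9112 section 3.2.2 (hypothetical helper).

    Returns None when no usable host is available; the caller then has to
    reject the request or fall back to a configured server name.
    """
    # origin-form ("/page"), asterisk-form ("*") and anything without
    # "://" is not treated as absolute-form here, so the Host header applies.
    if (
        request_target.startswith("/")
        or request_target == "*"
        or "://" not in request_target
    ):
        return host_header or None

    # absolute-form: the Host header is ignored entirely. netloc may still
    # contain userinfo or a port; a real implementation would validate it.
    return urlsplit(request_target).netloc or None


print(effective_host("http://evil.com/page", "victim.com"))
print(effective_host("/page", "victim.com"))
print(effective_host("https://", "evil.example"))
```

With the request from the report, the first call yields the request-target authority rather than `victim.com`; the `https://` case yields `None` instead of silently falling back to the Host header, which is one possible answer to the edge case raised above.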
Related uri-parsing issue for authority-form: #3363
Related issue for missing Host: #3361
Thank you for your response.
I understand the concern about edge cases like GET https:// HTTP/1.0\r\nHost: evil.example\r\n\r\n. My initial intention was just to highlight that when a valid absolute-form request-target is provided, Gunicorn should comply with RFC 9112 by ignoring the Host header and using the request-target’s authority. If accommodating that RFC rule raises complications with unusual URI schemes or the WSGI server variables, I'm open to discussing possible approaches or workarounds.
Thank you for the detailed response and references!
Hello, I have the following finding. I deployed a proxy server (Apache, running on http://localhost:80) in front of Gunicorn to forward HTTP requests to Gunicorn.
In my test, Apache adheres strictly to RFC 9112, which says:
When a proxy receives a request with an absolute-form of request-target, the proxy MUST ignore the received Host header field (if any) and instead replace it with the host information of the request-target. A proxy that forwards such a request MUST generate a new Host field value based on the received request-target rather than forward the received Host field value.
$ echo -ne "GET http://evil.com/cache-test HTTP/1.1\r\nHost: victim.com\r\n\r\n" | nc localhost 80
HTTP/1.1 200 OK
Date: Thu, 27 Mar 2025 13:35:49 GMT
Server: waitress
Content-Length: 201
Content-Type: application/json
{"headers":[["SE9TVA==","ZXZpbC5jb20="],["WF9GT1JXQVJERURfU0VSVkVS","bG9jYWxob3N0"],["Q09OTkVDVElPTg==","S2VlcC1BbGl2ZQ=="]],"body":"","version":"SFRUUC8xLjE=","uri":"L2NhY2hlLXRlc3Q=","method":"R0VU"}
SE9TVA== -> HOST
ZXZpbC5jb20= -> evil.com
If a proxy only ignores the received Host header field but does not generate a new Host field value from the received request-target when forwarding, it complies with the RFC only partially. Could the difference between how such a proxy and gunicorn resolve the host become an attack vector, especially if the proxy has caching enabled?
@TUO-Wu No AI slop on the issue tracker, please.