
Incorrect Handling of Absolute-Form Request-Target Authority

Open TUO-Wu opened this issue 9 months ago • 4 comments

Version bacbf8a

Platform Ubuntu 11.4.0-1ubuntu1~22.04

Description Hello, I may have found a bug in gunicorn's parsing of absolute-form request-target authority. RFC 9112 says this:

When an origin server receives a request with an absolute-form of request-target, the origin server MUST ignore the received Host header field (if any) and instead use the host information of the request-target.

However, I noticed that when an HTTP request is sent with an absolute-form request-target where the authority differs from the Host header, gunicorn appears to prioritize the Host header over the request-target’s authority. For example:

GET http://evil.com/page HTTP/1.1\r\n
Host: victim.com\r\n
\r\n

Gunicorn's response:

$ echo -ne "GET http://evil.com/page HTTP/1.1\r\nHost: victim.com\r\n\r\n" | nc 172.18.0.7 80
HTTP/1.1 200 OK
Server: gunicorn
Date: Wed, 26 Mar 2025 09:01:53 GMT
Connection: keep-alive
Content-type: application/json
Content-Length: 113

{"headers":[["SE9TVA==","dmljdGltLmNvbQ=="]],"body":"","version":"SFRUUC8xLjE=","uri":"L3BhZ2U=","method":"R0VU"}

From the response body, specifically the URI, it appears that Gunicorn parses the above request as:

GET /page HTTP/1.1\r\n
Host: victim.com\r\n
\r\n

This shows that gunicorn did not ignore the received Host header field, which may be a violation of the protocol specifications.
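
For illustration, the behavior the RFC describes could be sketched roughly as follows. This is not gunicorn's code: `resolve_target` is a hypothetical helper, and it assumes that a target with both a scheme and a non-empty authority is absolute-form. It uses `urllib.parse.urlsplit`, which does not validate its input, so a real implementation would need stricter checks.

```python
from urllib.parse import urlsplit


def resolve_target(request_target, host_header):
    """Hypothetical helper: return (host, path) for a request line.

    For an absolute-form target the authority in the target wins and the
    Host header is ignored; otherwise the Host header is used as received.
    """
    parts = urlsplit(request_target)
    if parts.scheme and parts.netloc:
        # Absolute-form: rebuild the origin-form path from the target.
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        return parts.netloc, path
    # Origin-form (e.g. "/page"): keep the Host header value.
    return host_header, request_target


print(resolve_target("http://evil.com/page", "victim.com"))
print(resolve_target("/page", "victim.com"))
```

With the request from this report, the sketch selects `evil.com` as the host, whereas the gunicorn response above echoes back `victim.com`.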

TUO-Wu avatar Mar 26 '25 09:03 TUO-Wu

Does this mean we also have to start caring about things like GET https:// HTTP/1.0\r\nHost: evil.example\r\n\r\n?

Side note: flask until recently did not appreciate putting colons in SERVER_NAME without wrapping in [v6addr] - that section needs review anyway.


Relevant code:
https://github.com/benoitc/gunicorn/blob/a86ea1e4e6c271d1cd1823c7e14490123f9238fe/gunicorn/http/message.py#L443
https://github.com/benoitc/gunicorn/blob/a86ea1e4e6c271d1cd1823c7e14490123f9238fe/gunicorn/http/wsgi.py#L126-L127
https://github.com/benoitc/gunicorn/blob/a86ea1e4e6c271d1cd1823c7e14490123f9238fe/gunicorn/http/wsgi.py#L161-L162

Stdlib docs: https://docs.python.org/3/library/urllib.parse.html#url-parsing-security

The urlsplit() and urlparse() APIs do not perform validation of inputs

Internet Standard section: https://datatracker.ietf.org/doc/html/rfc9112#section-3.2.2

When an origin server receives a request with an absolute-form of request-target, the origin server MUST ignore the received Host header field (if any) and instead use the host information of the request-target. Note that if the request-target does not have an authority component, an empty Host header field will be sent in this case.

WSGI: https://peps.python.org/pep-3333/#environ-variables

SERVER_NAME and SERVER_PORT are required strings and must never be empty.

Related uri-parsing issue for authority-form: #3363
Related issue for missing Host: #3361
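
To make the edge case concrete: the stdlib parser accepts `https://` without complaint and yields an empty authority, which cannot be used directly for a `SERVER_NAME` that must never be empty. The sketch below is only an illustration of that tension. `server_name_candidate` and its `fallback` argument (for example, the listener address) are hypothetical and are not gunicorn behavior.

```python
from urllib.parse import urlsplit


def server_name_candidate(request_target, fallback):
    """Hypothetical helper: pick a non-empty host for SERVER_NAME.

    urlsplit() performs no validation, so an absolute-form target such as
    "https://" parses successfully with an empty authority. In that case
    this sketch falls back to a caller-supplied value.
    """
    host = urlsplit(request_target).hostname  # None when the authority is empty
    return host if host else fallback


empty = urlsplit("https://")
print(repr(empty.scheme), repr(empty.netloc), repr(empty.hostname))
print(server_name_candidate("https://", "127.0.0.1"))
print(server_name_candidate("http://evil.com/page", "127.0.0.1"))
```

Whether falling back, rejecting the request, or something else is the right choice is exactly the open question here; the sketch takes no position beyond showing that the parser will not decide it.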

pajod avatar Mar 26 '25 12:03 pajod

Thank you for your response. I understand the concern about edge cases like GET https:// HTTP/1.0\r\nHost: evil.example\r\n\r\n. My initial intention was just to highlight that when a valid absolute-form request-target is provided, Gunicorn should comply with RFC 9112 by ignoring the Host header and using the request-target’s authority. If accommodating that RFC rule raises complications with unusual URI schemes or the WSGI server variables, I'm open to discussing possible approaches or workarounds. Thank you for the detailed response and references!

TUO-Wu avatar Mar 26 '25 14:03 TUO-Wu

Hello, I have the following finding. I deployed a proxy server (Apache, running on http://localhost:80) in front of Gunicorn to forward HTTP requests to Gunicorn.

In my test, Apache strictly adheres to RFC 9112, which states:

When a proxy receives a request with an absolute-form of request-target, the proxy MUST ignore the received Host header field (if any) and instead replace it with the host information of the request-target. A proxy that forwards such a request MUST generate a new Host field value based on the received request-target rather than forward the received Host field value.

$ echo -ne "GET http://evil.com/cache-test HTTP/1.1\r\nHost: victim.com\r\n\r\n" | nc localhost 80
HTTP/1.1 200 OK
Date: Thu, 27 Mar 2025 13:35:49 GMT
Server: waitress
Content-Length: 201
Content-Type: application/json

{"headers":[["SE9TVA==","ZXZpbC5jb20="],["WF9GT1JXQVJERURfU0VSVkVS","bG9jYWxob3N0"],["Q09OTkVDVElPTg==","S2VlcC1BbGl2ZQ=="]],"body":"","version":"SFRUUC8xLjE=","uri":"L2NhY2hlLXRlc3Q=","method":"R0VU"}

SE9TVA== -> HOST
ZXZpbC5jb20= -> evil.com
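
The remaining fields can be decoded the same way. A minimal sketch, assuming every string in the echoed JSON (including `body`) is base64-encoded; `decode_echo` is a name made up for this example, and the body is copied from the response above.

```python
import base64
import json

# Response body copied from the Apache-fronted test above.
BODY = (
    '{"headers":[["SE9TVA==","ZXZpbC5jb20="],'
    '["WF9GT1JXQVJERURfU0VSVkVS","bG9jYWxob3N0"],'
    '["Q09OTkVDVElPTg==","S2VlcC1BbGl2ZQ=="]],'
    '"body":"","version":"SFRUUC8xLjE=","uri":"L2NhY2hlLXRlc3Q=","method":"R0VU"}'
)


def b64(text):
    """Decode one base64 field; latin-1 maps every byte to a character."""
    return base64.b64decode(text).decode("latin-1")


def decode_echo(body):
    """Decode the echo server's JSON (assumed: all values are base64)."""
    raw = json.loads(body)
    return {
        "method": b64(raw["method"]),
        "uri": b64(raw["uri"]),
        "version": b64(raw["version"]),
        "headers": [(b64(name), b64(value)) for name, value in raw["headers"]],
        "body": b64(raw["body"]),
    }


decoded = decode_echo(BODY)
for name, value in decoded["headers"]:
    print(f"{name}: {value}")
print(decoded["method"], decoded["uri"], decoded["version"])
```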

If a proxy only ignores the received Host header field but does not generate a new Host field value from the received request-target when forwarding, it partially violates the RFC. Could the resulting difference in host resolution between the proxy and gunicorn become an attack vector, especially if the proxy has caching enabled?

TUO-Wu avatar Mar 27 '25 15:03 TUO-Wu

@TUO-Wu No AI slop on the issue tracker, please.

pajod avatar Mar 27 '25 16:03 pajod