cheroot
cheroot copied to clipboard
Cheroot's HTTP method parsing is too permissive
❓ I'm submitting a ...
- [X] 🐞 bug report
- [ ] 🐣 feature request
- [ ] ❓ question about the decisions made in the repository
🐞 Describe the bug. What is the current behavior? RFC 9110 defines the following ABNF rules for HTTP methods:
method = token
token = 1*tchar
tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
Cheroot allows all of the following characters within method names, even though they do not conform to the method ABNF rule from the HTTP RFCs:
-
\x00
-\x09
inclusive -
\x0b
-\x1f
inclusive -
\x22
, -
\x28
, -
\x29
, -
\x2c
, -
\x2f
-\x40
inclusive, -
\x5b
-\x5d
inclusive, -
\x7b
, -
\x7d
, -
\x7f
-\xff
inclusive.
❓ What is the motivation / use case for changing the behavior?
The current behavior violates the HTTP RFCs and can hinder interoperability. For example, caching gateways that interpret GET\r
as equivalent to GET
, but forward the method as-is, will potentially be vulnerable to cache poisoning when paired with Cheroot due to this behavior.
💡 To Reproduce
- Run the following script:
from base64 import b64encode
from cheroot.wsgi import Server, PathInfoDispatcher as WSGIPathInfoDispatcher
RESERVED_HEADERS = ("CONTENT_LENGTH", "CONTENT_TYPE")
def app(environ, start_response) -> list[bytes]:
try:
body: bytes = environ["wsgi.input"].read()
except ValueError:
start_response("400 Bad Request", [])
return []
response_body: bytes = (
b'{"headers":['
+ b",".join(
b'["'
+ b64encode(k.encode("latin1")[len("HTTP_") if k not in RESERVED_HEADERS else 0 :])
+ b'","'
+ b64encode(environ[k].encode("latin1"))
+ b'"]'
for k in environ
if k.startswith("HTTP_") or k in RESERVED_HEADERS
)
+ b'],"body":"'
+ b64encode(body)
+ b'","version":"'
+ b64encode(environ["SERVER_PROTOCOL"].encode("latin1"))
+ b'","uri":"'
+ b64encode(
(
environ["PATH_INFO"] + (("?" + environ["QUERY_STRING"]) if environ["QUERY_STRING"] else "")
).encode("latin1")
)
+ b'","method":"'
+ b64encode(environ["REQUEST_METHOD"].encode("latin1"))
+ b'"}'
)
start_response(
"200 OK", [("Content-type", "application/json"), ("Content-Length", f"{len(response_body)}")]
)
return [response_body]
Server(("0.0.0.0", 80), WSGIPathInfoDispatcher({"/": app})).start()
- Send it an HTTP request using one of the above mentioned invalid characters within a method:
$ printf 'GET\x00 / HTTP/1.1\r\nHost: whatever\r\n\r\n' | nc localhost 80
- Observe that the request is accepted:
HTTP/1.1 200 OK
Content-type: application/json
Content-Length: 109
Date: Sat, 08 Jun 2024 15:56:59 GMT
Server: Cheroot/10.0.2.dev71+g1ff20b18
{"headers":[["SE9TVA==","d2hhdGV2ZXI="]],"body":"","version":"SFRUUC8xLjE=","uri":"Lw==","method":"R0VUAA=="}
- Base64-decode the method to see that the null byte is preserved:
$ printf 'R0VUAA==' | base64 -d | xxd
00000000: 4745 5400 GET.
💡 Expected behavior Cheroot should respond 400 to requests with syntactically invalid methods. 501 is not an acceptable response code in this scenario because it is cacheable, so cache poisoning remains possible.
📋 Environment
- Cheroot version: 10.0.2.dev71+g1ff20b18
- Python version: 3.11.9
- OS: Linux 6.9.1