cheroot icon indicating copy to clipboard operation
cheroot copied to clipboard

Cheroot's HTTP method parsing is too permissive

Open kenballus opened this issue 8 months ago • 0 comments

I'm submitting a ...

  • [X] 🐞 bug report
  • [ ] 🐣 feature request
  • [ ] ❓ question about the decisions made in the repository

🐞 Describe the bug. What is the current behavior? RFC 9110 defines the following ABNF rules for HTTP methods:

method = token
token = 1*tchar
tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA

Cheroot allows all of the following characters within method names, even though they do not conform to the method ABNF rule from the HTTP RFCs:

  • \x00-\x09 inclusive
  • \x0b-\x1f inclusive
  • \x22,
  • \x28,
  • \x29,
  • \x2c,
  • \x2f-\x40 inclusive,
  • \x5b-\x5d inclusive,
  • \x7b,
  • \x7d,
  • \x7f-\xff inclusive.

What is the motivation / use case for changing the behavior? The current behavior violates the HTTP RFCs and can hinder interoperability. For example, caching gateways that interpret GET\r as equivalent to GET, but forward the method as-is, will potentially be vulnerable to cache poisoning when paired with Cheroot due to this behavior.

💡 To Reproduce

  1. Run the following script:
from base64 import b64encode

from cheroot.wsgi import Server, PathInfoDispatcher as WSGIPathInfoDispatcher

RESERVED_HEADERS = ("CONTENT_LENGTH", "CONTENT_TYPE")


def app(environ, start_response) -> list[bytes]:
    try:
        body: bytes = environ["wsgi.input"].read()
    except ValueError:
        start_response("400 Bad Request", [])
        return []
    response_body: bytes = (
        b'{"headers":['
        + b",".join(
            b'["'
            + b64encode(k.encode("latin1")[len("HTTP_") if k not in RESERVED_HEADERS else 0 :])
            + b'","'
            + b64encode(environ[k].encode("latin1"))
            + b'"]'
            for k in environ
            if k.startswith("HTTP_") or k in RESERVED_HEADERS
        )
        + b'],"body":"'
        + b64encode(body)
        + b'","version":"'
        + b64encode(environ["SERVER_PROTOCOL"].encode("latin1"))
        + b'","uri":"'
        + b64encode(
            (
                environ["PATH_INFO"] + (("?" + environ["QUERY_STRING"]) if environ["QUERY_STRING"] else "")
            ).encode("latin1")
        )
        + b'","method":"'
        + b64encode(environ["REQUEST_METHOD"].encode("latin1"))
        + b'"}'
    )
    start_response(
        "200 OK", [("Content-type", "application/json"), ("Content-Length", f"{len(response_body)}")]
    )
    return [response_body]

Server(("0.0.0.0", 80), WSGIPathInfoDispatcher({"/": app})).start()
  1. Send it an HTTP request using one of the above mentioned invalid characters within a method:
$ printf 'GET\x00 / HTTP/1.1\r\nHost: whatever\r\n\r\n' | nc localhost 80
  1. Observe that the request is accepted:
HTTP/1.1 200 OK
Content-type: application/json
Content-Length: 109
Date: Sat, 08 Jun 2024 15:56:59 GMT
Server: Cheroot/10.0.2.dev71+g1ff20b18

{"headers":[["SE9TVA==","d2hhdGV2ZXI="]],"body":"","version":"SFRUUC8xLjE=","uri":"Lw==","method":"R0VUAA=="}
  1. Base64-decode the method to see that the null byte is preserved:
$ printf 'R0VUAA==' | base64 -d | xxd
00000000: 4745 5400                                GET.

💡 Expected behavior Cheroot should respond 400 to requests with syntactically invalid methods. 501 is not an acceptable response code in this scenario because it is cacheable, so cache poisoning remains possible.

📋 Environment

  • Cheroot version: 10.0.2.dev71+g1ff20b18
  • Python version: 3.11.9
  • OS: Linux 6.9.1

kenballus avatar Jun 08 '24 16:06 kenballus