Discrepancies in parsing HTTP requests

Open blessingcharles opened this issue 3 years ago • 1 comments

Current Behavior

  1. Invalid HTTP versions are accepted: anything matching the regex [\d.\d]+ passes, e.g. 0.2.1.0.0.3.0.0.91.
  2. A plus sign before the Content-Length value is accepted: +33 is converted to 33 and the body is read as 33 bytes long.
  3. The Transfer-Encoding header is accepted in versions 0.9 and 1.0, where it is not defined by the RFC specs.
  4. \x0b \x0c \x1c \x1d \x1e \x1f are accepted between the colon separator and the field-value, but according to the RFC only optional whitespace (space or horizontal tab) is allowed there.
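
A plausible explanation for items 2 and 4 is that Python's own built-ins are more permissive than the HTTP grammar. The snippet below only demonstrates the built-ins; whether gunicorn's parser actually takes this path is an assumption that has not been checked against its source.

```python
# Illustration only: shows how lenient Python's built-ins are compared with
# the HTTP grammar. That gunicorn relies on them in this way is an assumption.

# int() accepts an explicit sign, so "+33" silently becomes 33.
plus_length = int("+33")

# For str objects, vertical tab (\x0b), form feed (\x0c) and the information
# separators \x1c-\x1f all count as whitespace.
odd_chars = ["\x0b", "\x0c", "\x1c", "\x1d", "\x1e", "\x1f"]
all_whitespace = all(ch.isspace() for ch in odd_chars)

# Consequently a bare lstrip()/strip() removes them just like SP and HTAB.
stripped = "\x1cchunked".lstrip()

print(plus_length, all_whitespace, repr(stripped))  # prints: 33 True 'chunked'
```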

Expected Behavior

  1. According to the RFC section on protocol versioning, the ABNF grammar for the version is DIGIT "." DIGIT, and it should be followed when parsing requests.
  2. According to the RFC section on Content-Length, the ABNF grammar is 1*DIGIT, so a leading sign is not valid.
  3. For 0.9 and 1.0, chunked encoding should not be interpreted, as it is undefined in the RFC specs of HTTP 0.9 and 1.0, but gunicorn gives it precedence over Content-Length.
  4. According to the RFC section on header fields, the ABNF grammar is header-field = field-name ":" OWS field-value OWS, but if \x0b \x0c \x1c \x1d \x1e \x1f are sent instead of whitespace, they are treated as whitespace too. So if a proxy ignores [\x1c]chunked and forwards the request downstream based on Content-Length, while gunicorn parses [\x1c]chunked as a chunked request, this may lead to HTTP request smuggling.
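
The four expectations above can be sketched as strict checks. This is a minimal sketch of what the ABNF implies, not gunicorn's code: every function name below is hypothetical, and `chunked_allowed` encodes only this issue's reading of item 3.

```python
import re

# Strict patterns derived from the ABNF quoted above.
# All names here are hypothetical and not part of gunicorn.
_VERSION_RE = re.compile(rb"HTTP/([0-9])\.([0-9])")
_CONTENT_LENGTH_RE = re.compile(rb"[0-9]+")
_OWS = b" \t"  # optional whitespace is SP / HTAB only


def parse_http_version(token: bytes) -> tuple[int, int]:
    """Accept exactly HTTP/DIGIT.DIGIT; reject e.g. HTTP/0.0.0.0.3.0.0.91."""
    match = _VERSION_RE.fullmatch(token)
    if match is None:
        raise ValueError(f"invalid HTTP version: {token!r}")
    return int(match.group(1)), int(match.group(2))


def parse_content_length(value: bytes) -> int:
    """Accept 1*DIGIT only, so a leading '+' or control byte is rejected."""
    if _CONTENT_LENGTH_RE.fullmatch(value) is None:
        raise ValueError(f"invalid Content-Length: {value!r}")
    return int(value)


def strip_ows(raw_value: bytes) -> bytes:
    """Strip only SP and HTAB, then refuse any remaining control byte."""
    value = raw_value.strip(_OWS)
    for byte in value:
        if (byte < 0x20 and byte != 0x09) or byte == 0x7F:
            raise ValueError(f"control byte 0x{byte:02x} in field value")
    return value


def chunked_allowed(version: tuple[int, int]) -> bool:
    """Honor Transfer-Encoding only from HTTP/1.1 on (this issue's reading)."""
    return version >= (1, 1)
```

With these checks, `parse_content_length(b"+3")` and `strip_ows(b"\x1cchunked")` both raise `ValueError` instead of being silently normalized, which removes the ambiguity a front-end proxy could disagree on.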

Steps to Reproduce (for bugs)

  1. In HTTP-Version, anything matching the regex [\d.\d]+ is accepted
echo -ne "GET / HTTP/0.0.0.0.3.0.0.91\r\nContent-Length: 3\r\n\r\naaa" | nc localhost 8002
  2. + sign accepted in the Content-Length field value
echo -ne "GET / HTTP/1.1\r\nContent-Length: +3\r\n\r\naaa" | nc localhost 8002
  3. For 0.9 and 1.0, chunked encoding should not be interpreted, but it is used over Content-Length
echo -ne "GET / HTTP/0.9\r\nHost: localhost\r\nTransfer-Encoding: chunked\r\nContent-Length: 13\r\n\r\n2\r\naa\r\n0\r\n\r\nX" | nc localhost 8002
  4. \x0b \x0c \x1c \x1d \x1e \x1f are accepted between the colon separator and the field-value
  • In Content-Length
echo -ne "GET / HTTP/1.1\r\nContent-Length:\x0c3\r\n\r\naaa" | nc localhost 8002
  • In Transfer-Encoding
echo -ne "GET / HTTP/1.1\r\nHost: localhost\r\nTransfer-Encoding:\x1cchunked\r\n\r\n3\r\naaa\r\n0\r\n\r\n" | nc localhost 8002
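
For anyone without nc at hand, the same requests can be sent from Python. This is a rough equivalent of the commands above: the payload bytes, host and port are copied from them, while the dictionary keys, the `send_raw` helper and the 2-second timeout are made up for illustration.

```python
import socket

# Host and port mirror the nc commands above; adjust for your setup.
HOST, PORT = "localhost", 8002

# Raw requests, byte for byte the same as the echo -ne payloads.
PAYLOADS = {
    "version": b"GET / HTTP/0.0.0.0.3.0.0.91\r\nContent-Length: 3\r\n\r\naaa",
    "plus_sign": b"GET / HTTP/1.1\r\nContent-Length: +3\r\n\r\naaa",
    "te_http09": (
        b"GET / HTTP/0.9\r\nHost: localhost\r\nTransfer-Encoding: chunked\r\n"
        b"Content-Length: 13\r\n\r\n2\r\naa\r\n0\r\n\r\nX"
    ),
    "cl_formfeed": b"GET / HTTP/1.1\r\nContent-Length:\x0c3\r\n\r\naaa",
    "te_x1c": (
        b"GET / HTTP/1.1\r\nHost: localhost\r\nTransfer-Encoding:\x1cchunked\r\n"
        b"\r\n3\r\naaa\r\n0\r\n\r\n"
    ),
}


def send_raw(payload: bytes, host: str = HOST, port: int = PORT,
             timeout: float = 2.0) -> bytes:
    """Send the payload verbatim and return whatever the server answers."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        try:
            while True:
                data = sock.recv(4096)
                if not data:  # server closed the connection
                    break
                chunks.append(data)
        except socket.timeout:
            pass  # connection kept open; return what arrived so far
    return b"".join(chunks)


# Example (requires the server from the Environment section to be running):
#     for name, payload in PAYLOADS.items():
#         print(name, send_raw(payload))
```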

Environment

Gunicorn version : gunicorn 20.1.0

  • Docker Environment used
FROM python
RUN pip install gunicorn
WORKDIR /app
COPY app.py .
CMD ["gunicorn", "-b", "0.0.0.0:80", "app:app"]
  • Simple Python server
def app(environ, start_response):
    # Read the request body exactly as gunicorn parsed it.
    body = environ["wsgi.input"].read()
    # Echo the body length and contents so parsing differences are visible.
    data = b"Body length: " + str(len(body)).encode() + b" Body: " + repr(body).encode()
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(data)))
    ])
    return iter([data])

blessingcharles May 30 '22 06:05

I will check, but according to the code it shouldn't be; the order looks correct. As for "For 0.9 and 1.0 chunked encoding should not be interpreted, as it's undefined in the RFC spec of HTTP 0.9 and 1.0, but it's used over Content-Length in gunicorn": some clients were expecting that differently.

benoitc Aug 06 '22 16:08