Twisted does not properly strip whitespace from the ends of HTTP header values
The Bug
Twisted's HTTP/1.1 header parser does not properly strip trailing whitespace from header values.
Reproduction Steps
- Start a Twisted HTTP/1.1 server that echoes received header values, such as this one.
- Send it a request with a header that ends in one or more spaces or tabs, followed by one or more carriage returns (in addition to the CRLF pair that ends the header), and observe that the spaces/tabs are interpreted as part of the header value:
printf 'GET / HTTP/1.1\r\nHost: a\r\nExtra-Whitespace-Here: whatever \t \t \t \r\r\n\r\n' | \
timeout 1 ncat --no-shutdown localhost 80 | \
grep '"headers"' | \
jq '.headers[1][1]' | \
xargs echo | \
base64 -d | \
od -tcx1
0000000   w   h   a   t   e   v   e   r      \t      \t      \t
         77  68  61  74  65  76  65  72  20  09  20  09  20  09  20
0000017
Note that the spaces and tabs remain. (If you remove the extra carriage return, you'll notice that the spaces and tabs are stripped appropriately.)
Correct Behavior
From RFC 9112:
A field line value might be preceded and/or followed by optional whitespace (OWS); a single SP preceding the field line value is preferred for consistent readability by humans. The field line value does not include that leading or trailing whitespace: OWS occurring before the first non-whitespace octet of the field line value, or after the last non-whitespace octet of the field line value, is excluded by parsers when extracting the field line value from a field line.
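As a minimal sketch of the extraction rule quoted above, assuming the field line has already been separated from its terminating CRLF: the helper name `extract_field_value` is hypothetical and is not part of Twisted or any other library.

```python
# RFC 9110 defines OWS as any run of SP and HTAB, and nothing else.
OWS = b" \t"


def extract_field_value(field_line: bytes) -> tuple[bytes, bytes]:
    """Split a field line into (name, value), excluding surrounding OWS.

    Hypothetical helper for illustration only. It assumes the CRLF that
    terminates the field line has already been removed.
    """
    name, sep, raw_value = field_line.partition(b":")
    if not sep:
        raise ValueError("field line has no colon")
    # Strip only SP and HTAB; CR and LF are deliberately not treated as OWS.
    return name, raw_value.strip(OWS)


print(extract_field_value(b"Extra-Whitespace-Here: whatever \t \t \t "))
```

Under this rule, the spaces and tabs after `whatever` never reach the extracted value.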
Environment
Linux f85ab701084c 6.12.34-1-lts #1 SMP PREEMPT_DYNAMIC Thu, 19 Jun 2025 15:05:14 +0000 x86_64 GNU/Linux
Debian 13
Hi Ben. Thanks for the report.
I am looking at the spec here https://www.rfc-editor.org/rfc/rfc9112#name-collected-abnf
I see at https://www.rfc-editor.org/rfc/rfc9110#name-whitespace that optional white space is only space and tab.
OWS = *( SP / HTAB )
; optional whitespace
RWS = 1*( SP / HTAB )
; required whitespace
I don't see the new line character as part of the white space specification.
Are you aware of other HTTP servers that treat newlines as whitespace?
I am using this stdlib http server and I see that the whitespaces are not stripped.
import http.server

class MyHandler(http.server.SimpleHTTPRequestHandler):
    def do_GET(self):
        for header in self.headers:
            print(header, ':', repr(self.headers[header]))
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Hello, world!")

server_address = ("127.0.0.1", 8000)
httpd = http.server.HTTPServer(server_address, MyHandler)
httpd.serve_forever()
I get
Host : 'a'
Extra-Whitespace-Here : 'whatever \t \t \t '
I don't see the new line character as part of the white space specification.
The bug I'm pointing out has to do with whitespace in a parsed header value, which is explicitly forbidden by the quoted text I provided above. I'm not suggesting that newlines (or carriage returns) be considered whitespace.
Allow me to state the problem more clearly. When Twisted receives a header value that ends with OWS followed by \r\r\n instead of \r\n, it fails to strip the OWS from the end of the header value.
For example, when Twisted receives the following:
GET / HTTP/1.1\r\n
Host: whatever\r\n
Test: whatever\t\r\r\n
\r\n
it interprets the Test header as having value whatever\t, which indicates a problem, because RFC 9112 states that leading or trailing OWS (that is, spaces or tabs) should never be found in a header value.
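One way a parser could arrive at this result is sketched below, purely as an illustration. `naive_parse_headers` is a hypothetical parser, not Twisted's actual code; the assumption is that OWS is stripped before the stray CR is dealt with, so the CR shields the tab from the strip.

```python
# The request from the example above, as raw bytes.
RAW = b"GET / HTTP/1.1\r\nHost: whatever\r\nTest: whatever\t\r\r\n\r\n"


def naive_parse_headers(raw: bytes) -> list[tuple[bytes, bytes]]:
    """Hypothetical parser that handles a bare CR too late (illustration only)."""
    head = raw.split(b"\r\n\r\n", 1)[0]
    field_lines = head.split(b"\r\n")[1:]  # skip the request line
    headers = []
    for line in field_lines:
        name, _, value = line.partition(b":")
        # The OWS strip stops at the bare CR, so the tab before it survives.
        value = value.strip(b" \t")
        # Dropping the CR afterwards leaves the trailing tab in place.
        value = value.replace(b"\r", b"")
        headers.append((name, value))
    return headers


print(naive_parse_headers(RAW))
```

With this ordering the `Test` value comes out as `whatever\t`, which matches the behavior described above even though neither step looks wrong on its own.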
I am using this stdlib http server and I see that the whitespaces are not stripped.
Yep, this is a known bug in the stdlib HTTP server. That server is not intended for production use. It has known exploitable spec violations, and fair enough! That server is intended for simple testing use only.
Suggested Fixes
From RFC 9112, Section 2.2:
A sender MUST NOT generate a bare CR (a CR character not immediately followed by LF) within any protocol elements other than the content. A recipient of such a bare CR MUST consider that element to be invalid or replace each bare CR with SP before processing the element or forwarding the message.
This indicates that there are 2 different correct behaviors when a bare CR is encountered in an HTTP message:
- Reject the request due to the presence of a bare CR in a field-line. This is what most HTTP implementations do, including AIOHTTP, Apache httpd, Apache Tomcat, EmbedThis AppWeb, aws-c-http, cpp-httplib, Eclipse Jetty, FastHTTP, Go net/http, Gunicorn, H2O, HAProxy, Hyper, Hypercorn, Ktor, Libevent, Libmicrohttpd, Lighttpd, Mongoose, Netty, Node.js, protocol-http1, Puma, Tornado, Undertow, Uvicorn, Waitress, WEBrick, and Unicorn.
- Replace the bare CRs with spaces, then strip the spaces due to the RFC text I quoted in my first message. This is what Eclipse Grizzly, Libsoup, Nginx, and LiteSpeed do.
Twisted does neither of these, and instead behaves as we saw above. I'm not aware of any other production-level HTTP implementations that exhibit this behavior.
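The two compliant behaviors can be sketched roughly as follows. Both function names are hypothetical, and both assume the field line has already been split off at its terminating CRLF, so any CR still present is a bare CR.

```python
OWS = b" \t"  # SP and HTAB, per RFC 9110


def value_rejecting_bare_cr(field_line: bytes) -> bytes:
    """Option 1: treat a field line containing a bare CR as invalid."""
    if b"\r" in field_line:
        raise ValueError("bare CR in field line")
    return field_line.partition(b":")[2].strip(OWS)


def value_replacing_bare_cr(field_line: bytes) -> bytes:
    """Option 2: replace each bare CR with SP, then strip OWS as usual."""
    sanitized = field_line.replace(b"\r", b" ")
    return sanitized.partition(b":")[2].strip(OWS)


line = b"Test: whatever\t\r"
print(value_replacing_bare_cr(line))
try:
    value_rejecting_bare_cr(line)
except ValueError as exc:
    print("rejected:", exc)
```

In a real server, option 1 would normally surface as a 400 response rather than an exception; that mapping is left out of the sketch.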
I am inclined to go with option 1.