BaseHTTPRequestHandler.parse_request() loses client-provided information
Bug report
The fix for https://github.com/python/cpython/issues/87389 prevents servers from handling request paths with multiple leading slashes.
For example, one might have a simple server that just reflects the request path:
from http.server import *
class MyHTTPRequestHandler(BaseHTTPRequestHandler):
def do_GET(self):
self.send_response(200)
self.end_headers()
self.wfile.write(self.path.encode('latin1'))
with ThreadingHTTPServer(('127.0.0.1', 8000), MyHTTPRequestHandler) as server:
server.serve_forever()
Previously, this would faithfully mirror the request path from the client:
$ curl -v http://localhost:8000//test
* Trying 127.0.0.1:8000...
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET //test HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.82.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Server: BaseHTTP/0.6 Python/3.8.13
< Date: Mon, 07 Nov 2022 21:51:19 GMT
<
* Closing connection 0
//test
But now it mangles it:
$ curl -v http://localhost:8000//test
* Trying 127.0.0.1:8000...
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET //test HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.82.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Server: BaseHTTP/0.6 Python/3.11.0
< Date: Mon, 07 Nov 2022 21:51:30 GMT
<
* Closing connection 0
/test
This impacts any servers that subclass BaseHTTPRequestHandler, such as eventlet's WSGI server.
FWIW, this also impacts wsgiref, which seems not-great considering how it's supposed to be a reference implementation. This example has the same bad behavior as above, except with a different Server header:
from wsgiref.simple_server import make_server
def app(env, start_response):
start_response('200 OK', [])
return [env['PATH_INFO'].encode('ascii')]
with make_server('', 8000, app) as httpd:
httpd.serve_forever()
The path mangling is not documented. The change in question seems to be commit defaa2b1. Perhaps it should have only been made in the SimpleHTTPRequestHandler class, not BaseHTTPRequestHandler.