Fix: Mitigate urllib3 IPv6 Zone ID parsing bug
This Pull Request introduces a targeted mitigation in PreparedRequest.prepare_url to prevent URL corruption caused by a bug in urllib3.util.parse_url, a function in the urllib3 dependency.
The Problem
When a standards-compliant link-local IPv6 address with a Zone ID is passed to requests (e.g., http://[fe80::a%2553]/), the following sequence of failures occurs:
- Premature Decoding (in urllib3): urllib3.util.parse_url incorrectly decodes the Zone ID delimiter from the required URI form (%25) to a single percent sign (%). This leaves the host component in a corrupted state ([fe80::a%53]). Multiple calls to urllib3.quote and urllib3.unquote can then go a step further and decode the characters after the percent sign as a hexadecimal escape (%53 -> S).
- Downstream Failure: The corrupted host string cannot be processed correctly by the rest of the HTTP stack, leading to critical errors:
  - Connection Error: OSError: [Errno 22] Invalid argument in urllib3's socket layer, because the host string is improperly formatted for the OS socket API.
  - Validation Error: In other requests components (like cookie handling), the parser may incorrectly decode the remaining %53 as the character 'S', leading to a ValueError.
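The decoding chain described above can be reproduced in isolation. The snippet below is only an illustration: it uses the standard library's urllib.parse.unquote as a stand-in for the decoding step, not the actual urllib3 code path, and the variable names are made up for this example.

```python
from urllib.parse import unquote

# Standards-compliant host with an encoded Zone ID delimiter (%25).
encoded_host = "[fe80::a%2553]"

# First decode: %25 becomes a bare '%', which leaves an ambiguous "%53".
once = unquote(encoded_host)

# Second decode: the leftover "%53" is now read as an escape for 'S'.
twice = unquote(once)

print(once)   # [fe80::a%53]
print(twice)  # [fe80::aS]
```

Each extra decode pass loses information that cannot be recovered from the result alone, which is why the host has to be re-encoded before it travels further down the stack.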
The Solution
This patch adds logic immediately after the parse_url call to check for, and repair, the corrupted host component.
- It checks for bracketed IPv6 addresses containing exactly one percent sign (%).
- If found, it reconstructs the host by restoring the standards-compliant, fully encoded delimiter (%25).
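The two steps above could look roughly like the following. This is a minimal sketch, not the code in this patch: the helper name repair_zone_id and the regular expression are hypothetical, and it assumes the input is the already-decoded host string returned by parse_url.

```python
import re

# Hypothetical pattern: a bracketed IPv6 literal with exactly one '%'
# separating the address from the Zone ID.
_BARE_ZONE = re.compile(r"\[([^%\]]+)%([^%\]]+)\]")


def repair_zone_id(host):
    """Restore the '%25' Zone ID delimiter in a decoded bracketed IPv6 host.

    Assumes `host` has already lost its encoding (e.g. "[fe80::a%53]").
    Hosts that do not match the pattern are returned unchanged.
    """
    if not host:
        return host
    match = _BARE_ZONE.fullmatch(host)
    if match is None:
        return host
    address, zone = match.groups()
    return f"[{address}%25{zone}]"


print(repair_zone_id("[fe80::a%53]"))    # [fe80::a%2553]
print(repair_zone_id("[fe80::a%eth0]"))  # [fe80::a%25eth0]
print(repair_zone_id("example.com"))     # example.com
```

Note that a check like this is not idempotent: a host that is still correctly encoded ([fe80::a%2553]) also contains exactly one percent sign, so the sketch is only safe when applied once, directly to the decoded output.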
This ensures that the final self.url is a canonical URI per RFC 6874, allowing the request to proceed successfully once the underlying urllib3 connection logic is fixed to correctly handle the Zone ID. This fix prevents immediate internal corruption errors within requests itself.
As this is a core connection bug in the urllib3 library, I am raising an issue there and am looking to submit a corresponding PR to fix the underlying socket handling.