gh-96035: Make urllib.parse.urlparse reject non-numeric ports
urllib.parse.urlparse uses int to parse port numbers, which means a port can contain signs, underscores, and surrounding whitespace and still be accepted. This patch adds a check so that only ports made up entirely of digits parse without error.
- Issue: gh-96035
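
To illustrate the behavior described above, here is a small sketch showing how permissive the built-in int is with strings that are not plain digit runs. The helper name `int_accepts` is hypothetical and exists only for this illustration; it is not part of urllib.parse or of the patch.

```python
def int_accepts(text: str) -> bool:
    """Return True if the built-in int() parses text without raising."""
    try:
        int(text, 10)
    except ValueError:
        return False
    return True


# Each of these is accepted by int(), although only the first is a
# plain run of ASCII digits.
for candidate in ["80", "+80", "8_0", " 80 "]:
    print(repr(candidate), int_accepts(candidate))
```

Any code that validates a port only by calling int on it would therefore let the last three forms through.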
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.
Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.
@JelleZijlstra I think it makes way more sense to move port validation entirely to parse time, a la #25774, so it might not make sense to implement these fixes until that happens. What's the protocol for PRs that depend on other PRs? Should I make a PR into @gpshead's fork, or would you rather I submit this in isolation and have others adjust for it?
Thanks!
I'd also want to hear @gpshead's opinion, and I haven't looked at #25774 in enough detail to understand the problem it's trying to solve. However, the other PR has been open for a long time and looks like a more invasive change, so we may want to apply it only on main, while this PR is a more self-contained bugfix that we'll be able to backport to 3.10 and 3.11.
There's another case to consider: Unicode digits! For example, your patch still allows a port of ६, which is the Devanagari digit 6. Your quote from the RFC on the issue says only ASCII digits are allowed.
Great catch! I had no idea that things like "६".isdigit() == True! I'll fix that right away.
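
For readers following along, the Unicode pitfall raised above can be sketched like this. The function name `is_ascii_digits` is hypothetical and this is only one possible way to express the check, not necessarily the code that was merged.

```python
def is_ascii_digits(text: str) -> bool:
    """Return True only for a non-empty run of ASCII digits 0-9.

    str.isdigit() alone also accepts non-ASCII digits such as the
    Devanagari six, so it is combined with str.isascii() here.
    """
    return text.isascii() and text.isdigit()


# The Devanagari digit passes isdigit() but fails the ASCII check.
print("६".isdigit(), is_ascii_digits("६"))
# A plain ASCII port string passes both.
print("8080".isdigit(), is_ascii_digits("8080"))
```

Note that int also accepts the Devanagari digit, which is why a digit check based on isdigit alone does not close the gap.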
Thanks @kenballus for the PR, and @JelleZijlstra for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10. 🐍🍒⛏🤖
Thanks @kenballus for the PR, and @JelleZijlstra for merging it 🌮🎉.. I'm working now to backport this PR to: 3.11. 🐍🍒⛏🤖
GH-98498 is a backport of this pull request to the 3.10 branch.
GH-98499 is a backport of this pull request to the 3.11 branch.