cpython icon indicating copy to clipboard operation
cpython copied to clipboard

gh-96035: Make urllib.parse.urlparse reject non-numeric ports

Open kenballus opened this issue 3 years ago • 1 comments

urllib.parse.urlparse uses int to parse port numbers, which means they can contain signs, underscores, and whitespace. This patch adds a check to ensure that only numeric port numbers parse without error.

  • Issue: gh-96035

kenballus avatar Oct 15 '22 01:10 kenballus

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

bedevere-bot avatar Oct 15 '22 18:10 bedevere-bot

@JelleZijlstra

I think it makes way more sense to move port validation entirely to parse time, a la #25774, so it might not make sense to implement these fixes until that happens. What's the protocol for PRs that depend on other PRs? Should I make a PR into @gpshead's fork, or would you rather I submit this in isolation and have other adjust for it?

Thanks!

kenballus avatar Oct 17 '22 15:10 kenballus

I'd also want to hear @gpshead's opinion, and I haven't looked at #25774 in enough detail to understand the problem it's trying to solve. However, the other PR has been open for a long time and looks like a more invasive change, so we may want to apply it only on main, while this PR is a more self-contained bugfix that we'll be able to backport to 3.10 and 3.11.

JelleZijlstra avatar Oct 17 '22 15:10 JelleZijlstra

There's another case to consider: Unicode digits! For example, your patch still allows a port of , which is the Devanagari character for 6. Your quote from the RFC on the issue says only ASCII digits are allowed.

Great catch! I had no idea that things like "६".isdigit() == True! I'll fix that right away.

kenballus avatar Oct 20 '22 02:10 kenballus

Thanks @kenballus for the PR, and @JelleZijlstra for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10. 🐍🍒⛏🤖

miss-islington avatar Oct 20 '22 21:10 miss-islington

Thanks @kenballus for the PR, and @JelleZijlstra for merging it 🌮🎉.. I'm working now to backport this PR to: 3.11. 🐍🍒⛏🤖

miss-islington avatar Oct 20 '22 21:10 miss-islington

GH-98498 is a backport of this pull request to the 3.10 branch.

bedevere-bot avatar Oct 20 '22 21:10 bedevere-bot

GH-98499 is a backport of this pull request to the 3.11 branch.

bedevere-bot avatar Oct 20 '22 21:10 bedevere-bot