httpx
httpx copied to clipboard
Drop `rfc3986` requirement.
This pull request drops the rfc3986 package dependancy, in favour of a carefully worked through urlparse implementation, that provides everything we need in terms of URL validation and normalisation.
Closes #1833 (Fixed) Closes #2169 (No longer required) Closes #2175 (No longer required)
Marking this up as ready-for-review now.
Although this adds some additional code, it also completely removes a dependancy, and is lower overall-complexity. In terms of reading through and being able to understand the split between the URL model and the underlying URL parsing code, I think it ends up actually being much simpler.
I think the rfc3986 package is fantastic, but we were having to do a few bits in some slightly kludgy ways to use it, and without the indirection I find it much clearer to work all the way through what's actually going on here.
Would be very happy to work through a review here step by step until we're confident that:
- We've ensured that the parsing here is (relatively) easy enough for someone else to work through.
- We've ensured that it's throughly enough source-documented.
- We've ensured that any concerns about confidence-in-correctness have been thoroughly addressed.
Potentially contentious, because because of the "rely on existing packages" argument, but I'd like to work through this with someone else. To my eyes it's actually a nice rationalisation. But then I've spent the time working on it, so?
I might end up putting on my admin hat and pushing this one through. Anyone interested is welcome to review it.
I looked over the change. I agree that this implementation is very clean and easy to grok at first glance. Given httpx is an HTTP client, parsing URLs seems like a pretty fundamental behavior and I am not surprised that we benefit from doing it internally instead of relying on a 3rd party package. I'd be a lot more concerned if this was an app or library that only tangentially does HTTP stuff.
All this said, what I cannot evaluate is the correctness: it looks good at a glance, but I don't know the ins and outs of URL parsing / RFCs in the area, so for that I would have to defer to someone with more experience.
Very nice work! I've been looking through the code over time and letting it soak in. Only thing that jumped out at me right away is there is no "authority" field. This field is mentioned fairly prominently in rfc3986, so IMHO it's worth having as a field/@property. Easy to add.
I ran these through some examples in my URL/URI torture test I've been working on, and collecting examples for around the net, and found a couple of discrepancies that I think are worth looking at.
~~in particular, it seems to struggle with rfc4151 "tag" scheme URIs:~~ edit: disregard the above, there was a bug in my reference data.
There's also a batch of valid urls (taken from here) that don't parse into IDNA hostname correctly (throws Invalid IDNA hostname error), and some invalid URIs that ought to be throwing some kind of error, but nonetheless pass.
These ones are particularly spicy:
[
"http://✪df.ws/123",
"http://➡.ws/䨹",
"http://⌘.ws",
"http://⌘.ws/",
"http://☺.damowmow.com/"
]
The current PR gives "Invalid IDNA hostname", the Live URL Viewer converts these to Punycoded IDNA hostnames, not sure who is right here.
So with those odd URIs, they do have parity with rfc3986 library, which is probably what we want for this refactor, as we want it to be as transparent as possible. On the whole, it looks like this refactor does exactly that, which is excellent. Couch that in, "URIs/URLs are hard and there are millions of corner cases" but I think this does a great job at preserving the rfc3986-ness while adding much more flexibility.
Seems the only discrepancy now is merely between this implementation and the WHATWG "living URL standard" (as an aside, I'm personally not even convinced that spec is the One True Spec, but that's a holy war for another day). Especially with some of the IDNA stuff, I think handling of unicode URLs could be a bit more graceful. E.g. the rfc3986 lib makes a distinction between URI and IRI, and I don't think that's been pertinent in over a decade. I would expect httpx-1.0 to be able to take a unicode URL and just know the correct thing to do with it. Sounds like a great opportunity for a follow up PR though!
So with those odd URIs, they do have parity with rfc3986 library, which is probably what we want for this refactor
Exactly so, yes.
What do we think. Are we happy enough to pull this PR in as-is, or would we like to make things even tighter with a bunch of additional URL tests?
Okay I'm confident on this one. Anyone up for a review?
Looks like this PR is not possoible to apply on top of currert master
+ /usr/bin/cat /home/tkloczko/rpmbuild/SOURCES/python-httpx-Drop-rfc3986-requirement.patch
+ /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f
2 out of 2 hunks FAILED -- saving rejects to file httpx/_models.py.rej
1 out of 1 hunk FAILED -- saving rejects to file httpx/_types.py.rej
3 out of 3 hunks FAILED -- saving rejects to file httpx/_urls.py.rej
1 out of 1 hunk FAILED -- saving rejects to file tests/client/test_proxies.py.rej
1 out of 1 hunk FAILED -- saving rejects to file tests/models/test_url.py.rej
1 out of 1 hunk FAILED -- saving rejects to file setup.py.rej
Any update about integrate this PR and make the new release?
@kloczek Yes, I'd be keen on that. We're currently waiting on an approving review here. I've sent you an invite to the maintainers team, which gives you review permissions. I'd be happy to work on resolving any blockers we have on getting this resolved.
I've sent you an invite to the maintainers team, which gives you review permissions.
Maybe I'm looking in wrong place but I dont see anything like possibility to aprove something (and I did not receive any notyfication about that). 🤔
Just back to httpx after 0.23.1 has been released and looks like this PR needs to be updated again
+ /usr/bin/gzip -dc /home/tkloczko/rpmbuild/SOURCES/python-httpx-0.23.1.tar.gz
+ /usr/bin/tar -xof -
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ cd httpx-0.23.1
+ /usr/bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ /usr/bin/cat /home/tkloczko/rpmbuild/SOURCES/python-httpx-Drop-rfc3986-requirement.patch
+ /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f
2 out of 2 hunks FAILED -- saving rejects to file httpx/_models.py.rej
1 out of 1 hunk FAILED -- saving rejects to file httpx/_types.py.rej
3 out of 3 hunks FAILED -- saving rejects to file httpx/_urls.py.rej
1 out of 1 hunk FAILED -- saving rejects to file tests/client/test_proxies.py.rej
1 out of 1 hunk FAILED -- saving rejects to file tests/models/test_url.py.rej
1 out of 1 hunk FAILED -- saving rejects to file setup.py.rej
Up to date with master.
Up to date with master.
Are you sure?
[tkloczko@devel-g2v httpx]$ git pull
Already up to date.
[tkloczko@devel-g2v httpx]$ git status
On branch master
Your branch is up to date with 'origin/master'.
nothing to commit, working tree clean
[tkloczko@devel-g2v httpx]$ wget https://github.com/encode/httpx//pull/2252.patch#/python-httpx-Drop-rfc3986-requirement.patch
--2022-12-09 10:57:02-- https://github.com/encode/httpx//pull/2252.patch
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://patch-diff.githubusercontent.com/raw/encode/httpx/pull/2252.patch [following]
--2022-12-09 10:57:02-- https://patch-diff.githubusercontent.com/raw/encode/httpx/pull/2252.patch
Resolving patch-diff.githubusercontent.com (patch-diff.githubusercontent.com)... 140.82.121.4
Connecting to patch-diff.githubusercontent.com (patch-diff.githubusercontent.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Cookie coming from patch-diff.githubusercontent.com attempted to set domain to github.com
Length: unspecified [text/plain]
Saving to: ‘2252.patch’
2252.patch [ <=> ] 105.41K --.-KB/s in 0.05s
2022-12-09 10:57:02 (2.09 MB/s) - ‘2252.patch’ saved [107935]
[tkloczko@devel-g2v httpx]$ patch -p1 < 2252.patch
patching file httpx/_models.py
Reversed (or previously applied) patch detected! Assume -R? [n]
GitHub's happy enough...
We're blocked on an approving review.
paging @encode/maintainers .
@tomchristie you've got it
My only comment is that perhaps we could rename copy_with with replace like how dataclasses do it in stdlib, but this is not something I think should prevent merge.
Great thank you. Let's aim to get a new minor release out first, and then merge this.
Looks like PR is again out of date 😞
Is it anytging else still on outstandung list related to this PR? 🤔
is there any blocker stopping this from getting merged?
Looks ready to go from my side. There's no breaking public API changes here, but since it removes a dependency I'd consider it as committing us to a proper version bump.
I've opened a discussion re. the next release here... https://github.com/encode/httpx/discussions/2534