httpx
httpx copied to clipboard
Support passing bytes keys/values in `params=...`
>>> httpx.get('https://example.org', params={b'testparam': b'testvalue'}).url
URL('https://example.org?b%27testparam%27=b%27testvalue%27')
I don't think this is intentional, as there is an isinstance check for bytes in the flatten_queryparams function, I'd expect anyone involved to have raised an exception instead of letting it go through like that, and that would be an undocumented divergence from requests compatibility.
I have tested with httpx==0.7.8 as well as httpx==0.11.0.
Hmm, yeah, I believe that should be breaking or converting the types to one of the PrimitiveTypes (str, int, float, bool), but typing.cast() doesn't actually do any casting, it just lets the typechecker know that's what it should be[1]. That function could be modified something like:
for k, v in queryparams.items():
if isinstance(v, collections.abc.Sequence) and not isinstance(v, (str, bytes)):
for u in v:
if not isinstance(u, (str, int, float, bool)):
raise ValueError
items.append((k, u))
else:
if not isinstance(v, (str, int, float, bool)):
raise ValueError
items.append((str(k), typing.cast("PrimitiveData", v)))
but I'm sure one of the main devs will have a better idea.
[1] https://docs.python.org/3/library/typing.html#typing.cast
Huh yeah, so the typing for our query params isβ¦
PrimitiveData = typing.Optional[typing.Union[str, int, float, bool]]
QueryParamTypes = typing.Union[
"QueryParams",
typing.Mapping[str, typing.Union[PrimitiveData, typing.Sequence[PrimitiveData]]],
typing.List[typing.Tuple[str, PrimitiveData]],
str,
]
(with the Mapping line being the one at play here.)
So we don't support bytes neither for keys nor values currently. You're right that using bytes should result in an exception. I thought we already had an issue in the past about bytes support for query params, but we don't, so thanks for opening this!
This piece of code shows that Requests does indeed support query params as strings or bytes, both on keys and values. So I think we ~should~ could consider it? π€ @tomchristie
But I guess the real question is: do we want to expose the fact that we accept str or bytes across the QueryParams model? Or should it coerce everything as strings (or bytes?) internally, and then always return strings (or bytes?) when queried? Should bytes-coerced-as-strings be returned when a string key is queried, or sould that error out?
from httpx.models import QueryParams
params = QueryParams({b"key": b"value"})
# What would be the behavior of the following?
params.get(b"key")
params.get("key")
It's very possible that supporting bytes would bring us much more hassle and fuzziness than raising a clean exception in the likes of "bytes aren't supported, please pass a string or .decode() first". (In fact as I've just looked at it, and properly dealing with union types really is a pain.)
I think requests supporting bytes is probably due to backward conpat with their years of Python 2. However, HTTPX does support bytes for headers. Either way, HTTPX should probably be consistent between them?
@thebigmunch Do you have to pass bytes here, or can you get away with calling .decode() on the bytes query params keys/values?
Also:
HTTPX does support bytes for headers. Either way, HTTPX should probably be consistent between them?
True, and all headers are even encoded as bytes internally. But that's because they have to eventually be sent as bytes through the network. Query parameters are different: they will end up being passed through urllib.parse.urlencode(), which returns a string in all cases. So there's no inconsistency here β headers and query params just aren't meant to be used in the same way.
True, and all headers are even encoded as bytes internally. But that's because they have to eventually be sent as bytes through the network. Query parameters are different: they will end up being passed through
urllib.parse.urlencode(), which returns a string in all cases. So there's no inconsistency here β headers and query params just aren't meant to be used in the same way.
You're talking about implementation details. For example, in Python 2, urllib.urlencode would return a byte string, not unicode. So, its return type is not due to the way query params are meant to be used but rather the Python language decision of what string type to use by default. Even passing unicode in Python 2 would result in a byte. And passing a unicode string not covered by ASCII would result in a UnicodeEncodeError. It's important to remember that these are just string representations. And what something ends up as doesn't necessarily dictate that's what it needs to be given as (HTTPX accepts str for header keys/values after all).
There is an API inconsistency that may confuse people and will surely hit someone again coming from requests.
@thebigmunch Do you have to pass bytes here, or can you get away with calling
.decode()on the bytes query params keys/values?
No, in fact I've already changed the code that caused me to catch this issue to do just that. It just looked like this was something that wasn't decided on one way or another. And it hadn't been brought up. And there are actions that need taking should it to be made a design decision.
I think accepting bytes but doing some guess work casting anywhere we accept str would be a bit of a can of worms.
httpx is a Python 3 only library and I don't think it's unreasonable to expect users to handle the encoding/decoding of bytes.
That being said there's a spectrum of how defensive we can be about these situations. On the one side we can slap a big ol' warning on the docs stating that passing bytes instead of strings is not supported. On the other side we can add a util function str_or_die and use it everywhere, which I think it's overkill.
To me it's quite common to pass the wrong type of parameter to a function call and have it behave incorrectly in Python. That's one of the reasons we type annotate everything, so users can have mypy check these inconsistencies before they run the code.
I understand @thebigmunch's argument that it's an inconsistency with requests but that's one I'd be fine with including in the compatibility guide.
Given that URLs are ultimately in bytewise form, I think it'd be pretty sensible for us to accept bytes for the params.
I think we could also just get a bit looser on the PrimitiveTypes, and switch to Any instead, using using str(value) throughout as the fall-back. Having done a bit of a review of our type annotations, I'm thinking that having a lighter, more obvious signature would probably be beneficial, even if it means that we're just being a bit more relaxed about the typing.
Labelling this up as an "enhancement" since we're meeting our own contract for this properly at the moment, but that it'd probably be an improvement for us to tweak that up a bit.
Related... we also ought to accept bytes on URL() (Which will afterall ultimately end up in bytewise representations), and use the AnyStr type annotation rather than any places we're currently using Union[str, bytes].
Given that URLs are ultimately in bytewise form, I think it'd be pretty sensible for us to accept
bytesfor the params.I think we could also just get a bit looser on the
PrimitiveTypes, and switch toAnyinstead, using usingstr(value)throughout as the fall-back. Having done a bit of a review of our type annotations, I'm thinking that having a lighter, more obvious signature would probably be beneficial, even if it means that we're just being a bit more relaxed about the typing.
Related... we also ought to accept
bytesonURL()(Which will afterall ultimately end up in bytewise representations), and use theAnyStrtype annotation rather than any places we're currently usingUnion[str, bytes].
If these are not being worked on right now, may I work on it? I am assuming these 2 are the changes that I need to implement.
I understand @thebigmunch's argument that it's an inconsistency with
requestsbut that's one I'd be fine with including in the compatibility guide.
And should I also include this in the docs after the implementation to add clarity?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Still valid thx, @stale bot. Although I think we should probably raise a TypeError, rather than supporting it.
It would be great to have this feature. Let's say some webapp accepts a file header in a GET parameter and must return the corresponding mime type.
I would try:
httpx.get("http://192.168.56.1:8000/get", params=[["header", b"\x89\x50\x4e\x47"]])
httpx sends header=b%27%5Cx89PNG%27 (repr of the value)
Then using httpx.get("http://192.168.56.1:8000/get", params=[["header", "%89%50%4e%47"]])
I get (without real surprise to be honest) double encoding (header=%2589%2550%254e%2547)
Passing the whole string with httpx.get("http://192.168.56.1:8000/get", params="header=%89%50%4e%47")
the encoding is changed : header=%EF%BF%BDPNG
Finally giving the whole data as bytes with httpx.get("http://192.168.56.1:8000/get", params=b"header=\x89\x50\x4e\x47") raises an exception (UnicodeDecodeError: 'ascii' codec can't decode byte 0x89 in position 7: ordinal not in range(128))
Also as far as I know (browsers do it that way) the encoding applied to GET/POST parameters (keys and values) is the one declared in the webpage (meta tag or http headers of the server) so it can be everything.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.