httpx icon indicating copy to clipboard operation
httpx copied to clipboard

Support passing bytes keys/values in `params=...`

Open thebigmunch opened this issue 5 years ago β€’ 16 comments
trafficstars

>>> httpx.get('https://example.org', params={b'testparam': b'testvalue'}).url
URL('https://example.org?b%27testparam%27=b%27testvalue%27')

I don't think this is intentional, as there is an isinstance check for bytes in the flatten_queryparams function, I'd expect anyone involved to have raised an exception instead of letting it go through like that, and that would be an undocumented divergence from requests compatibility.

I have tested with httpx==0.7.8 as well as httpx==0.11.0.

thebigmunch avatar Jan 10 '20 00:01 thebigmunch

Hmm, yeah, I believe that should be breaking or converting the types to one of the PrimitiveTypes (str, int, float, bool), but typing.cast() doesn't actually do any casting, it just lets the typechecker know that's what it should be[1]. That function could be modified something like:

    for k, v in queryparams.items():
        if isinstance(v, collections.abc.Sequence) and not isinstance(v, (str, bytes)):
            for u in v:
                if not isinstance(u, (str, int, float, bool)):
                    raise ValueError
                items.append((k, u))
        else:
            if not isinstance(v, (str, int, float, bool)):
                raise ValueError
            items.append((str(k), typing.cast("PrimitiveData", v)))

but I'm sure one of the main devs will have a better idea.

[1] https://docs.python.org/3/library/typing.html#typing.cast

StephenBrown2 avatar Jan 10 '20 00:01 StephenBrown2

Huh yeah, so the typing for our query params is…

PrimitiveData = typing.Optional[typing.Union[str, int, float, bool]]

QueryParamTypes = typing.Union[
    "QueryParams",
    typing.Mapping[str, typing.Union[PrimitiveData, typing.Sequence[PrimitiveData]]],
    typing.List[typing.Tuple[str, PrimitiveData]],
    str,
]

(with the Mapping line being the one at play here.)

So we don't support bytes neither for keys nor values currently. You're right that using bytes should result in an exception. I thought we already had an issue in the past about bytes support for query params, but we don't, so thanks for opening this!

This piece of code shows that Requests does indeed support query params as strings or bytes, both on keys and values. So I think we ~should~ could consider it? πŸ€” @tomchristie

florimondmanca avatar Jan 10 '20 07:01 florimondmanca

But I guess the real question is: do we want to expose the fact that we accept str or bytes across the QueryParams model? Or should it coerce everything as strings (or bytes?) internally, and then always return strings (or bytes?) when queried? Should bytes-coerced-as-strings be returned when a string key is queried, or sould that error out?

from httpx.models import QueryParams

params = QueryParams({b"key": b"value"})
# What would be the behavior of the following?
params.get(b"key")
params.get("key")

It's very possible that supporting bytes would bring us much more hassle and fuzziness than raising a clean exception in the likes of "bytes aren't supported, please pass a string or .decode() first". (In fact as I've just looked at it, and properly dealing with union types really is a pain.)

florimondmanca avatar Jan 10 '20 08:01 florimondmanca

I think requests supporting bytes is probably due to backward conpat with their years of Python 2. However, HTTPX does support bytes for headers. Either way, HTTPX should probably be consistent between them?

thebigmunch avatar Jan 10 '20 08:01 thebigmunch

@thebigmunch Do you have to pass bytes here, or can you get away with calling .decode() on the bytes query params keys/values?

florimondmanca avatar Jan 11 '20 18:01 florimondmanca

Also:

HTTPX does support bytes for headers. Either way, HTTPX should probably be consistent between them?

True, and all headers are even encoded as bytes internally. But that's because they have to eventually be sent as bytes through the network. Query parameters are different: they will end up being passed through urllib.parse.urlencode(), which returns a string in all cases. So there's no inconsistency here β€” headers and query params just aren't meant to be used in the same way.

florimondmanca avatar Jan 11 '20 18:01 florimondmanca

True, and all headers are even encoded as bytes internally. But that's because they have to eventually be sent as bytes through the network. Query parameters are different: they will end up being passed through urllib.parse.urlencode(), which returns a string in all cases. So there's no inconsistency here β€” headers and query params just aren't meant to be used in the same way.

You're talking about implementation details. For example, in Python 2, urllib.urlencode would return a byte string, not unicode. So, its return type is not due to the way query params are meant to be used but rather the Python language decision of what string type to use by default. Even passing unicode in Python 2 would result in a byte. And passing a unicode string not covered by ASCII would result in a UnicodeEncodeError. It's important to remember that these are just string representations. And what something ends up as doesn't necessarily dictate that's what it needs to be given as (HTTPX accepts str for header keys/values after all).

There is an API inconsistency that may confuse people and will surely hit someone again coming from requests.

@thebigmunch Do you have to pass bytes here, or can you get away with calling .decode() on the bytes query params keys/values?

No, in fact I've already changed the code that caused me to catch this issue to do just that. It just looked like this was something that wasn't decided on one way or another. And it hadn't been brought up. And there are actions that need taking should it to be made a design decision.

thebigmunch avatar Jan 11 '20 19:01 thebigmunch

I think accepting bytes but doing some guess work casting anywhere we accept str would be a bit of a can of worms.

httpx is a Python 3 only library and I don't think it's unreasonable to expect users to handle the encoding/decoding of bytes.

That being said there's a spectrum of how defensive we can be about these situations. On the one side we can slap a big ol' warning on the docs stating that passing bytes instead of strings is not supported. On the other side we can add a util function str_or_die and use it everywhere, which I think it's overkill.

To me it's quite common to pass the wrong type of parameter to a function call and have it behave incorrectly in Python. That's one of the reasons we type annotate everything, so users can have mypy check these inconsistencies before they run the code.

I understand @thebigmunch's argument that it's an inconsistency with requests but that's one I'd be fine with including in the compatibility guide.

yeraydiazdiaz avatar Jan 20 '20 10:01 yeraydiazdiaz

Given that URLs are ultimately in bytewise form, I think it'd be pretty sensible for us to accept bytes for the params.

I think we could also just get a bit looser on the PrimitiveTypes, and switch to Any instead, using using str(value) throughout as the fall-back. Having done a bit of a review of our type annotations, I'm thinking that having a lighter, more obvious signature would probably be beneficial, even if it means that we're just being a bit more relaxed about the typing.

lovelydinosaur avatar Jan 22 '20 12:01 lovelydinosaur

Labelling this up as an "enhancement" since we're meeting our own contract for this properly at the moment, but that it'd probably be an improvement for us to tweak that up a bit.

lovelydinosaur avatar Jan 22 '20 12:01 lovelydinosaur

Related... we also ought to accept bytes on URL() (Which will afterall ultimately end up in bytewise representations), and use the AnyStr type annotation rather than any places we're currently using Union[str, bytes].

lovelydinosaur avatar Jan 22 '20 12:01 lovelydinosaur

Given that URLs are ultimately in bytewise form, I think it'd be pretty sensible for us to accept bytes for the params.

I think we could also just get a bit looser on the PrimitiveTypes, and switch to Any instead, using using str(value) throughout as the fall-back. Having done a bit of a review of our type annotations, I'm thinking that having a lighter, more obvious signature would probably be beneficial, even if it means that we're just being a bit more relaxed about the typing.

Related... we also ought to accept bytes on URL() (Which will afterall ultimately end up in bytewise representations), and use the AnyStr type annotation rather than any places we're currently using Union[str, bytes].

If these are not being worked on right now, may I work on it? I am assuming these 2 are the changes that I need to implement.

I understand @thebigmunch's argument that it's an inconsistency with requests but that's one I'd be fine with including in the compatibility guide.

And should I also include this in the docs after the implementation to add clarity?

cwkang1998 avatar Sep 27 '21 16:09 cwkang1998

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Feb 20 '22 15:02 stale[bot]

Still valid thx, @stale bot. Although I think we should probably raise a TypeError, rather than supporting it.

lovelydinosaur avatar Feb 21 '22 13:02 lovelydinosaur

It would be great to have this feature. Let's say some webapp accepts a file header in a GET parameter and must return the corresponding mime type.

I would try:

httpx.get("http://192.168.56.1:8000/get", params=[["header", b"\x89\x50\x4e\x47"]]) httpx sends header=b%27%5Cx89PNG%27 (repr of the value)

Then using httpx.get("http://192.168.56.1:8000/get", params=[["header", "%89%50%4e%47"]]) I get (without real surprise to be honest) double encoding (header=%2589%2550%254e%2547)

Passing the whole string with httpx.get("http://192.168.56.1:8000/get", params="header=%89%50%4e%47") the encoding is changed : header=%EF%BF%BDPNG

Finally giving the whole data as bytes with httpx.get("http://192.168.56.1:8000/get", params=b"header=\x89\x50\x4e\x47") raises an exception (UnicodeDecodeError: 'ascii' codec can't decode byte 0x89 in position 7: ordinal not in range(128))

Also as far as I know (browsers do it that way) the encoding applied to GET/POST parameters (keys and values) is the one declared in the webpage (meta tag or http headers of the server) so it can be everything.

devl00p avatar Mar 02 '22 21:03 devl00p

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Sep 09 '22 01:09 stale[bot]