racket-http-easy
%2A is force decoded to *
A particularly picky website requires me to send a GET request with a literal %2A in the URL. However, the http-easy internals force it to be decoded to * via the (->url urlish) and (url-path&query u params*) round-trip. The website does not accept * and responds with a 301 redirect in which * is replaced with %2A. This puts http-easy and the website in an infinite loop: http-easy rewrites %2A to *, and the website redirects back to %2A. The loop only ends when #:max-redirects is reached.
I think that if no #:params are required then http-easy should keep the provided URL without doing the round-trip conversion, to make sure each byte stays exactly the same as provided.
Since this is such an edge-case, if it is difficult for you to implement then I would be happy with suggestions for a workaround. I tried looking through your code to see if I could come up with a PR, but I figured you would know your own utility functions better than I do.
Does it work if you encode the % itself, e.g. example.com?param=%252A?
Yeah, that actually does work, haha! I will see if I can write this workaround into my code
Digging in a little bit, this appears to be a bug in net/uri-codec (unless I'm missing something):
> (require net/uri-codec)
> (alist->form-urlencoded '((param . "*")))
"param=*"
> (alist->form-urlencoded '((param . "%")))
"param=%25"
RFC 3986 states that * is a reserved character, so it should be encoded.
* is not in the application/x-www-form-urlencoded percent-encode set, so it doesn't need to be encoded in query strings according to the spec. Of course, server behaviour may differ. https://url.spec.whatwg.org/#application-x-www-form-urlencoded-percent-encode-set
Yes, you're right. I was confused because I was comparing alist->form-urlencoded to Python's urlencode, which does encode *. Re. the overall issue, probably it would be better not to round-trip redirect URLs, as you suggest, but that seems like it'll be a bit painful. I'll look into it more later this week.
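For reference, the Python side of that comparison can be reproduced with the standard library. This is only a minimal sketch; the key name param is the one from the REPL session above.

```python
from urllib.parse import urlencode

# urlencode() quotes values with quote_plus() by default. Its safe set is
# ASCII alphanumerics plus "_.-~", so "*" is percent-encoded, unlike in
# Racket's alist->form-urlencoded.
star = urlencode({"param": "*"})
percent = urlencode({"param": "%"})

print(star)     # param=%2A
print(percent)  # param=%25
```

So the two libraries agree on % but disagree on *, which is where the confusion came from.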
If I can get %252A working in my real code, there's probably no need for you to work on this - this is such an edge case of an edge case!
A related issue is that %20 is also force-decoded to + when it appears after ? in the URL (i.e. it is part of the query parameters). While the URL spec says you're supposed to encode with +, there's a shocking number of websites out there that rely on %20 instead.
The %2520 trick doesn't work in this case, because it remains at %2520. I cannot find a workaround that would allow me to send %20 directly.
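To make the two spellings concrete, here is a small Python sketch. Python is used only because its standard library exposes both forms side by side; this is not a workaround for http-easy.

```python
from urllib.parse import quote, quote_plus, unquote_plus

value = "a b"

plus_form = quote_plus(value)  # form-urlencoded style: space becomes "+"
pct_form = quote(value)        # generic percent-encoding: space becomes "%20"

print(plus_form)  # a+b
print(pct_form)   # a%20b

# Decoding either spelling with form rules yields the same string, so the
# original choice cannot be recovered after a decode/re-encode round-trip.
assert unquote_plus(plus_form) == unquote_plus(pct_form) == value
```

Because both forms decode to the same value, any client that parses the query and serializes it again has to pick one spelling, and a server that only accepts the other will reject the result.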