ocaml-uri
ocaml-uri copied to clipboard
Allow configuration of pct encoding
I am writing a simple client to interact with AWS S3.
Their docs state: This requires a uri encoding by: URI encode every byte. Uri-Encode() must enforce the following rules:
- URI encode every byte except the unreserved characters: 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', and '~'.
- The space character is a reserved character and must be encoded as "%20" (and not as "+").
- Each Uri-encoded byte is formed by a '%' and the two-digit hexadecimal value of the byte.
- Letters in the hexadecimal value must be uppercase, for example "%1A".
- Encode the forward slash character, '/', everywhere except in the object key name. For example, if the object key name is photos/Jan/sample.jpg, the forward slash in the key name is not encoded.
As discussed in other issues on this project, the standards, and the reality with this sort of thing are very different. I think the answer is to give the end user a few functions to exude more control. This will allow for standards compliant apis, and arbitrarily not-standards-compliant apis to be used.
Thoughts? Thanks! Trevor
+1
IMO, AWS is important enough for Uri to help out users trying to interface with it even if it will cost some complexity.
It looks like we would need some way to control the literalness of encoding particularly for the unencoded set and '/'. The '/' issue is well-known but this is the first time I've seen a requirement about the unencoded set. Support for this should be included.
@dsheets Would it be possible to tackle this before 2.0 as it is a pretty annoying blocker that you can't get around (if you're using cohttp as your client).
I'm willing to give this a try as well if you give me some directions. From the brief look I've had, it would be possible to just add a query_scheme parameter in Uri.t and then it's just a matter of implementing an AWS compatible Scheme module that handles safe chars for ``Query | Query_key | Query_value`. Am I on the right page?
I'm also hitting problems with this, also for working with S3. FWIW, I'd be quite happy with more types that represent different encodings. A string isn't nearly specific enough.
The specific problem I'm currently having is to encode slashes in query values. The issue seems to be that \ is not encoded:
# Uri.make ~query:["foo", ["ba/r"]] () |> Uri.to_string;;
- : string = "?foo=ba/r"
Trying to work around this by encoding it myself doesn't work because % is encoded:
# Uri.make ~query:["foo", ["ba%2Fr"]] () |> Uri.to_string;;
- : string = "?foo=ba%252Fr"
The output I want is "?foo=ba%2Fr".
This would be useful when working with a library like zeromq as well. If you want to listen to connections from any outside host the host should be * which ends up escaped:
# Uri.make ~scheme:"tcp" ~host:"*" ~port:5555 ()
- : Uri.t = tcp://%2A:5555
Sorry for resurrecting the dead issue. But this library does not seems to be compatible with RFC Compare with python query encoding:
>>> urllib.parse.urlencode({'foo':'web/games-applications?title=&sort_bef_combine=+&sort_order=&sort_by=&page=36'})
'foo=web%2Fgames-applications%3Ftitle%3D%26sort_bef_combine%3D%2B%26sort_order%3D%26sort_by%3D%26page%3D36'
and Uri
utop # Uri.make ~query:["foo", [Uri.pct_decode "web/games-applications?title=&sort_bef_combine=+&sort_order=&sort_by=&page=36"]] () |> Uri.to_string;;
- : string =
"?foo=web/games-applications?title=%26sort_bef_combine=%2B%26sort_order=%26sort_by=%26page=36"
@avsm any thoughts on this? We are forced to vendor ocaml-uri and adjust encoding for our business needs, which is something we try to avoid if at all possible. I'm wondering how many other vendored versions exist in the wild and how much effort is wasted on constant rebasing with upstream. Even a simple (and may be ugly) way to customize encoding would be very welcome by the users.
@avsm is someone actively working on a fix for this? I'm using this in ocaml-aws.
@Lupus Are you able to share your vendored copy of ocaml-uri publicly? I'd be interested in what changes you needed to get it working?
No one is working on this issue at the moment; I’m happy to review a PR
Understood, I'll start reading the code ;-)
Much appreciated. It might be easier done over #142 which I’d like to merge soon
@tmcgilchrist we ended up working around this in other services so that URI encodings are compatible (we had hand-rolled encodings there anyways), that was easier than maintaining an internal fork.
https://github.com/mirage/ocaml-uri/pull/147 looks to cover angstrom support from https://github.com/mirage/ocaml-uri/pull/142 and PCT encoding. I'm in the process of testing that for ocaml-aws.