ocaml-uri

Allow configuration of pct encoding

Open trevorsummerssmith opened this issue 10 years ago • 13 comments

I am writing a simple client to interact with AWS S3.

Their docs require a specific URI encoding. Uri-Encode() must enforce the following rules:

  • URI encode every byte except the unreserved characters: 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', and '~'.
  • The space character is a reserved character and must be encoded as "%20" (and not as "+").
  • Each Uri-encoded byte is formed by a '%' and the two-digit hexadecimal value of the byte.
  • Letters in the hexadecimal value must be uppercase, for example "%1A".
  • Encode the forward slash character, '/', everywhere except in the object key name. For example, if the object key name is photos/Jan/sample.jpg, the forward slash in the key name is not encoded.
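The rules above can be sketched in plain OCaml. This is only an illustration using the standard library; `is_unreserved`, `aws_uri_encode`, and the `encode_slash` flag are hypothetical names and are not part of ocaml-uri.

```ocaml
(* Sketch of the Uri-Encode() rules quoted above; not part of ocaml-uri. *)

(* The unreserved set from the AWS rules: letters, digits, '-', '.', '_', '~'. *)
let is_unreserved = function
  | 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9' | '-' | '.' | '_' | '~' -> true
  | _ -> false

(* [encode_slash] defaults to true; pass false only for object key names. *)
let aws_uri_encode ?(encode_slash = true) (s : string) : string =
  let buf = Buffer.create (String.length s * 3) in
  String.iter
    (fun c ->
      if is_unreserved c || (c = '/' && not encode_slash) then
        Buffer.add_char buf c
      else
        (* "%02X" yields two uppercase hex digits, so ' ' becomes "%20". *)
        Buffer.add_string buf (Printf.sprintf "%%%02X" (Char.code c)))
    s;
  Buffer.contents buf
```

With this sketch, `aws_uri_encode "a b"` gives `"a%20b"`, and `aws_uri_encode ~encode_slash:false "photos/Jan/sample.jpg"` leaves the key name unchanged.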

As discussed in other issues on this project, the standards and the reality for this sort of thing are very different. I think the answer is to give the end user a few functions to exert more control. This would allow both standards-compliant APIs and arbitrarily non-compliant APIs to be used.

Thoughts? Thanks! Trevor

trevorsummerssmith avatar Mar 29 '15 16:03 trevorsummerssmith

+1

IMO, AWS is important enough for Uri to help out users trying to interface with it even if it will cost some complexity.

rgrinberg avatar Mar 29 '15 21:03 rgrinberg

It looks like we would need some way to control the literalness of encoding particularly for the unencoded set and '/'. The '/' issue is well-known but this is the first time I've seen a requirement about the unencoded set. Support for this should be included.

dsheets avatar Apr 01 '15 08:04 dsheets

@dsheets Would it be possible to tackle this before 2.0? It is a pretty annoying blocker that you can't get around (if you're using cohttp as your client).

I'm willing to give this a try as well if you give me some directions. From the brief look I've had, it would be possible to just add a query_scheme parameter in Uri.t, and then it's just a matter of implementing an AWS-compatible Scheme module that handles safe chars for `Query | `Query_key | `Query_value. Am I on the right page?
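One way to picture the proposal is a module that decides, per query component, which characters stay unencoded. The sketch below is an assumption about what such a design could look like; `SCHEME`, `Aws`, `is_safe`, and this `pct_encode` are hypothetical and do not correspond to anything in ocaml-uri.

```ocaml
(* Hypothetical sketch: none of these names exist in ocaml-uri. *)
type component = [ `Query | `Query_key | `Query_value ]

module type SCHEME = sig
  (* [is_safe comp c] is true if [c] may appear unencoded in [comp]. *)
  val is_safe : component -> char -> bool
end

(* An AWS-style scheme: the same strict unreserved set for every component. *)
module Aws : SCHEME = struct
  let is_safe (_ : component) = function
    | 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9' | '-' | '.' | '_' | '~' -> true
    | _ -> false
end

(* Percent-encode every byte that the chosen scheme does not mark as safe. *)
let pct_encode (module S : SCHEME) (comp : component) (s : string) : string =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      if S.is_safe comp c then Buffer.add_char buf c
      else Buffer.add_string buf (Printf.sprintf "%%%02X" (Char.code c)))
    s;
  Buffer.contents buf
```

A caller would then pick the scheme explicitly, for example ``pct_encode (module Aws) `Query_value "a/b c"``, and a more permissive scheme could be swapped in without touching the encoder.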

rgrinberg avatar Jul 08 '15 17:07 rgrinberg

I'm also hitting problems with this, also for working with S3. FWIW, I'd be quite happy with more types that represent different encodings. A string isn't nearly specific enough.

The specific problem I'm currently having is to encode slashes in query values. The issue seems to be that / is not encoded:

# Uri.make ~query:["foo", ["ba/r"]] () |> Uri.to_string;;
- : string = "?foo=ba/r"

Trying to work around this by encoding it myself doesn't work because % is encoded:

# Uri.make ~query:["foo", ["ba%2Fr"]] () |> Uri.to_string;;
- : string = "?foo=ba%252Fr"

The output I want is "?foo=ba%2Fr".
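Until the library offers such control, one possible workaround is to render the query string by hand and bypass Uri's query encoding entirely. The following is a minimal sketch using only the standard library; `strict_encode` and `query_to_string` are hypothetical helpers, and joining multiple values with "," is a choice made for this sketch.

```ocaml
(* Hypothetical helpers; not part of ocaml-uri. *)

(* Encode everything except letters, digits, '-', '.', '_' and '~'. *)
let strict_encode (s : string) : string =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      match c with
      | 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9' | '-' | '.' | '_' | '~' ->
          Buffer.add_char buf c
      | _ -> Buffer.add_string buf (Printf.sprintf "%%%02X" (Char.code c)))
    s;
  Buffer.contents buf

(* Takes the same shape of data as the ~query argument used above. *)
let query_to_string (q : (string * string list) list) : string =
  let pair (k, vs) =
    strict_encode k ^ "=" ^ String.concat "," (List.map strict_encode vs)
  in
  "?" ^ String.concat "&" (List.map pair q)
```

Here `query_to_string ["foo", ["ba/r"]]` evaluates to `"?foo=ba%2Fr"`, which could be appended to the string form of a URI built without a query.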

agarwal avatar Sep 24 '15 18:09 agarwal

This would be useful when working with a library like zeromq as well. If you want to listen for connections from any outside host, the host should be *, which ends up escaped:

# Uri.make ~scheme:"tcp" ~host:"*" ~port:5555 ()
- : Uri.t = tcp://%2A:5555

hcarty avatar Sep 30 '15 18:09 hcarty

Sorry for resurrecting the dead issue, but this library does not seem to be compatible with the RFC. Compare with Python's query encoding:

>>> urllib.parse.urlencode({'foo':'web/games-applications?title=&sort_bef_combine=+&sort_order=&sort_by=&page=36'})
'foo=web%2Fgames-applications%3Ftitle%3D%26sort_bef_combine%3D%2B%26sort_order%3D%26sort_by%3D%26page%3D36'

and Uri

utop # Uri.make ~query:["foo", [Uri.pct_decode "web/games-applications?title=&sort_bef_combine=+&sort_order=&sort_by=&page=36"]] () |> Uri.to_string;;
- : string =
"?foo=web/games-applications?title=%26sort_bef_combine=%2B%26sort_order=%26sort_by=%26page=36"

sazarkin avatar Aug 05 '19 10:08 sazarkin

@avsm any thoughts on this? We are forced to vendor ocaml-uri and adjust the encoding for our business needs, which is something we try to avoid if at all possible. I'm wondering how many other vendored versions exist in the wild and how much effort is wasted on constant rebasing onto upstream. Even a simple (and maybe ugly) way to customize encoding would be very welcome to users.

Lupus avatar Aug 06 '19 10:08 Lupus

@avsm is someone actively working on a fix for this? I'm using this in ocaml-aws.

@Lupus Are you able to share your vendored copy of ocaml-uri publicly? I'd be interested in what changes you needed to get it working.

tmcgilchrist avatar Apr 02 '20 09:04 tmcgilchrist

No one is working on this issue at the moment; I’m happy to review a PR

avsm avatar Apr 02 '20 10:04 avsm

Understood, I'll start reading the code ;-)

tmcgilchrist avatar Apr 02 '20 10:04 tmcgilchrist

Much appreciated. It might be easier to do on top of #142, which I’d like to merge soon

avsm avatar Apr 02 '20 10:04 avsm

@tmcgilchrist we ended up working around this in other services so that URI encodings are compatible (we had hand-rolled encodings there anyways), that was easier than maintaining an internal fork.

Lupus avatar Apr 02 '20 10:04 Lupus

https://github.com/mirage/ocaml-uri/pull/147 looks to cover angstrom support from https://github.com/mirage/ocaml-uri/pull/142 and PCT encoding. I'm in the process of testing that for ocaml-aws.

tmcgilchrist avatar Sep 02 '20 22:09 tmcgilchrist