multiaddr
define how to handle `/` in component values when represented as a string
When a component value has a / in it, it becomes hard to round-trip a multiaddr from bytes to a string and back again, since it's unclear where the value ends and the next protocol component starts.
A growing list of where this is relevant:
- `/http-path` has solved this problem by url-encoding the value (see https://github.com/multiformats/multiaddr/pull/164).
- `/unix` needs a similar solution: https://github.com/multiformats/multiaddr/pull/174
- Potentially adding libp2p protocols to multiaddrs has come up in https://github.com/multiformats/multicodec/pull/380 (yes, wrong repo), and these are full of `/` characters.
- In the scope of https://github.com/multiformats/multiaddr/issues/155 - where we might parse a multiaddr from a string that contains protocol components we don't understand, for example `/ip4/123.123.123.123/my-new-protocol/herp/derp`. The unknown protocol would be easier to ignore if it was encountered as `/ip4/123.123.123.123/my-new-protocol/herp%2Fderp` (though obviously it can't be round-tripped as we don't know what the protocol code for `my-new-protocol` is).
Perhaps it's time to specify how to handle this character properly?
Represent it in the string version of a multiaddr as %2F?
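To make the `%2F` idea concrete, here is a minimal sketch in Python. It is not taken from any multiaddr implementation: the helper names are made up for illustration, and it assumes every protocol carries exactly one value, which is not true of real multiaddrs.

```python
from urllib.parse import quote, unquote


def encode_component(value: str) -> str:
    # safe="" matters: by default quote() leaves "/" unescaped.
    return quote(value, safe="")


def decode_component(text: str) -> str:
    return unquote(text)


def join_multiaddr(parts):
    # parts is a list of (protocol_name, value) tuples.
    return "".join(f"/{name}/{encode_component(value)}" for name, value in parts)


def split_multiaddr(text):
    # Simplifying assumption: name/value pairs only, no valueless protocols.
    fields = text.split("/")[1:]
    return [
        (fields[i], decode_component(fields[i + 1]))
        for i in range(0, len(fields), 2)
    ]


parts = [("ip4", "127.0.0.1"), ("http-path", "foo/bar")]
text = join_multiaddr(parts)
print(text)
print(split_multiaddr(text) == parts)
```

Because the `/` inside the value is escaped, splitting the string on `/` yields the original components again. A literal `%` in a value is escaped too, so it does not collide with the escape sequences.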
cc @lidel @aschmahmann @MarcoPolo @sukunrt @wemeetagain @pacrob @jxs @elenaf9
> In the scope of https://github.com/multiformats/multiaddr/issues/155 - where we might parse a multiaddr from a string that contains protocol components we don't understand, for example `/ip4/123.123.123.123/my-new-protocol/herp/derp`. The unknown protocol would be easier to ignore if it was encountered as `/ip4/123.123.123.123/my-new-protocol/herp%2Fderp` (though obviously it can't be round-tripped as we don't know what the protocol code for `my-new-protocol` is).
To be clear, is this in the case that we are parsing a multiaddr string as opposed to a binary representation?
In either case, when you encounter a protocol you don't understand, the only thing you can do is store the unknown code and the rest of the multiaddr.
What we have done in the past is allow each protocol to define its string representation. That has allowed us to sidestep the holy war of deciding which way to escape a `/` is best. If we want to create a spec that takes the boilerplate of the percent-encoded representation and makes it easy to use as a default, that seems fine to me and hopefully uncontroversial. I imagine saying something like "all future protocols must use percent encoding" is more controversial.
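Roughly what I have in mind, as a sketch only: a per-protocol codec table with percent encoding as the fallback. The `CODECS` table and `codec_for` helper are hypothetical names, not an existing API, and `my-new-protocol` is just the placeholder from the issue description.

```python
import socket
from urllib.parse import quote_from_bytes, unquote_to_bytes


def default_to_string(value: bytes) -> str:
    # Percent-encode everything outside the unreserved set, including "/".
    return quote_from_bytes(value, safe="")


def default_from_string(text: str) -> bytes:
    return unquote_to_bytes(text)


# Hypothetical registry: protocols that define their own string form.
CODECS = {
    "ip4": (socket.inet_ntoa, socket.inet_aton),
}


def codec_for(protocol: str):
    # Protocols without a custom codec fall back to percent encoding.
    return CODECS.get(protocol, (default_to_string, default_from_string))


to_str, from_str = codec_for("ip4")
print(to_str(bytes([127, 0, 0, 1])))

to_str, from_str = codec_for("my-new-protocol")
encoded = to_str(b"herp/derp")
print(encoded, from_str(encoded))
```

Existing protocols keep whatever representation they already have, and a new protocol gets a round-trippable string form without having to write its own escaping rules.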
> In either case, when you encounter a protocol you don't understand, the only thing you can do is store the unknown code and the rest of the multiaddr.
Yeah, I think I overlooked the case of unknown protocols without an address part 🙄. Basically since we don't know if the unknown protocol has an address part or not we can't do anything once we encounter an unknown protocol.
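A small sketch of why that is, assuming a hypothetical table of known protocols that records whether each one takes a value (the table contents and the function name are illustrative, not from any multiaddr library):

```python
# Hypothetical protocol table: name -> does a value follow the name?
KNOWN = {"ip4": True, "tcp": True, "quic": False}


def parse_known_prefix(text: str):
    """Parse leading known components; return (parsed, unparsed_remainder)."""
    fields = text.split("/")[1:]
    parsed = []
    i = 0
    while i < len(fields):
        name = fields[i]
        if name not in KNOWN:
            # We cannot tell whether the next field is this protocol's value
            # or the name of the next protocol, so stop and keep the rest raw.
            return parsed, "/" + "/".join(fields[i:])
        if KNOWN[name]:
            if i + 1 >= len(fields):
                raise ValueError(f"missing value for {name}")
            parsed.append((name, fields[i + 1]))
            i += 2
        else:
            parsed.append((name, None))
            i += 1
    return parsed, ""


print(parse_known_prefix("/ip4/123.123.123.123/my-new-protocol/herp/derp"))
```

Escaping `/` in the value does not change this: `herp%2Fderp` could still be either the value of `my-new-protocol` or the name of a following protocol.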
> If we want to create a spec that takes the boilerplate of the percent-encoded representation and makes it easy to use as a default, that seems fine to me and hopefully uncontroversial.
Agreed, maybe the best we can do is try to establish a convention. That way we at least have a recommendation for how to handle `/` in the future.
I'll open a PR if this gets a few 👍