
define how to handle `/` in component values when represented as a string

Open achingbrain opened this issue 6 months ago • 4 comments

When a component value has a / in it, it becomes hard to round-trip a multiaddr from bytes to a string and back again, since it's unclear where the value ends and the next protocol component starts.
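A minimal sketch of the ambiguity, using a toy model rather than any real multiaddr library. The protocol name `example-proto` and the helper `naive_to_string` are hypothetical and exist only for illustration:

```python
# Toy model: a multiaddr is a list of (protocol name, value) pairs.
# This is an illustration only, not the API of any multiaddr library.

def naive_to_string(components):
    """Join components with '/' and perform no escaping at all."""
    return "".join(f"/{name}/{value}" for name, value in components)

# One component whose value happens to contain '/' characters...
one_component = [("example-proto", "a/tcp/80")]
# ...and two separate components.
two_components = [("example-proto", "a"), ("tcp", "80")]

# Both serialise to the same string, so a parser cannot tell them apart.
assert naive_to_string(one_component) == "/example-proto/a/tcp/80"
assert naive_to_string(two_components) == "/example-proto/a/tcp/80"
```

Two different component lists collapse to one string, which is why the string form cannot be parsed back unambiguously without an escaping rule or per-protocol knowledge.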

A growing list of where this is relevant:

  • /http-path has solved this problem by url-encoding the value (see https://github.com/multiformats/multiaddr/pull/164).
  • /unix needs a similar solution: https://github.com/multiformats/multiaddr/pull/174
  • Adding libp2p protocols, which are full of / characters, to multiaddrs has come up as a possibility in https://github.com/multiformats/multicodec/pull/380 (yes, wrong repo).
  • In the scope of https://github.com/multiformats/multiaddr/issues/155 - where we might parse a multiaddr from a string that contains protocol components we don't understand, for example /ip4/123.123.123.123/my-new-protocol/herp/derp. The unknown protocol would be easier to ignore if it was encountered as /ip4/123.123.123.123/my-new-protocol/herp%2Fderp (though obviously it can't be round-tripped as we don't know what the protocol code for my-new-protocol is).

Perhaps it's time to specify how to handle this character properly?

Represent it in the string version of a multiaddr as %2F?
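As a rough sketch of what that could look like, assuming plain URL percent-encoding of each value and the simplification that every protocol carries exactly one value (real multiaddr protocols do not all work this way). The helpers `to_string` and `from_string` are hypothetical, not part of any multiaddr implementation:

```python
from urllib.parse import quote, unquote

def to_string(components):
    """Serialise (name, value) pairs, percent-encoding each value.

    safe="" makes quote() encode '/' as %2F; '%' itself becomes %25,
    so decoding is unambiguous.
    """
    return "".join(f"/{name}/{quote(value, safe='')}" for name, value in components)

def from_string(text):
    """Parse a string produced by to_string() back into (name, value) pairs."""
    parts = text.split("/")[1:]  # drop the empty piece before the leading '/'
    if len(parts) % 2 != 0:
        raise ValueError("expected alternating protocol names and values")
    return [(parts[i], unquote(parts[i + 1])) for i in range(0, len(parts), 2)]

components = [("ip4", "123.123.123.123"), ("my-new-protocol", "herp/derp")]
text = to_string(components)
assert text == "/ip4/123.123.123.123/my-new-protocol/herp%2Fderp"
assert from_string(text) == components  # the value round-trips intact
```

Because the encoded value no longer contains a literal `/`, splitting on `/` recovers the component boundaries without knowing anything about the protocol.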

achingbrain, Jun 04 '25 10:06

cc @lidel @aschmahmann @MarcoPolo @sukunrt @wemeetagain @pacrob @jxs @elenaf9

achingbrain, Jun 04 '25 10:06

> In the scope of https://github.com/multiformats/multiaddr/issues/155 - where we might parse a multiaddr from a string that contains protocol components we don't understand, for example /ip4/123.123.123.123/my-new-protocol/herp/derp. The unknown protocol would be easier to ignore if it was encountered as /ip4/123.123.123.123/my-new-protocol/herp%2Fderp (though obviously it can't be round-tripped as we don't know what the protocol code for my-new-protocol is).

To be clear, is this in the case that we are parsing a multiaddr string as opposed to a binary representation?

In either case, when you encounter a protocol you don't understand, the only thing you can do is store the unknown code and the rest of the multiaddr.

MarcoPolo, Jun 04 '25 19:06

What we have done in the past is allow each protocol to define its string representation. That has allowed us to sidestep the holy war of deciding which way to escape a / is best. If we want to create a spec that takes the boilerplate of the percent encoded representation and makes it easy to use as a default, that seems fine to me and hopefully uncontroversial. I imagine saying something like "all future protocols must use percent encoding" is more controversial.

MarcoPolo, Jun 04 '25 20:06

> In either case, when you encounter a protocol you don't understand the only thing you can do is store the unknown code and the rest of the multiaddr

Yeah, I think I overlooked the case of unknown protocols without an address part 🙄. Basically, since we don't know whether the unknown protocol has an address part, we can't reliably parse anything that follows it.
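To illustrate why, here is a small sketch with a toy protocol table. The `parse` helper, the `mystery` protocol and the table itself are hypothetical, and real implementations carry more per-protocol detail than a single boolean:

```python
def parse(text, has_value):
    """Parse a multiaddr-like string using a {protocol name: takes a value?} table."""
    parts = text.split("/")[1:]  # drop the empty piece before the leading '/'
    out = []
    i = 0
    while i < len(parts):
        name = parts[i]
        if name not in has_value:
            raise ValueError(f"unknown protocol: {name}")
        if has_value[name]:
            if i + 1 >= len(parts):
                raise ValueError(f"missing value for: {name}")
            out.append((name, parts[i + 1]))
            i += 2
        else:
            out.append((name, None))
            i += 1
    return out

known = {"ip4": True, "tcp": True, "quic": False}
text = "/ip4/1.2.3.4/mystery/quic/tcp/80"

# If 'mystery' takes no value, 'quic' is the next protocol...
assert parse(text, {**known, "mystery": False}) == [
    ("ip4", "1.2.3.4"), ("mystery", None), ("quic", None), ("tcp", "80"),
]
# ...but if it takes a value, 'quic' is consumed as that value.
assert parse(text, {**known, "mystery": True}) == [
    ("ip4", "1.2.3.4"), ("mystery", "quic"), ("tcp", "80"),
]
```

Both readings are valid parses of the same string, so without the protocol's definition there is no way to pick one, regardless of how / is escaped inside values.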

> If we want to create a spec that takes the boilerplate of the percent encoded representation and makes it easy to use as a default, that seems fine to me and hopefully uncontroversial.

Agreed, maybe the best we can do is try to establish a convention. That way we at least have a recommendation for how to handle / in the future.

I'll open a PR if this gets a few 👍

achingbrain, Jun 05 '25 13:06