multistream-select

String representation character encoding

Open • ntninja opened this issue 5 years ago • 3 comments

Just a short question here: what is the character encoding of the protocol strings? I realize that the spec only talks about bytes, but implementations expose protocol IDs as strings for convenience and therefore need to convert between the two. Apparently both the Go implementation and the JS implementation treat them as UTF-8, if I deciphered their code correctly. So is it really UTF-8, or is it ASCII and a UTF-8 conversion function was just more convenient? Not a biggie, but this should be specified nonetheless.

ntninja, Mar 05 '19 15:03
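For ASCII-only protocol IDs such as `/multistream/1.0.0`, the question makes no difference on the wire: UTF-8 encodes every ASCII character as the same single byte, so both readings produce identical bytes. The two only diverge for non-ASCII IDs. A minimal Python sketch of that check follows; the helper name `utf8_and_ascii_agree` and the example ID `/échange/1.0.0` are hypothetical, chosen only for illustration.

```python
def utf8_and_ascii_agree(proto: str) -> bool:
    """Return True if encoding `proto` as ASCII and as UTF-8 yields the same bytes."""
    try:
        ascii_bytes = proto.encode("ascii")
    except UnicodeEncodeError:
        # Contains a character outside ASCII, so an ASCII-only rule would reject it.
        return False
    return ascii_bytes == proto.encode("utf-8")


# ASCII-only ID: identical bytes under either reading.
print(utf8_and_ascii_agree("/multistream/1.0.0"))  # True

# Hypothetical non-ASCII ID: U+00E9 ("é") takes two bytes in UTF-8,
# so the byte length no longer equals the character count.
proto = "/\u00e9change/1.0.0"
print(utf8_and_ascii_agree(proto))  # False
print(len(proto), len(proto.encode("utf-8")))  # 14 15
```

In other words, "ASCII" and "UTF-8" are only distinguishable once someone registers a protocol ID containing a non-ASCII character, which is exactly the case the spec would need to pin down.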

In this spec, it's "just bytes". However, you're right: we need to specify that strings should be encoded as UTF-8.

Stebalien, Mar 15 '19 18:03
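One practical consequence of settling on UTF-8 is that any length prefix on the wire counts encoded bytes, not characters. The sketch below assumes the multistream-select message framing (an unsigned-varint length prefix, then the protocol ID followed by a newline, with the newline included in the length) together with UTF-8 as proposed above. The function names are hypothetical and are not taken from the Go, JS, or any other existing implementation.

```python
def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if n < 0:
        raise ValueError("uvarint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint(buf: bytes) -> tuple:
    """Decode a uvarint from the start of `buf`; return (value, bytes consumed)."""
    value = 0
    shift = 0
    for i, byte in enumerate(buf):
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, i + 1
        shift += 7
    raise ValueError("truncated uvarint")


def frame_protocol_id(proto: str) -> bytes:
    """Length-prefix the UTF-8 bytes of `proto` plus a trailing newline."""
    payload = proto.encode("utf-8") + b"\n"
    return encode_uvarint(len(payload)) + payload


def parse_protocol_id(frame: bytes) -> str:
    """Inverse of frame_protocol_id; rejects malformed frames and invalid UTF-8."""
    length, offset = decode_uvarint(frame)
    payload = frame[offset:offset + length]
    if len(payload) != length or not payload.endswith(b"\n"):
        raise ValueError("malformed multistream-select message")
    # Strict decoding raises UnicodeDecodeError on invalid UTF-8 bytes.
    return payload[:-1].decode("utf-8")


print(frame_protocol_id("/multistream/1.0.0"))  # b'\x13/multistream/1.0.0\n'
print(frame_protocol_id("/\u00e9change/1.0.0")[0])  # 16: 15 UTF-8 bytes + newline
```

Strict decoding is a deliberate choice in this sketch: a lossy mode such as `errors="replace"` would map distinct invalid byte sequences to the same string, so two peers could believe they agreed on a protocol ID that neither actually sent.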

~~@Stebalien: What about domain names? They can contain Unicode (IDNA).~~

~~In py-multiaddr I added support for domain names by having them be full Unicode in text form (since the strings are always Unicode) and IDNA2008/Punycode-encoded ASCII in binary form. Does that sound about right to you?~~

ntninja, Mar 16 '19 18:03

Oh, sorry! I mixed up MSS with multiaddr there!

Please ignore my last comment!

ntninja, Mar 17 '19 23:03