
DID parameters require ASCII-only

Open aphillips opened this issue 3 months ago • 4 comments

DID Parameters https://www.w3.org/TR/did-resolution/#did-parameters

Various parameters (service, serviceType, versionId, etc.) require that the value be an ASCII string. One parameter (relativeRef) also requires ASCII, but references RFC3986 section 2.1 (percent-encoding). It is not clear why the values are restricted to ASCII. If the reason is so that they can traverse e.g. HTTP headers, then an escaping mechanism should be defined for non-ASCII values when the application needs such values. If there is some other reason, it is not apparent from the specification.

Some of the parameters, such as versionTime, have a well-known, required serialization that is, in fact, ASCII-only.

service seems to disadvantage systems that would otherwise permit non-ASCII service names.

versionId (and friends like nextVersionId) does not have a defined format. It's unclear why the ASCII restriction is imposed, except for (probably) the way that it simplifies the serialization story. Note that versioning is spelled out here:

  • https://www.w3.org/TR/did-resolution/#versioning

aphillips avatar Sep 22 '25 15:09 aphillips

This was discussed during the #did meeting on 25 September 2025.


DID parameters require ASCII-only #200

<ottomorac> w3c/did-resolution#200

ottomorac: so, this is another one, discussing various parameters here
… requirement that the value be an ASCII string...
… some other references to an RFC

JoeAndrieu: this seems directly entangled in the other one. shouldn't it be UTF8 and not ASCII?

ottomorac: yeah...

manu: yeah, I agree with Joe -- the nuance here is -- we DID say ASCII on purpose originally
… but Addison is right unfortunately, we can't ignore that part of the world, non-US/Latin characters that billions of people use, etc
… so yeah, it's entangled w UTF8, so we'll just have to clarify
… like, IF you have a non-ASCII character, you MUST use UTF8 encoding, and percent-encode it to ASCII

Wip: (confirming he heard Manu correctly)

Wip: ok, so, somebody just needs to define that process somewhere

manu: yeah, we should just say - this is a statement around encoding/decoding parameters. All parameters need to go through this process
… you start w an input string, ensure it's encoded as UTF8, then you percent-encode THAT value to get to an encoded param
… to decode a parameter, you percent-decode it and arrive at a UTF8 string.
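
The encode/decode steps described above can be sketched in Python as follows. This is a minimal sketch, not spec text: the helper names `encode_param` and `decode_param` are hypothetical, and leaving only the RFC 3986 unreserved characters unescaped is an assumption, since the discussion has not settled which characters may stay literal.

```python
from urllib.parse import quote, unquote


def encode_param(value: str) -> str:
    """Percent-encode the UTF-8 bytes of a parameter value (hypothetical helper)."""
    # Assumption: safe="" means everything except RFC 3986 unreserved
    # characters (letters, digits, "-", ".", "_", "~") is percent-encoded.
    return quote(value, safe="", encoding="utf-8", errors="strict")


def decode_param(encoded: str) -> str:
    """Percent-decode a parameter value and interpret the bytes as UTF-8."""
    # errors="strict" rejects byte sequences that are not valid UTF-8
    # instead of silently substituting replacement characters.
    return unquote(encoded, encoding="utf-8", errors="strict")


encoded = encode_param("café")
print(encoded)                # caf%C3%A9
print(decode_param(encoded))  # café
```

The encoded form is pure ASCII, and decoding it returns the original string, which is the round trip the process above calls for.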

bigbluehat: yeah, there's a couple of RFCs related to this
… that have a fallback to ASCII / related functions

<ottomorac> a couple RFC's have some stuff about this https://datatracker.ietf.org/doc/html/rfc3454 "stringprep"

bigbluehat: and discussion of footguns
… encoding UTF8 URLs has been around for decades etc

ottomorac: ok, yeah, so that's connected to that other issue
… I'll try my hand at it

manu: I agree w bigbluehat that we need to reference an existing RFC, but specifically, we don't want to use Punycoding (that's for domain names only, not URL fragments)

<ottomorac> Ask Addison what should we ref for this


w3cbot avatar Sep 25 '25 19:09 w3cbot

Thank you for the discussion.

There are two ways to look at this.

If the DID being resolved is the logical DID, then a number of these parameters' values are not "ASCII strings" but instead should be defined as (using Infra's definitions) string or scalar value string. The serialization of the DID then involves percent-encoding using UTF-8 as the character encoding, resulting in a wire format that is pure ASCII. The logical DID is then an instance of an IRI (RFC3987) and the "resolvable" DID is its RFC3986 ASCII representation.

The other way to look at this is that the DID being resolved is always and only the serialized URI (RFC3986) form of the "logical" DID. In that case, the various parameters can then only contain (again using Infra's definitions) an ASCII string, with non-URI-safe characters (Unicode code points) represented as a percent-encoded sequence of UTF-8 bytes.

DID Resolution doesn't make a distinction between a logical and serialized form. I'm guessing that your intention is actually the second option above, in which case you should reference RFC3986 section 2.1 with the UTF-8 encoding (or, alternatively, section 3.1 Step 2 of RFC3987). But you need to make clear that the value space for at least some of these parameters permits Unicode code points (appropriately encoded for the wire). Using service as a model, you might say something akin to:

Identifies a service from the DID document by service ID. If present, the associated value MUST be a scalar value string serialized into ASCII according to section 3.1 of [RFC3987].
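
As an illustration of the suggested wording, here is a rough Python sketch that checks a service ID is a scalar value string (no surrogate code points, per Infra) and serializes it to ASCII. The helper names `is_scalar_value_string` and `service_param` are hypothetical, `did:example:123` is a placeholder DID, and the sketch also percent-encodes reserved ASCII characters, which is stricter than section 3.1 of RFC3987 requires.

```python
from urllib.parse import quote


def is_scalar_value_string(value: str) -> bool:
    """True if the string contains no surrogate code points (U+D800..U+DFFF)."""
    return not any(0xD800 <= ord(ch) <= 0xDFFF for ch in value)


def service_param(service_id: str) -> str:
    """Build a `service=` DID parameter in ASCII form (hypothetical helper)."""
    if not is_scalar_value_string(service_id):
        raise ValueError("service value must be a scalar value string")
    # Assumption: percent-encode every character outside the RFC 3986
    # unreserved set, using UTF-8 as the character encoding.
    return "service=" + quote(service_id, safe="", encoding="utf-8")


# did:example:123 is a placeholder DID used only for illustration.
did_url = "did:example:123?" + service_param("größe")
print(did_url)  # did:example:123?service=gr%C3%B6%C3%9Fe
```

The value space here permits Unicode code points, while the DID URL that goes on the wire stays ASCII-only.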

aphillips avatar Sep 25 '25 20:09 aphillips

Thanks @aphillips, I have created a PR using the language you suggested.

~~Also for @msporny, I believe that once this PR is agreed to and merged, we should probably apply these same changes to the smaller set of DID parameters that are mentioned in DID Core here: https://www.w3.org/TR/did-1.0/#did-parameters Perhaps we should create a mirror issue for this in DID Core?~~ (Updated after Will clarified that the DID parameters section is dropped in DID Core 1.1)

ottomorac avatar Oct 19 '25 03:10 ottomorac

This was discussed during the #did meeting on 06 November 2025.


w3c/did-resolution#201

<ottomorac> This PR addresses #200, and #201 raised by Addison Phillips, regarding UTF8. It has some relationship to the DID path URL dereferencing, but wondering if we can have it merged before then?

<Wip> w3c/did-resolution#217

wip: I don't have anything about this, but I thought it was issue 217.

ottomorac: I referenced the PRs, not the issue.

<ottomorac> w3c/did-resolution#219

<ottomorac> w3c/did-resolution#215


w3cbot avatar Nov 06 '25 21:11 w3cbot