DID needs a proper vocabulary specification

Open iherman opened this issue 9 months ago • 8 comments

(This issue is relevant for DID, but it may also be relevant to DID Resolution, see https://github.com/w3c/did-resolution/issues/137. Also, my apologies for being a bit verbose in what follows, but some participants of this group were not part of the VC group where this was discussed a while ago.)

DID Core uses JSON-LD. Most of the terms used in DID Core, like controller, are formally defined in the CID spec, but DID adds a few DID-specific terms to the CID terms.

Because there are new terms, DID specifies its own vocabulary in the RDF sense. The proper, good practice when defining an RDF vocabulary is to make a machine-readable representation of the vocabulary terms that specifies a minimal ontological context for the terms (what are the possible values, if such restrictions exist, which are the classes and properties, etc.). This is not the @context file, which just defines a transformation from JSON terms (names or values) to their corresponding vocabulary items, represented by URLs. These vocabulary URLs are part of the aforementioned vocabulary representation.

We went through all that in the VC WG. For example, the CID context file (https://www.w3.org/ns/cid/v1) transforms the controller term into its proper vocabulary term, namely https://w3id.org/security#controller. What this URL really 'means' in a machine-readable form is part of the security vocabulary: by default, this is in an HTML+RDFa file, but it also has a representation in JSON-LD or Turtle for Linked Data applications. (For backward compatibility reasons, the CID vocabulary items are part of the Data Integrity vocabulary.) It is the combination of the context file, the machine-readable version of the formal vocabulary (serialized in HTML+RDFa, JSON-LD, and Turtle), and the formal, English specification of the terms that makes the picture complete.
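To make the distinction concrete, here is a minimal Python sketch of what a context does: it only maps a JSON term to a vocabulary URL, and says nothing about what that URL means. The `CONTEXT` dictionary is a hand-written excerpt for illustration, not the real CID context file, and the helper `expand_keys` is hypothetical; a real JSON-LD processor does much more.

```python
from typing import Any, Dict

# Hand-written excerpt for illustration only; NOT the real CID context file.
# It holds the single mapping mentioned in the comment above.
CONTEXT: Dict[str, str] = {
    "controller": "https://w3id.org/security#controller",
}


def expand_keys(document: Dict[str, Any], context: Dict[str, str]) -> Dict[str, Any]:
    """Replace each mapped top-level key with its vocabulary URL.

    Unmapped keys are dropped, which mirrors (in a very simplified way)
    how JSON-LD expansion treats terms the context does not define.
    """
    return {context[key]: value for key, value in document.items() if key in context}


doc = {"controller": "did:example:123", "notInContext": True}
expanded = expand_keys(doc, CONTEXT)
print(expanded)
```

The meaning of the resulting URL (is it a property, what is its range, and so on) is exactly what the separate vocabulary file is supposed to describe.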

We do not have that for DID. In my view, we should. I.e., we should create the vocabulary files and put a proper reference to them into the main, core spec. Actually (and this is where this links to https://github.com/w3c/did-resolution/issues/137), we should decide whether we do it once, incorporating the DID Resolution terms, or we do the exercise twice: once for DID Core and once for DID Resolution. Note that, as I already said elsewhere, the DID vocabulary is extremely simple: it only defines (as of now) two service-related terms, so doing it twice might be overkill.

The good news is that we now have the tools to do this properly, used and deployed in the VC WG. We only have to define the main characteristics of the vocabulary terms in a YAML file, and switch the machinery on (see, for example, the YAML version of the security vocabulary; that file is fairly long because of the size of that vocabulary, and it will be a fraction of that for DID). And, as an editor of the VC vocabularies, I am of course happy to do that for the DID WG as well when the time comes.
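As a rough illustration of the "describe the terms once, generate the serializations" idea, here is a toy Python sketch that turns a small term description into Turtle. This is not yml2vocab and does not follow its input format; the namespace URL, the `dd` prefix, the field names, and the comment texts are all assumptions made for this example.

```python
from typing import Dict, List

# Assumed namespace for the DID vocabulary; check the actual files before relying on it.
DID_NS = "https://www.w3.org/ns/did#"

# Hypothetical term descriptions; the comment texts are placeholders,
# not the normative definitions from the specification.
TERMS: List[Dict[str, str]] = [
    {"id": "service", "label": "Service", "comment": "Placeholder description of service."},
    {"id": "serviceEndpoint", "label": "Service endpoint", "comment": "Placeholder description of serviceEndpoint."},
]


def to_turtle(namespace: str, prefix: str, terms: List[Dict[str, str]]) -> str:
    """Emit a minimal Turtle document declaring each term as an rdf:Property."""
    lines = [
        f"@prefix {prefix}: <{namespace}> .",
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for term in terms:
        lines.append(f"{prefix}:{term['id']} a rdf:Property ;")
        lines.append(f'    rdfs:label "{term["label"]}" ;')
        lines.append(f'    rdfs:comment "{term["comment"]}" .')
        lines.append("")
    return "\n".join(lines)


turtle = to_turtle(DID_NS, "dd", TERMS)
print(turtle)
```

The real tooling also produces HTML+RDFa and JSON-LD from the same description, which is the main reason to keep a single source file.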

iherman avatar Mar 21 '25 08:03 iherman

To show what I am talking about, I generated a vocabulary file (using the latest version of yml2vocab, not yet fully deployed) in: https://w3c.github.io/yml2vocab/previews/did/.

iherman avatar Mar 25 '25 10:03 iherman

Pinging @msporny @dlongley @peacekeeper

iherman avatar Mar 25 '25 10:03 iherman

This was discussed during the #did meeting on 08 May 2025.

884 - DID needs a proper vocabulary specification

<ottomorac> w3c/did#884

ottomorac: We need a vocabulary for the DID spec, we are currently kind of borrowing it from the CID spec. The issue is connected to 137, which discusses if we should use JSON or JSON-LD for DID resolution. Probably a vocab is required for both. Then Ivan generated a sample vocab file.

ivan: I will have to see with manu how exactly we put it in the repository. The rest I presume is the same as we did over there. These are things that can be done easily.

ivan: There was a discussion that we had on the DID Resolution side. It seems we have converged to the opinion that the DID Resolution can work without JSON-LD.

ivan: If DID Resolution does not use JSON-LD, then there are fewer dependencies.

ivan: This is w3c/did-resolution#137 in DID Resolution about using JSON-LD

manu: ivan what you're saying we need to record the decision if DID Resolution will use JSON-LD or not.

manu: Then the only thing we need to discuss is the DID vocabulary.

manu: My recollection is that last time we talked about it, there were 1 or 2 terms we needed to define (service, serviceEndpoint)

ivan: That's right

manu: This makes it easier. We need a vocabulary and it will have these two terms in it. You and I can put this together.

Markus: Yes, I have questions about DID Resolution and JSON-LD... but after this discussion I am fine that we don't need JSON-LD for resolution...

markus: when we say vocab, which are we talking about?

manu: markus_sabadello We would be generating a human-readable document that is the vocabulary, and it will have machine-readable equivalent to it. We would generate a JSON-LD file. We have done this before for VC data model, Bitstring status list, etc.

manu: We are having a discussion about requiring JSON-LD in DID resolution. Thank you markus_sabadello for the update on your opinion

manu: I'd like to ask if we want to make it optional. Every time we have the discussion, we want to make two communities happy about it. If we don't create a DID Resolution context, it would mean you couldn't use JSON-LD at all.

manu: So we could still give people the option to do JSON-LD processing. The downside is it would be a significant amount of work for us

manu: The strongest argument I see is to be able to sign a response, i.e. a trusted response.

manu: Counterargument is that you could still use signed responses using JCS

markus: Yes... I would be fine with either solution... making it optional also makes sense....

markus: I would maybe also add that we have seen some innovation when it comes to metadata and things that are added for did resolution metadata, such as the did linked resources mentioned earlier in this call by tweeddalex

ivan: Let's not get into a long philosophical debate.

ivan: Your argument that you could sign it, you also countered the same argument.

ivan: The reason why someone uses JSON-LD is because you want to take that piece of data, and you want to combine it with data from elsewhere, for a "greater purpose". You can pull together vocabularies from different places.

ivan: When we come to complex verifiable credentials, using JSON-LD for VCs makes a lot of sense. That's where JSON-LD is useful.

ivan: As long as it's a closed thing which is used in a focused way - which I think is the case with DID Resolution - then I don't see any reason why JSON-LD is useful.

ivan: Do we intend to mix DID Resolution Responses with other data? If the response is no, then I don't see any argument for using JSON-LD. Even using it optionally blurs the picture for me.

manu: I agree with that line of reasoning. To offer something that makes this easier, we can also start with a JSON-only approach and always add JSON-LD later.

manu: Adding extensions would be done in a "JSON-friendly" way.

<ivan> +1 to Manu on later

manu: We can start with no vocabulary or context for DID Resolution. And we can always add it later.

markus: yes fine with that.. would like to see some guidance on the media type....

manu: Would suggest application/did-resolution

manu: We have experienced that using "+" suffixes in media types can be complicated

manu: This will require a PR to make the change
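The JCS counterargument in the transcript above can be illustrated with a small Python sketch: if a plain JSON response is serialized canonically, the same bytes (and thus the same digest or signature input) result regardless of key order. This only approximates JCS (RFC 8785) for simple data made of strings and nested objects; it does not implement the RFC's number serialization or its exact key-ordering rules. The sample response and the function names are assumptions for illustration.

```python
import hashlib
import json
from typing import Any


def canonicalize(value: Any) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace.

    Approximation of JCS that is only adequate for simple values
    (strings, booleans, null, nested objects and arrays of those).
    """
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def digest(value: Any) -> str:
    """SHA-256 over the canonical bytes; a signature would be computed over the same bytes."""
    return hashlib.sha256(canonicalize(value)).hexdigest()


# Two hypothetical resolution responses with identical content but different key order.
response_a = {"didDocument": {"id": "did:example:123"}, "didResolutionMetadata": {}}
response_b = {"didResolutionMetadata": {}, "didDocument": {"id": "did:example:123"}}

print(digest(response_a) == digest(response_b))
```

No JSON-LD processing is involved here, which is the point of the counterargument: a trusted response does not by itself require a context or a vocabulary.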


w3cbot avatar May 08 '25 16:05 w3cbot

Actually... it seems that all of us are amnesic...

There actually is a vocabulary specification, served from several hops of redirections: /ns/did/->/ns/did-vocab/->w3c.github.io/did-extensions/vocabs/v1/. The directory contains the ttl, jsonld, etc., files. I must admit I completely forgot about those, although some files bear my name; I suspect they were created by Amy Guy, back in the day of DID 1.0.

I will review those files; if they are o.k. content wise (or need some minor updates) I think we should consider the issue closed.

Note that there are also SHACL and ShEx files for RDF-based structural checks of DID documents. It seems that nobody used them, otherwise someone would have come forward during this discussion...

iherman avatar May 20 '25 06:05 iherman

I will review those files; if they are o.k. content wise (or need some minor updates) I think we should consider the issue closed.

Well… I do not think they are o.k. A number of changes should be made. Indeed:

  • The references to terms like assertionMethod refer to the DID Core v1.0 specification for its formal definition. In DID v1.1 this is not true anymore; all those methods are formally defined in CID 1.0; DID v1.1 only makes use of those terms and does not define them. All those references are to be changed.
  • The document merely lists the terms, without any further information about range and domains of properties, when applicable and possible. There is room to enrich these.
  • The adjacent v1.0 JSON-LD context file is actually incomplete: it does not mention terms like Multikey, which is explicitly used in the examples. This is also a bug in the current version of the spec (more exactly, of its context); see Example 13. This is in contrast with, for example, the CID Context file, which has all those.
  • In light of all the changes, it is probably not justified to "just" upgrade those files to 1.1; for the sake of history we should keep those files as they are, and create new versions for 1.1.

As an aside, I am also surprised to find these files in the did-extensions repository. How did they end up there? The vocabulary files are an integral part of the spec, imho, so I do not believe that is the right place.

I have therefore decided to go ahead and create a new version of the vocabulary and context files, starting with the same vocabulary description file in yml. I have improved my earlier attempt, and placed all files to the "preview" directory of the generation tool; see the relevant index files for further links.

Why not a PR (yet)? I am not sure where our fearless editors want to put all that. We could

  1. Put all the content of the aforementioned preview to https://github.com/w3c/did-extensions/tree/main/vocabs/v1.1. This is in line with what we did for v1.0. On the other hand, I personally believe that this is the wrong repository for this in the first place.
  2. Put all the content of the preview to https://github.com/w3c/did/tree/main/vocab/v1.1.
  3. Put the pre-generation files, i.e., the template and the yml file, into https://github.com/w3c/did/tree/main/vocab/v1.1, and add a GitHub action file, like for the VCDM repository, which would generate the final files automatically after a merge.

My favorite would be (3). (1) or (2) presupposes that the vocabulary files are generated off-line, which has proven to be a burden in the VC case (people do not necessarily want to download the generation tool). But the choice is not for me to take. @msporny, @wip-abramson, @ottomorac, what say you?

cc @pchampin

iherman avatar May 20 '25 15:05 iherman

A fun fact on the generated vocabulary files. If you look at, say, the Turtle file, you will realize that, for the DID terms, I used the dd prefix for the CURIEs in Turtle. Why? Because something like, say, did:service would be interpreted by various tools as a URI. Just like one is not supposed to use http or ftp as a prefix in a CURIE, one shouldn't do that with did either…
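The ambiguity can be sketched in a few lines of Python. Syntactically, any CURIE looks like a URI with the prefix in the scheme position, so the practical problem arises when the prefix coincides with a scheme that tools already know. The scheme list and the function name below are assumptions for illustration, not taken from any particular tool.

```python
# Illustrative, incomplete list of URI schemes a tool might recognize.
KNOWN_URI_SCHEMES = {"http", "https", "ftp", "mailto", "urn", "did"}


def clashes_with_uri_scheme(curie: str) -> bool:
    """Return True if the CURIE prefix is also a known URI scheme.

    In that case a tool cannot tell whether "prefix:reference" is a CURIE
    to be expanded or an absolute URI to be used as-is.
    """
    prefix = curie.split(":", 1)[0]
    return prefix.lower() in KNOWN_URI_SCHEMES


print(clashes_with_uri_scheme("did:service"))  # prefix "did" is a known scheme
print(clashes_with_uri_scheme("dd:service"))   # prefix "dd" is not in the list
```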

iherman avatar May 20 '25 15:05 iherman

I have therefore decided to go ahead and create a new version of the vocabulary and context files, starting with the same vocabulary description file in yml. I have improved my earlier attempt, and placed all files to the "preview" directory of the generation tool; see the relevant index files for further links.

Thanks for looking into this @iherman and taking this work on. Much appreciated.

I agree the third option sounds preferable to me. It makes sense to generate these files from a Github action.

wip-abramson avatar May 20 '25 17:05 wip-abramson

WG Discussion on 22-May:

We need a vocabulary for the DID spec, we are currently kind of borrowing it from the CID spec. Group decision was to only use JSON for DID resolution (we can add a JSON-LD option later). Also, Manu suggested the media type application/did-resolution. Ivan found some vocab files, probably from Amy Guy, that required some adjustments, and decided to create new ones. The question is now where to locate the vocab files.

Ivan: looked at the files done by Amy; there were quite a lot of changes that should have been made (e.g. CID and other references). Should keep the old ones for the sake of history. Have all the vocabulary files ready, unsure where to put them. Maybe not the DID Spec Extensions repository: they are core documents, rather than extensions. 1st proposal: the DID Core repository? Or (2nd) in a way that the vocabularies would be automatically generated. The whole PR material is ready, including a GitHub action file that should be reviewed properly (by Manu if possible). The tool can also generate context files. The files that are generated are better than the current official DID context file (which omits certain terms, making the example in the document invalid); e.g. it doesn't have terms for Multikey. The question is whether we use the same tool to generate the context file, or generate it manually.

Manu: Agree this shouldn't go into DID Extensions. It should go with the main spec, like we have done for verifiable credentials. Compliments the tooling Ivan has created; automatic generation is preferable. Thinks we can generate all of it automatically, and +1 to the idea.

Ivan: to raise the PR tomorrow, as it's all ready

ottomorac avatar May 22 '25 20:05 ottomorac

PR #892 has been merged, closing.

msporny avatar May 31 '25 16:05 msporny