Avoiding confusion by renaming 'credentialSubject'

Open RieksJ opened this issue 5 years ago • 59 comments

In section 4.4, the term 'credentialSubject' is defined, suggesting a relation between the credential in which it is included and a 'Subject', i.e. an entity about which claims are made. However, the credentialSubject section contains a list of claims, rather than a (or more) subject(s).

This issue calls for renaming 'credentialSubject', and provides the following suggestions

  • 'credentialSubjectProperties'
  • 'claims'

RieksJ avatar Mar 27 '19 06:03 RieksJ

+1 to claims

brentzundel avatar Mar 27 '19 14:03 brentzundel

+1 to claims

awoie avatar Apr 01 '19 09:04 awoie

+1 to claims

ken-ebert avatar Apr 01 '19 18:04 ken-ebert

+1 to claims

jandrieu avatar Apr 01 '19 20:04 jandrieu

We have been round this one many times, and the reason it was changed from claims is that the ID is not the ID of the claim, but the ID of the subject. So the object needs to have subject somewhere in its name.

David-Chadwick avatar Apr 01 '19 20:04 David-Chadwick

@David-Chadwick, I agree that it makes sense to those of us in the group to call it the credentialSubject for the reasons you mentioned. This doesn't change the fact that it is confusing for others. We say that a verifiable credential contains claims, then show a data model that (for valid, yet pedantic reasons) has a credentialSubject property. We then have to explain that that's where claims should go, and then explain why the property is not just called "claims," since that is what a verifiable credential supposedly contains.

Rather than requiring this educational moment every time someone new looks at the data model, I support the proposal to change credentialSubject back to claims.

brentzundel avatar Apr 02 '19 01:04 brentzundel

I think there was also confusion about the embedded graph container, making the RDF subject of the claims confusing. We have examples in the focal use cases where there are multiple subjects, for example in a birth certificate and marriage certificate.

"CredentialSubject" strongly suggests that there is always a single subject of the credential, but that's demonstrably untrue.

@dlongley could you provide an example of how an issuer would construct a VC with multiple subjects?

I scanned the current spec, but only found examples such as the following:

...
  "credentialSubject": {
    "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
    "degree": {
      "type": "BachelorDegree",
      "name": "<span lang='fr-CA'>Baccalauréat en musiques numériques</span>"
    }
...

Am I correct that simply replacing that single object with an array works?

  "credentialSubject": [{
    "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
    "degree": {
      "type": "BachelorDegree",
      "name": "<span lang='fr-CA'>Baccalauréat en musiques numériques</span>"
    }
  }, {
    "id": "did:example:ebfeb1c276e12ec211f712ebc6f",
    "parent": {
      "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
      "type": "Mother"
    }
  }]

If that's correct, I believe @David-Chadwick's issue isn't with CredentialSubject but with the id.

The example says, in effect, that the subject of the first claim is the mother of the subject of the second claim.

In this case, I think the best we can do is explain clearly that "id" is the "id" of the subject of each individual claim. That's something we need to clarify, but my understanding is that, the way JSON-LD works, the "id" field is necessary. Or is it possible to change the "id" field to "claimSubject"? That's the semantic meaning in this situation. Can we make it explicit?

In any case, when you see the multi-subject "CredentialSubject", it doesn't make sense.

jandrieu avatar Apr 02 '19 05:04 jandrieu

The way out of this dilemma is to have a claims object that contains within it one or more credentialSubject objects. Then it is clear that the VC contains claims, and that the ID is that of the subject. Eg.

"claims": [
  {
    "credentialSubject": {
      "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
      "degree": {
        "type": "BachelorDegree",
        "name": "<span lang='fr-CA'>Baccalauréat en musiques numériques</span>"
      }
    }
  },
  {
    "credentialSubject": {
      "id": "did:example:ebfeb1c276e12ec211f712ebc6f",
      "parent": {
        "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
        "type": "Mother"
      }
    }
  }
]
...

David-Chadwick avatar Apr 03 '19 08:04 David-Chadwick

That's not bad, except I believe the "credentialSubject" is actually a "claimSubject" in your example.

jandrieu avatar Apr 03 '19 14:04 jandrieu

I like it too. And with that context, wouldn't just "Subject" work as the name?

talltree avatar Apr 05 '19 06:04 talltree

Maybe. I think folks get wrapped around the singular subject, thinking a credential can only have one subject. ClaimSubject focuses it nicely on this particular claim, gently inviting credentials that have claims about many subjects.

jandrieu avatar Apr 05 '19 06:04 jandrieu

I will note that the conversation in this issue is all over the place and is pretty classic bike shedding during the Candidate Recommendation phase (typically, a terrible, horrible time to bike shed core properties in the specification since implementers are already busy implementing using the properties that are currently being bikeshedded in this issue).

-1 to anything plural (claims, subjects, properties, etc.). schema.org made that mistake years ago and has been busy playing whack-a-mole to remove all the plural properties. The same goes for the Web Payments specification. We have over a decade of experience now naming properties that will go back-and-forth between JSON and RDF and the best practice is to NOT use plural form.

-1 to claim (and I say this as the person that initially put that in the specification). You are expressing "one or more credential subjects". The language in the spec may lead people to a different conclusion, and if it does, we should fix that specification text (not change the property to something that it's not).

I also note that the WG has many more things to worry about at present than bike shedding a name. Can we please just drop this issue and focus on things that are a better use of the WG's time? I'm concerned that this issue is going to consume a lot of time that should be spent doing things like getting the VC extension registry up and running, working on use case finalization, etc.

@RieksJ -- perhaps there is some non-normative text that you would like added to the specification to explain why the "credentialSubject" property is named what it is named?

msporny avatar Apr 25 '19 02:04 msporny

I appreciate that there is a lot to do. However, I don't consider heavy workloads a valid argument for deciding to skip issues in a standardization process. I would support it as an argument to postpone the transition to a next phase, because standards should be solid. And in particular, discussions about what might seem to be details should be done carefully, because that's where the devils tend to be, and in my experience as an expert in ISO SC27/JTC1, fixing standards is much harder once they're out there.

Note that this is not an argument to discuss every detail - relevance of the discussion must be shown first. The relevance of this issue is to prevent confusion and misinterpretation by readers (standards should be unambiguous and clear). The term 'credentialSubject' suggests that there is a relation between the credential and a single subject (which isn't there). But even if we say that this standard uses singulars, then 'credentialSubject' still has the word 'Subject', and that doesn't cover its payload (which is a set of Claims). It is like having a sign on the door of a restroom that says 'Chair'.

To me, it is relevant that standards do not have these kinds of things in them. But if there is a consensus that preventing such confusion is irrelevant, then I have no problems with closing this issue, because that's how standardization works.

@msporny: I don't think it is up to me to elaborate on why 'credentialSubject' is named as it is, because I did not invent that name (and I wouldn't have named it that in the first place for the same reasons I created this issue).

RieksJ avatar Apr 25 '19 12:04 RieksJ

@RieksJ,

Using a relationship named credentialSubject makes sense in the data model, particularly if you think of the data as a graph, which is what we're modeling.

A credential is a node in this graph. If you want to represent something about this node, you create a link emanating out from the credential node that connects to another node in the graph. For example, if you want to identify the issuer of the credential, you can create a link named issuer and use it to connect the credential node to another node that represents the issuer. If you want to say that a credential is about a subject, you can link the credential node with a link called credentialSubject that connects to the subject node. If you want to say that the credential is about yet another subject, you create yet another link called credentialSubject and connect that to that other subject.

You can see from this why using plural names for links doesn't make much sense. The links identify the relationship between two nodes in a graph; if you want to say one node has the same kind of relationship with two other nodes, you add two links of the same name to the graph, each connecting to one of the other nodes.

Now, if you want to say things about any one of those subjects, you repeat this process -- you link the subject node you want to say something about to another node through another relationship. For example, you could create a link called alumniOf that connects the subject node to a node that represents a literal string value of Example University.

In the JSON and JSON-LD syntax, links are represented as JSON keys, where the JSON key id is a special key that is used as an identifier for a node. Nodes are represented as JSON objects or, if a node represents a literal value, it can be a string, number, boolean, etc. Other syntaxes may do something different while still being compliant with the data model.

At a high level, the entire graph is therefore comprised of sentences that include a "subject" (a node), a "property" (a link), and an "object" (another node/literal value). These sentences are the "claims" -- and in our model they are understood to have been made by the entity identified as the "issuer".
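The node/link description above can be sketched in a few lines of Python. This is only an illustration, not JSON-LD processing: the function name `credential_to_claims` is hypothetical, `@context` is ignored, every string value is treated as a literal (a real JSON-LD processor would interpret some of them as node identifiers), and nodes without an `id` get a made-up local label.

```python
from itertools import count


def credential_to_claims(credential):
    """Flatten a credential-like dict into (subject, property, object) sentences.

    Hypothetical helper for illustration only; it does not apply @context.
    """
    labels = count()
    claims = []

    def walk(node):
        # A node is named by its "id" key, or by a placeholder label if absent.
        subject = node.get("id") or f"_:b{next(labels)}"
        for key, value in node.items():
            if key in ("id", "@context"):
                continue
            # Several values under one key mean several links with the same name.
            for item in value if isinstance(value, list) else [value]:
                if isinstance(item, dict):
                    # A nested object is another node: describe it, then link to it.
                    claims.append((subject, key, walk(item)))
                else:
                    # Strings, numbers and booleans are treated as literal values here.
                    claims.append((subject, key, item))
        return subject

    walk(credential)
    return claims


credential = {
    "id": "http://example.com/credentials/4643",
    "issuer": "https://example.com/issuers/14",
    "credentialSubject": {
        "id": "did:example:abcdef1234567",
        "alumniOf": "Example University",
    },
}

for claim in credential_to_claims(credential):
    print(claim)
```

In this reading, `credentialSubject` is just one more link name: it connects the credential node to a subject node, and the claims about that subject are the links leaving the subject node.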

We have a picture of a graph showing how this works in the data model:

Credential Graph

dlongley avatar Apr 25 '19 17:04 dlongley

Section 4.4 Credential Subject says "This specification defines a credentialSubject property for the expression of claims about one or more subjects". Since JSON-LD does not allow duplicate keys, then what does a credential look like (in JSON-LD) that has claims (or better: JSON-LD graphs) about two different subjects? And might the figure need clarification so that it better shows how multiple claims on different subjects are done?
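To illustrate why repeating the key cannot work, here is a small Python sketch using the standard `json` module. The helper name `reject_duplicate_keys` is made up for this example, and the behaviour shown is that of Python's parser, not of any particular JSON-LD processor.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that refuses objects with repeated keys."""
    keys = [key for key, _ in pairs]
    duplicates = sorted({key for key in keys if keys.count(key) > 1})
    if duplicates:
        raise ValueError(f"duplicate keys: {duplicates}")
    return dict(pairs)


text = (
    '{"credentialSubject": {"name": "Jane Doe"},'
    ' "credentialSubject": {"name": "John Doe"}}'
)

# Python's default parser silently keeps only the last value for a repeated key.
lenient = json.loads(text)
print(lenient["credentialSubject"])

# A stricter parse makes the problem visible instead of dropping data.
try:
    json.loads(text, object_pairs_hook=reject_duplicate_keys)
except ValueError as error:
    print("rejected:", error)
```

Either way, the first subject is lost or the document is rejected, so a second `credentialSubject` key is not a usable way to express a second subject.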

RieksJ avatar Apr 26 '19 07:04 RieksJ

@RieksJ,

Since JSON-LD does not allow duplicate keys, then what does a credential look like (in JSON-LD) that has claims (or better: JSON-LD graphs) about two different subjects?

Instead of using a single object for the value of credentialSubject, an array of objects is used:

{
  "@context": [
    "https://www.w3.org/2018/credentials/v1",
    "https://www.w3.org/2018/credentials/examples/v1"
  ],
  "id": "http://example.com/credentials/4643",
  "type": ["VerifiableCredential"],
  "issuer": "https://example.com/issuers/14",
  "issuanceDate": "2018-02-24T05:28:04Z",
  "credentialSubject": [{
    "id": "did:example:abcdef1234567",
    "name": "Jane Doe"
  }, {
    "id": "did:example:3d5c623bf63156cb1",
    "name": "John Doe"
  }],
  "proof": { ... }
}
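Since the value may be either a single object or an array, code that reads a credential probably wants to normalise it first. A minimal Python sketch, assuming the credential has already been parsed into a dict; the function name `subjects_of` is hypothetical and not part of any VC library:

```python
def subjects_of(credential):
    """Return the credential's subjects as a list, whatever shape was used."""
    value = credential.get("credentialSubject", [])
    return value if isinstance(value, list) else [value]


single = {
    "credentialSubject": {"id": "did:example:abcdef1234567", "name": "Jane Doe"},
}
multiple = {
    "credentialSubject": [
        {"id": "did:example:abcdef1234567", "name": "Jane Doe"},
        {"id": "did:example:3d5c623bf63156cb1", "name": "John Doe"},
    ],
}

for credential in (single, multiple):
    for subject in subjects_of(credential):
        print(subject["id"], subject["name"])
```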

And might the figure need clarification so that it better shows how multiple claims on different subjects are done?

+1. If you could propose some concrete text that helps clarify things for you we'll be able to pull it in much more quickly. And if you or someone else has the time to add another picture showing multiple credentialSubject links that would also be great.

dlongley avatar Apr 29 '19 15:04 dlongley

W.r.t. the figure: I've tried to do some quick stuff with svg, but I've trouble getting it done. I propose the following changes (decreasing importance):

  1. draw a box around the yellow boxes; this box is similar to that of 'credential graph' and 'proof graph' and could be named 'claims'.
  2. draw this box several times at a slightly different place, thus suggesting that multiple such claims can be referenced
  3. add some more yellow boxes such that it becomes clear that there is a linked data graph here.

W.r.t. texts: the comments above by various people indicate that the texts that describe what a credential, a subject, a claim, etc. is are far from clear, even for people who are consistent contributors to the text. While the issue I raised was just a part of this discussion, it seems to me that a decision is called for to determine whether to revise the text so as to address these issues, or leave it as it is. While it is my preference to do the first, I don't make these decisions.

RieksJ avatar May 01 '19 07:05 RieksJ

Decision on VCWG call 7 May 2019: RESOLUTION: The Working Group has discussed issue #480 and is not willing to make a substantive change to the specification that would trigger another Candidate Recommendation phase. The Working Group is interested in exploring non-normative resolutions to the issue. The WG would like to defer the issue so it can be considered when work continues beyond VC 1.0.

burnburn avatar May 07 '19 14:05 burnburn

We discussed this on the maintenance working group call and believe that this PR can be closed due to the impact on the number of implementations already done today. If the author believes this should still be addressed it can be handled in V2 and they can reopen it.

kdenhartog avatar Aug 11 '21 15:08 kdenhartog

The issue was discussed in a meeting on 2021-08-11

  • no resolutions were taken

4.6. Avoiding confusion by renaming 'credentialSubject' (issue vc-data-model#480)

See github issue #480.

Wayne Chang: this would add a huge breaking change, renaming a major component

Manu Sporny: +1 this would be a huge breaking change.

Dave Longley: +1 to close

Manu Sporny: +1 to closing

Wayne Chang: probably good to close, but we'd need to see broad support for changes like this in the new working group to re-open

iherman avatar Aug 12 '21 04:08 iherman

I am disappointed by the way this issue has been treated and has come to a close, as it might have been properly dealt with around the time it was raised. Instead, it has been lying around for over two years, thereby implicitly instructing all implementations to follow the contested practice. Allowing this issue to remain unaddressed caused it to become a 'huge breaking change', and continues to do so up to at least v2. It is like instructing people to litter a place and then refuse to participate in cleaning it up, suggesting that you can have another shot at it when you build a new place (v2). I consider this a bad practice for a standardization group.

RieksJ avatar Aug 12 '21 07:08 RieksJ

I am disappointed by the way this issue has been treated and has come to a close, as it might have been properly dealt with around the time it was raised.

@RieksJ, while I can understand your frustration, the reality is that the group did discuss the issue at depth: namely, in issue https://github.com/w3c/vc-data-model/issues/207

Then, when you raised this issue (during the Candidate Recommendation phase), the group did debate it again and came to the conclusion that it did not want to make the change you were suggesting: https://github.com/w3c/vc-data-model/issues/480#issuecomment-490116412

Since then, there has been hardly any discussion on the issue, which signals to the group that it has not been a concern for people building solutions using Verifiable Credentials.

While you are free to be disappointed with the outcome, the concepts that you raised in the issue did get very broad discussion in the group and the group did come to consensus on the path forward.

Fundamentally, the flaw with the argument in this issue is this: "the credentialSubject section contains a list of claims, rather than a (or more) subject(s)."

The credentialSubject section does, in fact, provide an associated list of credential subjects (that's why the name was picked), identified explicitly by credentialSubject.id or implicitly by an auto-generated identifier (blank node identifier). This is why the credentialSubject property was where the group ended up. We then hang claims off of each credential subject identifier. I personally would've preferred something else, but that's neither here nor there... credentialSubject is what achieved group consensus.
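The explicit-or-implicit identification described above can be mimicked with a short Python sketch. It is a rough approximation under stated assumptions: the function name `label_subjects` is hypothetical, the `_:bN` labels only imitate the style of JSON-LD blank node identifiers, and the real assignment is defined by the JSON-LD 1.1 API algorithms rather than by this code.

```python
def label_subjects(credential):
    """Pair each credential subject with an identifier.

    Subjects carrying an "id" keep it; the others receive a locally
    generated label standing in for a blank node identifier.
    """
    value = credential.get("credentialSubject", [])
    subjects = value if isinstance(value, list) else [value]
    labelled = []
    generated = 0
    for subject in subjects:
        if "id" in subject:
            labelled.append((subject["id"], subject))
        else:
            labelled.append((f"_:b{generated}", subject))
            generated += 1
    return labelled


credential = {
    "credentialSubject": [
        {"id": "did:example:abcdef1234567", "name": "Jane Doe"},
        {"name": "John Doe"},  # no "id": identified only implicitly
    ],
}

for identifier, subject in label_subjects(credential):
    print(identifier, subject["name"])
```

The generated label is meaningful only within one processing run, which is the practical difference between an explicitly identified subject and an implicitly identified one.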

msporny avatar Aug 12 '21 13:08 msporny

RE: ...or implicitly by an auto-generated identifier (blank node identifier).

@msporny Where is this implied or, better, stated?

mwherman2000 avatar Aug 12 '21 14:08 mwherman2000

@rieksj @msporny A major part of the root cause of this issue appears earlier in the specification with a series of statements about Claims that are, generally and fundamentally, not true.

See https://github.com/w3c/vc-data-model/issues/790

mwherman2000 avatar Aug 12 '21 14:08 mwherman2000

RE: ...or implicitly by an auto-generated identifier (blank node identifier). @msporny Where is this implied or, better, stated?

https://www.w3.org/TR/json-ld11/#node-identifiers https://www.w3.org/TR/json-ld11/#identifying-blank-nodes https://www.w3.org/TR/json-ld11-api/#node-map-generation https://www.w3.org/TR/json-ld11-api/#generate-blank-node-identifier

msporny avatar Aug 12 '21 14:08 msporny

A major part of the root cause of this issue appears earlier in the specification with a series of statements about Claims that are, generally and fundamentally, not true.

There is a flaw in that assumption that I've documented here: https://github.com/w3c/vc-data-model/issues/790#issuecomment-897710929

msporny avatar Aug 12 '21 14:08 msporny

RE: ...or implicitly by an auto-generated identifier (blank node identifier). @msporny Where is this implied or, better, stated?

https://www.w3.org/TR/json-ld11/#node-identifiers https://www.w3.org/TR/json-ld11/#identifying-blank-nodes https://www.w3.org/TR/json-ld11-api/#node-map-generation https://www.w3.org/TR/json-ld11-api/#generate-blank-node-identifier

Thank you @msporny for the interesting links about the mechanics of JSON-LD.

Where in the VC Data Model specification is the connection made between credentialSubject id being optional and, say, for example, https://www.w3.org/TR/json-ld11/#identifying-blank-nodes? Where in the VC data model specification does it say how a non-existent element like credentialSubject id is first detected and then subsequently identified as a blank node, etc.?

mwherman2000 avatar Aug 12 '21 15:08 mwherman2000

@msporny the issue wasn't dealt with in #207. Rather, the resolution of #207 caused the issue to be raised, and subsequent comments suggest there is/was support for it to be actually resolved. While the decision to defer the resolution of the issue appears to have been discussed, there is no mention of the motivation, which I can live with but reflects a way of doing things that in my opinion could be improved. The subsequent decision to close the issue is a 'maintenance decision' rather than a reflection on the actual content. While this might be appropriate for a repo of software code, I consider it inappropriate for a repo of standardization issues.

RieksJ avatar Aug 12 '21 15:08 RieksJ

As there is opposition to closing this issue, I will re-open it. As a resolution to this issue lies beyond the scope of the current working group, I am going to label it defer-v2

brentzundel avatar Aug 12 '21 15:08 brentzundel

@msporny Where in the VC data model specification does it say how a non-existent element like credentialSubject id is first detected and then subsequently identified as a blank node, etc.? I don't believe past issues are an official part of the specification. The specification needs to stand on its own, doesn't it?

Also, where is the VC data model specification tied to JSON-LD, and where/how is an intelligent reader supposed to deduce this? Most people will not even get to section 6: https://www.w3.org/TR/vc-data-model/#json-ld

JSON-LD, if applicable, needs to be introduced at the beginning of the specification and incorporated into the explanations of what a Claim is and what a Credential is, because JSON-LD introduces requirements that are unnatural and will be unexpected for most intelligent readers. The linkages need to be made clear.

mwherman2000 avatar Aug 12 '21 15:08 mwherman2000