vc-data-model Default vocab for credentials context v2

JSON-LD defines a way to specify the default vocabulary for properties and types that otherwise do not match a term definition. Currently the credential context v1 does not make use of this feature, meaning producers of VC's must be explict in their term decelerations for all properties used, otherwise the jsonld signatures signing process will error due to the usage of a strict expansion map that requires all properties to be defined.

If instead a default vocabulary was included in a new revision of the credentials context, then the signing process would proceed and the properties without a formal term definition would be expanded with the default vocab. Take for instance the following example where the credentials context v2 would include the definition of "@vocab" set to "https://www.w3.org/2018/credentials/undefinedTerm"

{
  "@context": [
    "https://www.w3.org/2018/credentials/v2"
  ],
  "type": ["VerifiableCredential"],
  "issuer": "did:example:28394728934792387",
  "issuanceDate": "2019-12-03T12:19:52Z",
  "expirationDate": "2029-12-03T12:19:52Z",
  "credentialSubject": {
    "id": "did:example:b34ca6cd37bbf23",
    "someClaim": "A claim",
    "someOtherClaim": "Another claim"
  },
  "proof": {
     "type": "Ed25519Signature2018",
     "created": "2020-01-30T03:32:15Z",
     "jws": "eyJhbGciOiJFZERTQSIsI...wRG2fNmAx60Vi4Ag",
     "proofPurpose": "assertionMethod",
     "verificationMethod": "did:example:28394728934792387#keys-7f83he7s8"
  }
}

The properties "someClaim" and "someOtherClaim" during signing would be expanded to be https://www.w3.org/2018/credentials/undefinedTerm#someClaim and https://www.w3.org/2018/credentials/undefinedTerm#someOtherClaim which when browsed to would resolve to a page informing the user that the property is not formally defined and perhaps link to some helpful resources on how to define the term formally.

Nov 15 '20 22:11 tplooker

Interested in feedback and thoughts from @msporny @dlongley @OR13

Nov 15 '20 22:11 tplooker

I think defaulting is probably slightly better than dropping properties, but throwing an error remains the best thing to do here.

The question remains, why would you sign anything you don't understand?

If we added @vocab it would need to come with some strong warnings, and I would want pretty much all processors to throw instead of warn...

Right now, using undefined terms is more likely to cause an error.

I'm not sure if this is an improvement but if "undefinedTerm" had a human readable definition that said "don't trust anything from this issuer until they define this term"... I supposed that would be an improvement over what we have today.

Nov 16 '20 14:11 OR13

This is not a new problem or issue. LDAP introduced the extensibleObject object class for the same reason, so that undefined attributes could be added to an LDAP entry. Personally I dont like this. I think all VCs that cannot be fully understood should be rejected. This is because, different to LDAP entries, VCs are primarily security tokens. So having unknown information in a security token is dangerous. If you want to address this in a secure way, you should consider doing something like X.509 with its critical extension feature. With this feature, every attribute at the outer level is fully understood, even if an inner value is not. But the outer level says whether you can safely ignore inner values or not.

Nov 16 '20 15:11 David-Chadwick

@tplooker and I had a chat about this... I can see the advantage of "making the tent bigger" by welcoming folks who don't understand why signing unknown properties might not be a good idea, weakening the security model of VCs if we can add the following to the spec:

https://www.w3.org/2018/credentials/v2 documents MAY contain properties that are defaulted to "https://www.w3.org/2018/credentials/undefinedTerm#someClaim " via the use of @vocab.

A JSON-LD Processor SHOULD throw an error when the @vocab is used.

A JSON-LD Processor MUST warn if no error is thrown and @vocab is used.

https://www.w3.org/2018/credentials/undefinedTerm leads to a definition which warns that the document containing this term cannot be trusted, and is likely experimental or test data.

Reasons this would be better than what we have today.

Including schema.org essentially does this already, and silently defaults undefined terms to https://schema.org/someClaim
Many JSON-LD VC Processors already check for https://www.w3.org/2018/credentials/v1, when they check for https://www.w3.org/2018/credentials/v2 we can require them to handle errors better, and we SHOULD.
Some properties like type are already dropped silently by some (not all) libraries, see https://github.com/digitalbazaar/jsonld.js/issues/199
We are loosing an ability to protect users from undefined terms through a security in depth:

Throw Error > Default Member and Produce Warning > Default Member > Drop Property and Produce Warning > Drop Property.

We should start at the top, and only compromise when the risk to users is outweighed by the convenience to developers, and we are honest about which constituency we care more about in the spec.

Nov 16 '20 20:11 OR13

Another option we discussed, is for the default vocab to be defined as a security term that in present in both the credentials and security context and therefore making it a concept present at the JSON-LD signatures level not just a verifiable credentials concept (so instead of https://www.w3.org/2018/credentials/undefinedTerm a https://www.w3.org/security/undefinedTerm as the default vocab)

Nov 17 '20 01:11 tplooker

I don't know.. https://www.w3.org/2018/credentials/undefinedTerm#someClaim feels like a contradiction to me.

This sounds like someClaim is "undefined", and I understand this is exactly the intention, but now that term can actually be expanded just fine and there won't be any warning or error, since the default vocabulary makes it all 100% correct JSON-LD, no?

If there's an undefined term in your JSON-LD, the processor should simply throw an error...

Nov 17 '20 14:11 peacekeeper

@peacekeeper I understand that it appears as a contradiction.

To be clear I think in general JSON-LD processors SHOULD at least emit a warning when any property is expanded with @vocab so that this behaviour can be caught regardless.

Nov 17 '20 19:11 tplooker

If we are talking about a spec where we get to make normative requirements, I would repeat my assertions:

JSON-LD Processors SHOULD throw an ERROR
JSON-LD Processors MUST emit a warning if they don't throw an error

This is a spec related to cryptographic assertions / security... swallowing errors / warning is a terrible idea from a security perspective.... I can understand developers from untyped languages like javascript or python feeling like types are a "burden" and "slow me down / are confusing"... but frankly... they are wrong :)

types and strictness are part of good security engineering... we should not be failing open... we should be failing closed.

https://cwe.mitre.org/data/definitions/636.html

Nov 17 '20 20:11 OR13

This issue goes right to the core of the extensibility model for verifiable credentials. If we look at neighbouring technologies such as JWT they support what are known as public and private claims where public are recommended to be registered in the IANA registry, private are subject to collision and should be used with caution. At the moment we force users of verifiable credential technology when expressing in JSON-LD to define all their properties in some form in the JSON-LD context (note they are free to use @vocab for this too) and do not allow essentially "private claims". I understand the argument that this is a good natural force in the technology, driving producers to be precise, however I think it negatively harms the usability of the technology and also doesn't always work, e.g if I include schema.org in my context, im free to define any property and it wont error and really have no idea what is going on. If we instead allowed for the "private claims" equivalent via the inclusion of @vocab I think we strike a better balance where producers who want their credentials to use a fully defined vocab can, and those who dont won't and will suffer the consequences of poorly contexualized data.

Nov 17 '20 20:11 tplooker

I also dont think a signed property that has been expanded to an IRI of https://www.w3.org/2018/credentials/undefinedTerm#someClaim that when dereferenced clearly states to the user "Hey this property is not formally registered so we dont have a formal definition for you" doesn't mean we are defaulting open from a security perspective IMO

Nov 17 '20 20:11 tplooker

@vocab being used indicates a potential security issue that at least should yield a warning.
@vocab is useful for folks who like seeing warnings instead of errors in their security applications.

I personally don't like seeing warnings, and I don't like setting additionalProperties:true in JSON Schema or const data: any = {} in typescript.

I don't think we should hand developers a tool which lets them accidentally poison themselves and their users... the recommendation should remain to at least warn, and best practice would be to throw.

I do think making changes to the VC Data Model in this regard would be helpful, but I would formally object to swallowing undefined terms by adding @vocab without a mandatory warning, its like turning off a lint rule that protects against prototype pollution... the only reason to do so, is because the developer is not good at security, and can't figure out how to fix the issue correctly.

Nov 17 '20 20:11 OR13

case in point, if I add __proto__ and it gets preserved with a warning instead of a thrown error, and then some system passes that "valid VC" to another verifier systems that is vulnerable, I can actually get remote code execution on the verifier...

https://itnext.io/prototype-pollution-attack-on-nodejs-applications-94a8582373e7#:~:text=The%20Prototype%20Pollution%20attack%20(%20as,Remote%20Code%20Execution%20%E2%80%94%20RCE).

The fact that including schema.org allows for this exploit today, with no changes, does not mean we should normalize this behavior... it means we should patch the VC Data Model spec immediately to throw when @vocab is detected.

Nov 17 '20 20:11 OR13

I do think making changes to the VC Data Model in this regard would be helpful, but I would formally object to swallowing undefined terms by adding @vocab without a mandatory warning, its like turning off a lint rule that protects against prototype pollution... the only reason to do so, is because the developer is not good at security, and can't figure out how to fix the issue correctly.

+1 to a warning

Nov 17 '20 20:11 tplooker

perhaps we can split this up into:

@vocab detection MUST/SHOULD yield a warning.
@vocab is included in credentials/v2

I could be supportive of both, but 1 is decidedly less controversial given the disclosed issue above.

Nov 17 '20 20:11 OR13

apparently folks are already exploiting this behavior... https://github.com/blockchain-certificates/cert-verifier-js/blob/master/src/inspectors/computeLocalHash.js#L67

Nov 17 '20 21:11 OR13

ping @msporny @dlongley

Nov 17 '20 21:11 OR13

types and strictness are part of good security engineering... we should not be failing open... we should be failing closed.

This is not what happens in practise. If you look at today's browsers and interception products with X509 certs, they typically fail open rather than closed, because usability trumps security. We have written about this extensively, see for example

Wazan, Ahmad Samer; Laborde, Romain; Chadwick, David W; Kallel, Sana; Venant, Rémi; Barrere, François; Benzekri, Abdelmalek; Billoir, Eddie. "On the Validation of Web X.509 Certificates by TLS interception products", IEEE Transactions on Dependable and Secure Computing (2020). Print ISSN: 1545-5971, Online ISSN: 1545-5971, Digital Object Identifier: 10.1109/TDSC.2020.3000595

A.S. Wazan, R. Laborde, D.W. Chadwick, F. Barrere, A. Benzekri. "TLS Connection Validation by Web Browsers: Why do Web Browsers still not agree?". 41st IEEE Computers, Software, and Applications Conference (COMPSAC 2017), Politecnico di Torino, Turin, Italy July 4-8, 2017

Ahmad Samer Wazan, Romain Laborde, David W Chadwick, François Barrere, AbdelMalek Benzekri. “Which web browsers process SSL certificates in a standardized way?” 24th IFIP International Security Conference, Cyprus, May 18-20th, 2009

I personally don't like seeing warnings,

And users typically click through warnings, so they are ineffective, hence pointless.

Nov 18 '20 12:11 David-Chadwick

If instead a default vocabulary was included in a new revision of the credentials context, then the signing process would proceed and the properties without a formal term definition would be expanded with the default vocab.

Big -1 to this :) for at least the following reason:

This is a security vocabulary and reasonable defaults should lead to failing closed. Having a default @vocab goes against that design principle. By that, I mean that if something is not defined, we shouldn't assume what the developer intended because while we may be right 99.99% of the time... the 0.01% of the time that we're wrong will be used as a catastrophic exploit in some critical infrastructure somewhere in the world.

This is why it's a good thing that it's so difficult to get digital signature libraries to validate a digital signature... it's a like a very advanced lock on a door protecting your most treasured secrets... if all the tumblers don't line up just right, the door shouldn't open.

Remember that developers may not include this context in their proof, but at the top level of their object and by doing so, you may obliterate assumptions they may have about @vocab. They may want to use one and you overwrite it, or you may create a default vocab when they definitely don't want one. @vocab was always meant as a hack for lazy development ("I'll just slam a @vocab up here until I have the time to write a solid JSON-LD Context).

So, let's please drop this on the floor as the bad idea that it is -- no default @vocab -- it'll lead to insecure systems.

Nov 18 '20 14:11 msporny

apparently folks are already exploiting this behavior... https://github.com/blockchain-certificates/cert-verifier-js/blob/master/src/inspectors/computeLocalHash.js#L67

That's terrifying.

Nov 18 '20 14:11 msporny

@msporny might we consider adding a warning or throwing an error if @vocab is detected? As you can see, folks are already digging graves for their users with this feature.

Nov 18 '20 16:11 OR13

I think we've got a tooling problem -- not a spec problem here. We need better tools that make things easier for people to check their vocabularies, figure out what to do to fix them, or automate this process for them in a safe way. I don't think we need to (or should) change the core security underlying it all. VCs are improvements over existing systems for a number of reasons, one of which is the fail closed model -- where you have to define what you're talking about/signing vs. there being ambiguity around it -- whilst still allowing for decentralized extensions. The community just needs to make it easier for people to create and test those extensions.

Nov 18 '20 17:11 dlongley

@dlongley nobody is going to implement tooling support for non-normative requirements... IMO this is a spec issue, and presence of @vocab should result in an error, or at least a warning.... implementers should be normatively required to address this, because not doing so will lead to the kind of hacks noted in this thread:

schema.org being included and @vocab silently swallowing undefined terms (no error, no warning).
@vocab being injected too accomplish the same functionality via expandContext.push({ '@vocab': 'http://fallback.org/' });

Both of these are "valid interpretations of the vc api" imo, they are not good... would prefer to see them be forbidden.

In absence of spec changes, how might we address this?

Nov 18 '20 17:11 OR13

Ok so can we clarify the intended behaviour here, e.g in ANY case of using @vocab for expansion, whether it be the default one (or not as people are saying no) should a JSON-LD processor warn when doing JSON-LD expansion for the purposes of producing a linked data proof OR should it error?

I believe it should at least warn, as it is trivial to include in a documents context today a context that makes use of @vocab, case and point are the numerous parties that use https://schema.org directly and most likely don't even know@vocab is active. So in effect the particular security model you have outlined @msporny, is frequently, sometimes knowingly, sometimes unknowingly being subverted.

On a side note a similar question applies to usage of @base.

Secondly can someone please explain in more detail why failing on un-documented terms vs signing them under an IRI that clearly states, no clear definition is failing open?

Finally as @OR13 pointed out earlier, we (Mattr) originally looked at this as a solution because of 199 and without a solution to it, I would posit that a default vocab in a v2 credentials context is 100 times less of a security risk than silently dropping terms...

Nov 19 '20 02:11 tplooker

If you want to address this in a secure way, you should consider doing something like X.509 with its critical extension feature. With this feature, every attribute at the outer level is fully understood, even if an inner value is not. But the outer level says whether you can safely ignore inner values or not.

@David-Chadwick would you not still be able to detect which terms are fully understood with the expanded IRI that was used e.g if (term.startsWith("https://www.w3.org/2018/credentials/undefinedTerm#")) its unknown

Nov 19 '20 02:11 tplooker

Or better still if (term.type == https://www.w3.org/2018/credentials/undefinedTerm) its unknown

Nov 19 '20 02:11 tplooker

In X.509 you dont say whether the term is undefined or not, rather the recipient decides if it understands it or not. But each extension (term) is marked critical or not. Critical means you HAVE to understand it or reject the certificate. Not critical means you can safely ignore this term if you dont understand it. So mapping this to undefined schema terms, then each one would need to be marked critical or not. A recipient that understands the undefined term that is marked critical can process the VC, all other recipients would have to reject it. If the undefined term is marked non-critical than all recipients can safely ignore it if they dont understand it. Those that do will still process it. All defined schema terms are implicitly critical. This is the current implied meaning of the W3C recommendation.

Nov 19 '20 12:11 David-Chadwick

All defined schema terms are implicitly critical. This is the current implied meaning of the W3C recommendation.

So in this model you are saying that all properties in a VC MUST always be understood by the verifier, otherwise it should reject it, meaning there is no notion of optionally processed information in a VC?

Nov 25 '20 22:11 tplooker

In X.509 you dont say whether the term is undefined or not, rather the recipient decides if it understands it or not.

Agree, because on the surface just because a term has a formal definition it does not directly imply that every consumer understands that definition, unless you impose the constraint you spoke of above that all terms MUST have definitions and all verifiers MUST always understand ALL definitions for the terms featured in the data they are consuming.

Nov 25 '20 22:11 tplooker

This issue is a bit of a tire fire... can we split this up into a few separate issues?

IMO they should be:

Add warning about @vocab to spec.
Add mandatory error for processing @vocab to spec
Add ability to override error by defaulting @vocab to something that resolves to human readable documentation.

We don't need to agree to all 3, but we should be able to easily agree to 1.

Nov 30 '20 15:11 OR13

another developer bitten by @vocab https://gist.github.com/gorazdko/d40eda20846a8e20ac277f81e5661d34#vc-js-test-case

Do I need to weaponize this to get it fixed ? :)

Jan 17 '21 15:01 OR13

vc-data-model vc-data-model copied to clipboard

Default vocab for credentials context v2

vc-data-model
vc-data-model copied to clipboard