
Multiple significant security vulnerabilities in the design of data integrity

Open tplooker opened this issue 1 year ago • 148 comments

The following issue outlines two significant security vulnerabilities in data integrity.

For convenience in reviewing the content below, here is a Google Slides version outlining the same information.

At a high level, both vulnerabilities exploit the "Transform Data" phase of data integrity in different ways, a process that is unique to cryptographic representation formats that involve canonicalisation/normalisation.

In effect, both vulnerabilities allow a malicious party to swap the key and value of arbitrary attributes in a credential without the signature being invalidated. For example, as the worked examples in the attached presentation show, an attacker could swap their first and middle names, or their employment and over18 status, without invalidating the issuer's signature.

The first vulnerability is called the unprotected term redefinition vulnerability. In general this vulnerability exploits a design issue with JSON-LD where the term protection feature offered by the @protected keyword doesn't cover terms that are defined using the @vocab and @base keywords. This means any terms defined using @vocab and @base are vulnerable to term redefinition.
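To make the mechanism concrete, here is a minimal sketch (the context URLs and term names are hypothetical) of how a term that is only covered by @vocab can be redefined even when the context sets @protected; only "name" below is explicitly defined and therefore protected:

// Hypothetical issuer context: "name" is explicitly defined (and protected),
// while "over18" and "employed" only resolve through the @vocab catch-all.
const issuerContext = {
  "@context": {
    "@protected": true,
    "@vocab": "https://example.org/vocab#",
    "name": "https://example.org/vocab#name"
  }
};

// Attacker-supplied context appended later: because "over18" and "employed"
// were never explicitly defined, redefining them does not trigger JSON-LD's
// protected-term error, so the IRIs they expand to can be silently swapped.
const attackerContext = {
  "@context": {
    "over18": "https://example.org/vocab#employed",
    "employed": "https://example.org/vocab#over18"
  }
};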

The second vulnerability exploits the fact that a document signed with data integrity has critical portions that are unsigned, namely the @context element of the JSON-LD document. The @context element is unsigned yet plays a critical part in the proof generation and proof verification procedures; this is a critical flaw that leaves data integrity documents open to many forms of manipulation that are not detectable by validating the issuer's signature.

Please see the attached presentation for resolutions to this issue we have explored.

In my opinion, the only solution that will provide adequate protection against these forms of attack is to fundamentally change the design of data integrity so that the @context element is integrity protected. I recognise this would be a significant change in design; however, I do not see an alternative that would prevent variants of this attack from continuing to appear over time.

I'm also happy to present this analysis to the WG if required.

(edited to put backticks around @words)

tplooker avatar Jun 21 '24 21:06 tplooker

I believe that the core of the issue highlighted above is in a lack of validation on the information that is to be verified. Any protected information or data must be validated and understood prior to consumption, no matter the protection mechanism. However, when a protection mechanism allows multiple expressions of the same information (a powerful tool), it may be important to better highlight this need. This is especially true in the three party model, where there is no simple two-party agreement and known context between issuers and verifiers, i.e., the scale or scope of the VC ecosystem is much larger when parties totally unknown to the issuer can consume their VCs.

Certainly not understanding the context in which a message is expressed (or meant to be consumed) can lead to mistakes, even when that message is authentic. For example, a message that expresses "i authorize you to act on item 1", even if verified to be authentically from a particular source, can be misapplied in the wrong context (e.g., "item 1" was supposed to mean X, when it was misinterpreted as Y). In short, the context under which data is consumed must be well known and trusted by the consumer, no matter the protection mechanism.

We might want to add some examples to the specification that show that the information in documents can be expressed in one context and transformed into another. This could include showing an incoming document that is expressed using one or more contexts that the consumer does not understand, which can then be transformed using the JSON-LD API to a context that is trusted and understood. This would also help highlight the power of protection mechanisms that enable this kind of transformation.

For example, a VC that includes terms that are commonly consumed across many countries and some that are region specific. By using the JSON-LD API, a consumer that only understands the global-only terms can apply such a context to ensure that the terms they understood will appear as desired and other region-specific terms are expressed as full URLs, even when they do not understand or trust the regional context. All of this can happen without losing the ability to check the authenticity of the document.
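As a rough sketch of that flow (assuming the jsonld.js library; the context URLs and function name are illustrative), a consumer can compact an incoming document to the context it trusts before reading any plaintext property names:

const jsonld = require('jsonld');

// The context(s) this consumer actually understands and trusts (illustrative URLs).
const TRUSTED_CONTEXT = [
  'https://www.w3.org/ns/credentials/v2',
  'https://www.w3id.org/dl/v1'
];

async function readInTrustedContext(incomingDoc, documentLoader) {
  // Compaction re-expresses the document in the trusted context; terms the
  // consumer does not recognize remain as full URLs instead of short names.
  return jsonld.compact(incomingDoc, TRUSTED_CONTEXT, { documentLoader });
}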

We can also highlight that simpler consumers continue to be free to outright reject documents that are not already presented in the context that they trust and understand, no matter their authenticity.

dlongley avatar Jun 21 '24 22:06 dlongley

I believe that the core of the issue highlighted above is in a lack of validation on the information that is to be verified. Any protected information or data must be validated and understood prior to consumption, no matter the protection mechanism. However, when a protection mechanism allows multiple expressions of the same information (a powerful tool), it may be important to better highlight this need. This is especially true in the three party model, where there is no simple two-party agreement and known context between issuers and verifiers, i.e., the scale or scope of the VC ecosystem is much larger when parties totally unknown to the issuer can consume their VCs.

The fundamental point of digital signatures is to reduce the information that needs to be trusted prior to verification. Most modern technologies, e.g. SD-JWT, mDocs, JWT, and COSE and JOSE at large, do this successfully, meaning a relying party only needs to trust a public key prior to attempting to verify the signature of an otherwise untrusted payload. If the signature check fails, the payload can be safely discarded without undue expense.

The problem with data integrity is that this assumption does not hold. In essence, the relying party doesn't just need the public key of the issuer/signer, but also all possible JSON-LD context entries that the issuer may or may not use; if any of these are corrupted or manipulated, or untrusted ones are injected, the attacks highlighted in this issue become possible. Whether it is even possible to share these contexts appropriately at scale is another question, but these attacks demonstrate at a minimum that an entirely unique class of vulnerabilities exists because of this design choice.

Certainly not understanding the context in which a message is expressed (or meant to be consumed) can lead to mistakes, even when that message is authentic. For example, a message that expresses "i authorize you to act on item 1", even if verified to be authentically from a particular source, can be misapplied in the wrong context (e.g., "item 1" was supposed to mean X, when it was misinterpreted as Y). In short, the context under which data is consumed must be well known and trusted by the consumer, no matter the protection mechanism.

The point I'm making is not about whether a relying party should understand the context of a message it has received, it's about when it should attempt to establish that context. Doing so prior to validating the signature is dangerous and leads to these vulnerabilities.

For instance, a JSON-LD document can be signed with a plain old JWS signature (as in JOSE/COSE); once the signature is validated, one can then process it as JSON-LD to understand the full context, if they so wish. The benefit of this approach is that if the JSON-LD context of the message has been manipulated, the relying party will have safely discarded the message before even reaching this point, because the signature check will have failed. Data integrity, on the other hand, requires this context validation to happen as part of signature verification, thus leading to these issues.
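A sketch of that ordering, assuming the jose npm package for the JWS and jsonld.js for the optional post-verification processing (the function and variable names are illustrative):

const { compactVerify } = require('jose');
const jsonld = require('jsonld');

async function verifyThenProcess(jws, issuerPublicKey, documentLoader) {
  // 1. Verify the signature over the exact bytes the issuer signed. If the
  //    payload (including its @context) was tampered with, this throws and no
  //    JSON-LD processing ever runs on attacker-controlled data.
  const { payload } = await compactVerify(jws, issuerPublicKey);
  const credential = JSON.parse(new TextDecoder().decode(payload));

  // 2. Only now, optionally, interpret the verified document as JSON-LD.
  return jsonld.expand(credential, { documentLoader });
}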

tplooker avatar Jun 21 '24 22:06 tplooker

Another take on this is that Data Integrity signing methods that sign the canonicalized RDF derived from JSON-LD, rather than the JSON-LD itself, enable multiple different JSON-LD inputs to canonicalize to the same RDF. The JSON-LD itself isn't secured - only RDF values derived from it. If only the derived RDF values were used by code, it might not be a problem, but in practice, code uses the unsecured JSON-LD values - hence the vulnerabilities.
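For example (a sketch with a hypothetical vocabulary IRI), these two documents use different JSON keys but canonicalize to the same N-Quads, so a signature over the derived RDF cannot tell them apart:

const jsonld = require('jsonld');

const docA = {
  "@context": { "firstName": "https://example.org/vocab#givenName" },
  "firstName": "John"
};

const docB = {
  "@context": { "givenName": "https://example.org/vocab#givenName" },
  "givenName": "John"
};

(async () => {
  // Both documents yield the identical canonical quad:
  // _:c14n0 <https://example.org/vocab#givenName> "John" .
  const a = await jsonld.canonize(docA, { format: 'application/n-quads' });
  const b = await jsonld.canonize(docB, { format: 'application/n-quads' });
  console.log(a === b); // true
})();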

selfissued avatar Jun 21 '24 23:06 selfissued

In the example where the firstName and middleName plaintext properties are swapped, what should the verifier's behavior be? I don't think it should just be to verify the credential, whatever type it might be, and then look at the plaintext properties within it that use @vocab-based IRIs. If I were writing this verifier, I would also ensure the @context matched my expectations, otherwise I wouldn't be sure that the properties of credentialSubject I was looking for actually meant the things that I expected them to mean.

If they were trying to depend on a credential of a certain type that expressed a holder's first name and middle name, it would not be a good idea to miss a check like this. Don't accept properties that aren't well-@protected in expected contexts. This is an additional cost that comes with processing JSON-LD documents like VCDM credentials, but it's not a step that should be skipped, because you're right that skipping it might open an implementer up to certain vulnerabilities.

Approaches that work:

  1. Verify that the @context matches your expectations, for example by including only known context URLs that resolve to contexts appropriate for the credential type, that use @protected, and that only use explicitly defined terms.
  2. OR, use JSON-LD tooling to compact the credential into the context you expect before relying on plaintext property names. (A sketch of both approaches follows below.)
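A minimal sketch of both approaches, assuming jsonld.js for the second; the constant and function names, and the type-specific context URL, are illustrative:

const jsonld = require('jsonld');

// Approach 1: accept only an exact, known-good @context value.
const EXPECTED_CONTEXTS = [
  'https://www.w3.org/ns/credentials/v2',
  'https://example.org/my-credential-type/v1' // hypothetical type-specific context
];

function hasExpectedContext(credential) {
  const ctx = credential['@context'];
  return Array.isArray(ctx) &&
    ctx.length === EXPECTED_CONTEXTS.length &&
    ctx.every((url, i) => url === EXPECTED_CONTEXTS[i]);
}

// Approach 2: compact into the expected context before reading property names,
// so unknown terms surface as full URLs rather than familiar-looking keys.
async function compactToExpected(credential, documentLoader) {
  return jsonld.compact(credential, EXPECTED_CONTEXTS, { documentLoader });
}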

Communities developing and using new credential type specifications benefit from defining a good @context with appropriately @protected terms. @vocab is ok for experimentation but not so great for production use cases. We don't really have a huge number of credential types yet, but hopefully as the list grows, the example contexts established for each of the good ones make for an easy-to-follow pattern.

ottonomy avatar Jun 22 '24 01:06 ottonomy

schema.org and google knowledge graph both use @vocab.

https://developers.google.com/knowledge-graph

The problem is not JSON-LD keywords in contexts, the problem is insecure processing of attacker controlled data.

If you want to secure RDF, or JSON-LD, it is better to sign bytes and use media types.

You can sign and verify application/n-quads and application/ld+json, in ways that are faster and safer.

W3C is responsible for making the web safer, more accessible and more sustainable.

Data integrity proofs are less safe, harder to understand, and require more CPU cycles and memory to produce and consume.

They also create a culture problem for RDF and JSON-LD by tying a valuable property that many people care deeply about (semantic precision and shared global vocabularies) to a security approach that is known to be problematic and difficult to execute safely.

These flaws cannot be corrected, and they don't need to be, because better alternatives already exist.

W3C, please consider not publishing this document as a technical recommendation.

OR13 avatar Jun 22 '24 03:06 OR13

2024-05-08 MATTR Responsible Disclosure Analysis

On May 8th 2024, MATTR provided a responsible security disclosure to the Editors of the W3C Data Integrity specifications. A private discussion ensued, with this analysis of the disclosure provided shortly after the disclosure and a public release date agreed to (after everyone was done with the conferences they were attending through May and June). The original response, without modification, is included below (so language that speaks to "VC Data Model" could be interpreted as "VC Data Integrity", as the original intent was to file this issue against the VC Data Model specification).

The disclosure suggested two separate flaws in the Data Integrity specification:

  • "The unprotected term redefinition vulnerability"
  • "The @context substitution vulnerability"

The Editors of the W3C Data Integrity specification have performed an analysis of the responsible security disclosure and provide the following preliminary finding:

Both attacks are fundamentally the same attack, and the attack only appears successful because the attack model provided by MATTR presumes that verifiers will allow fields to be read from documents that use unrecognized @context values. Two documents with different @context values are different documents. All processors (whether utilizing JSON-LD processing or not) should treat the inbound documents as distinct; the software provided by MATTR failed to do that. Secure software, by design, does not treat unknown identifiers as equivalent.

That said, given that a credential technology company such as MATTR has gone so far as to report this as a vulnerability, further explanatory text could be added to the VC Data Model specification normatively stating that all processors should limit processing to known and trusted context identifiers and values, such that developers do not make the same mistake of treating documents with differing @context values as identical prior to verification.

The rest of this document contains a more detailed preliminary analysis of the responsible disclosure. We thank MATTR for the time and attention put into describing their concerns via a responsible security disclosure. The thorough explanation made analysis of the concerns a fairly straightforward process. If we have made a mistake in our analysis, we invite MATTR and others to identify the flaws in our analysis such that we may revise our findings.

Detailed Analysis

A JSON-LD consumer cannot presume to understand the meaning of fields in a JSON-LD document that uses a context that the consumer does not understand. The cases presented suggest the consumer is determining the meaning of fields based on their natural language names, but this is not how JSON-LD works; rather, each field is mapped to an unambiguous URL using the JSON-LD context. This context MUST be understood by the consumer; it cannot be ignored.

A verifier of a Verifiable Credential MUST ensure that the context used matches an exact well-known @context value or MUST compact the document using the JSON-LD API to a well-known @context value before further processing the data.

Suggested Mitigation 1
Add a paragraph to the Data Integrity specification that mentions this and links to the same section in the Verifiable Credentials specification to help readers who are not familiar with JSON-LD, or did not read the JSON-LD specification, to understand that `@context` cannot be ignored when trying to understand the meaning of each JSON key. Additional analogies could be drawn to versioning to help developers unfamiliar with JSON-LD, e.g., "The `@context` field is similar to a `version` for a JSON document. You must understand the version field of a document before you read its other fields."

The former can be done by using JSON schema to require a specific JSON-LD shape and specific context values. This can be done prior to passing a document to a data integrity implementation. If contexts are provided by reference, a document loader can be used that resolves each one as "already dereferenced" by returning the content based on installed context values instead of retrieving them from the Web. Alternatively, well-known cryptographic hashes for each context can be used and compared against documents retrieved by the document loader over the Web. For this approach, all other JSON-LD documents MUST be rejected if they do not abide by these rules. See Type-Specific Credential Processing for more details on this:

https://www.w3.org/TR/vc-data-model-2.0/#type-specific-credential-processing.

This former approach is less powerful than using the JSON-LD Compaction API because it requires more domain-specific knowledge to profile down. However, it is still in support of decentralized extensibility through use of the JSON-LD @context field as a decentralized registry, instead of relying on a centralized registry. Decentralized approaches are expected to involve a spectrum of interoperability and feature use precisely because they do not require a one-size fits all approach.
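As a sketch of the document-loader portion of this approach (jsonld.js-style loader semantics assumed; the local file path, example context URL, and hash value are placeholders):

const { createHash } = require('node:crypto');

// Contexts installed with the application, keyed by URL (placeholder file path).
const INSTALLED_CONTEXTS = new Map([
  ['https://www.w3.org/ns/credentials/v2', require('./contexts/credentials-v2.json')]
]);

// Known-good SHA-256 digests for contexts that may still be fetched from the Web.
const PINNED_SHA256 = new Map([
  ['https://example.org/my-context/v1', '<hex-digest-of-known-good-context>']
]);

async function documentLoader(url) {
  if (INSTALLED_CONTEXTS.has(url)) {
    // Return the locally installed copy instead of dereferencing the Web.
    return { contextUrl: null, documentUrl: url, document: INSTALLED_CONTEXTS.get(url) };
  }
  if (PINNED_SHA256.has(url)) {
    const text = await (await fetch(url)).text();
    const digest = createHash('sha256').update(text).digest('hex');
    if (digest !== PINNED_SHA256.get(url)) {
      throw new Error(`Context ${url} does not match its pinned hash`);
    }
    return { contextUrl: null, documentUrl: url, document: JSON.parse(text) };
  }
  // All other contexts are rejected rather than silently fetched.
  throw new Error(`Refusing to load untrusted context: ${url}`);
}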

Applying these rules to each case presented, for case 1:

A verifier that does not use the JSON-LD API and does not recognize the context URL, https://my-example-context.com/, will reject the document.

A verifier that does not use the JSON-LD API and does recognize the context URL, https://my-example-context.com/, will not conflate the natural language used for the JSON keys with their semantics. Instead, the verifier will appropriately use the semantics (that happens to be the opposite of the natural language used in the JSON keys) that the issuer intended, even though the JSON keys have changed.

A verifier that does use the JSON-LD API will compact the document to a well-known context, for example, the base VC v2 context, and the values in the JSON will be restored to what they were at signing time, resulting in semantics that the issuer intended.

For case 2:

A verifier that does not use the JSON-LD API and does not recognize the attacker-provided context URL, https://my-malicious-modified-context.com/, will reject the document.

A verifier that does not use the JSON-LD API and does recognize the attacker-provided context URL, https://my-malicious-modified-context.com/, will not conflate the natural language used for the JSON keys with their semantics. Instead, the verifier will appropriately use the semantics (that happens to be the opposite of the natural language used in the JSON keys) that the issuer intended, even though the JSON keys have changed.

A verifier that does use the JSON-LD API will compact the document to a well-known context, for example, the base VC v2 context (and optionally, https://my-original-context.com), and the values in the JSON will be restored to what they were at signing time, resulting in semantics that the issuer intended.

Note: While the disclosure suggests that the JSON-LD @protected feature is critical to this vulnerability, whether it is used, or whether a Data Integrity proof is used to secure the Verifiable Credential, is orthogonal to ensuring that the entire @context value is understood by the verifier. For clarity, this requirement stands even if an envelope-based securing mechanism focused on syntax protection were used to ensure authenticity of a document. Misunderstanding the semantics of an authentic message by ignoring its context is always a mistake and can lead to unexpected outcomes.

Comparison to JSON Schema

The scenarios described are identical in processing systems such as JSON Schema where document identifiers are used to express that two documents are different. A JSON document with differing $schema values would be treated as differing documents even if the contained data appeared identical.

Original document

{"$schema": "https://example.com/original-meaning",
 "firstName": "John"}

New or modified document

{"$schema": "https://example.com/new-meaning",
 "firstName": "John"}

Any document processor, whether utilizing JSON Schema processing or not, would rightly treat these two documents as distinct values and would seek to understand whether their values are equivalent (or not) prior to processing their contents. Even consuming a document that is recognized as authentic would be problematic if the $schema values were not understood.

Original meaning/schema

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/original-meaning",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "description": "The name by which a person is generally called: 'given name'",
      "type": "string"
    }
  }
}

New meaning/schema

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/new-meaning",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "description": "The name spoken first in Japan: typically a surname",
      "type": "string"
    }
  }
}

Demonstration of Proper Implementation

The attack demonstration code provided adds the unknown modified/malicious contexts to the application code's trusted document loader. A valid application should not do this and removing these lines will cause the attack demonstrations to no longer pass:

documentLoader.addStatic("https://my-example-context.com/", modifiedContext)

https://gist.github.com/tplooker/95ab5af54a141b69b55d0c2af0bc156a#file-protected-term-redefinition-attack-js-L38

To see "Proof failed" when this line is commented out and the failure result is logged, see: https://gist.github.com/dlongley/93c0ba17b25e500d72c1ad131fe7e869

documentLoader.addStatic("https://my-malicious-modified-context.com/", modifiedContext)

https://gist.github.com/tplooker/4864ffa2403ace5637b619620ce0c556#file-context-substitution-attack-js-L48

To see "Proof failed" when this line is commented out and the failure result is logged, see:

https://gist.github.com/dlongley/4fb032c422b77085ba550708b3615efe

Conclusion

While the mitigation for the misimplementation identified above is fairly straightforward, the more concerning thing, given that MATTR is knowledgeable in this area, is that they put together software that resulted in this sort of implementation failure. It demonstrates a gap between the text in the specification and the care that needs to be taken when building software to verify Verifiable Credentials. Additional text to the specification is needed, but may not result in preventing this sort of misimplementation in the future. As a result, the VCWG should probably add normative implementation text that will test for this form of mis-implementation via the test suite, such as injecting malicious contexts into certain VCs to ensure that verifiers detect and reject general malicious context usage.

Suggested Mitigation 2
Add tests to the Data Integrity test suites that are designed to cause verifiers to abort when an unknown context is detected to exercise type-specific credential processing.

msporny avatar Jun 22 '24 16:06 msporny

If you consider the contexts part of source code, then this sort of attack requires source code access or misconfiguration.

Validation of the attacker-controlled content prior to running the data integrity suite might provide mitigation, but at further implementation complexity cost.

Which increases the probability of misconfiguration.

A better solution is to verify the content before performing any JSON-LD (or other application specific) processing.

After verifying, schema checks or additional business validation can be performed as needed with assurance that the information the issuer intended to secure has been authenticated.

At a high level, this is what you want:

  1. minimal validation of hints
  2. key discovery & resolution
  3. verification
  4. validation
  5. deeper application processing

Most data integrity suites I have seen do this instead:

  1. deep application processing (JSON-LD / JSON Schema)
  2. canonicalization (RDF)
  3. verification
  4. validation
  5. deeper application processing

The proposed mitigations highlight that these security issues are the result of a fundamental disagreement regarding authentication and integrity of data.

Adding additional application processing prior to verification gives the attacker even more attack surface to exploit, including regular expression attacks, denial of service, schema reference tampering, schema version mismatching, etc.

Any application processing that occurs prior to verification is a design flaw, doubling down on a design flaw is not an effective mitigation strategy.

OR13 avatar Jun 22 '24 20:06 OR13

@OR13

adding additional application processing prior to verification, gives the attacker even more attack surface to exploit, including regular expression attacks, denial of service, schema reference tampering, and schema version mismatching, etc...

we are speaking about this pseudo-code

  if (!ACCEPTED_CONTEXTS.includesAll(VC.@context)) {
     terminate
  }

which is a loop and a simple string comparison. I don't see a reason for any of the exploits you have listed here except an implementer's incompetence.

Can you please elaborate on how those exploits could be performed, and provide a calculation or an estimate of how much this adds to complexity?

Thank you!

filip26 avatar Jun 22 '24 20:06 filip26

@filip26, setting aside your apparent labelling of multiple community members who have participated in this community for several years as "incompetent".

Your specific pseudo-code is insufficient for at least the following reasons:

  1. What is actually input into the cryptographic verification procedure for data integrity isn't the URLs, it's the contents behind those URLs. So to put it plainly, data integrity cannot by design ensure that the contents of the @context entries used to actually verify a credential are those that the issuer used, because they are not integrity protected by the issuer's signature.
  2. You have disregarded inline contexts; the @context array is not guaranteed to be simply an array of strings, it may also include objects, or "inline contexts".
  3. Your check appears to imply that ACCEPTED_CONTEXTS is a flat list of contexts acceptable for any issuer. This means that if contexts from different issuers collide in unexpected ways and a malicious party knows this, they can manipulate @context values already trusted by the relying party without even having to inject or modify an existing @context. If I'm mistaken and you meant that ACCEPTED_CONTEXTS is an array of issuer-specific accepted contexts, then please explain how this is accomplished in an interoperable manner and/or how it would scale.

(edited to put backticks around @words)

tplooker avatar Jun 22 '24 22:06 tplooker

@tplooker setting aside that you are putting words in my mouth that I have not said, which is quite rude and disrespectful ...

Re 1: You are wrong; by ensuring data is processed with a context you accept (the URLs), you know what is behind those URLs and how much you trust them, and perhaps you have a static copy of the contexts. If you follow untrusted URLs then it's an implementer's fault. Use a browser analogy.

Re 2: Yeah, I've simplified that; an inline context is a bad practice.

Re 3: You trust the URLs or not, and based on that trust you proceed or not.

filip26 avatar Jun 22 '24 22:06 filip26

I was browsing through past issues related to this. https://github.com/w3c/vc-data-model/issues/953 was raised to suggest adding @vocab in the base VCDM 2.0 context. It's my understanding that the authors of the Data Integrity spec were opposed to this. This is now being pointed to as a direct security concern.

@tplooker given these new findings, would you revise your support since this was a bad recommendation introducing a security concern according to your disclosure?

PatStLouis avatar Jun 22 '24 22:06 PatStLouis

The URL for a context doesn't actually matter... In fact some document loaders will follow redirects when resolving contexts over a network (technically another misconfiguration).

Depending on the claims you sign, you may only detect a mismatch in the signature when you attempt to sign a document that actually uses the differing part of the context.

Contexts are just like any other part of source code... Every single line of source code is a potential problem.

You often don't control what 3rd parties will consider the bytes of a context to be... It's a feature that's been turned into a defect by where it was placed.

"It verified for me, must be a problem in your document loader."

"I thought I would be able to fix it in only a few hours, but it took me 2 days and delayed our release"

"I finally figured out how data integrity proofs work, thanks for letting me spend all week on them"

I've paired with devs and shown them how to step through data integrity proofs, dumping intermediate hex values and comparing against a "known good implementation", only later to learn the implementation had a bug...

Misconfiguration is common in complex systems.

I'm arguing that security experts who have evaluated data integrity proofs against alternatives should never recommend them, because every problem they exist to solve is already solved for better by other technologies used in the correct order.

  • Authentication of json -> json web signatures
  • Specification of json structure -> json schemas
  • Integrity protection of files -> hashes
  • Semantic mapping for json -> JSON-LD

The essence of a recommendation, is that you believe there isn't a better alternative.

OR13 avatar Jun 22 '24 23:06 OR13

@OR13 I'm sorry, but I don't see it. You mention two issues: misconfiguration and bugs. Well, we have tests, certification, etc. Those issues are endemic to any software application, but we don't call software vulnerable just because of the possibility that there might be a bug; we call it vulnerable after we find a bug.

Misconfiguration is common in complex systems.

I would really like to see the complexity estimated. I guess we are seeing a very different picture.

I'm arguing that security experts who have evaluated data integrity proofs against alternatives should never recommend them, because every problem they exist to solve is already solved for better by other technologies used in the correct order.

Please, let's be factual: what experts, what was recommended, etc. In the EU, when a press article starts with the title "American scientists have ...", everyone stops reading it (they add the "American" to make it credible ;)

filip26 avatar Jun 22 '24 23:06 filip26

@PatStLouis you raise an excellent point regarding default vocabularies.

It's never too late to change what's in a context (joke).

This working group cannot prevent anyone else from adding a context that includes a vocab.

You are reporting an architectural flaw that was "solved for" by making it explicit in the base context, but it's not fixed by removing it from that context.

If json compatibility isn't a requirement, the working group can drop the vc-jose-cose spec and remove the vocab from the default context... This might even improve adoption of data integrity while clarifying that RDF is the claims format that W3C secures.

I've argued this point previously.

OR13 avatar Jun 22 '24 23:06 OR13

I was browsing through past issues related to this. https://github.com/w3c/vc-data-model/issues/953 was raised to suggest adding @vocab in the base vcdm 2.0 context. It's my understanding that the authors of the Data Integrity spec were opposed to this. This is now being pointed as a direct security concern.

@PatStLouis I agree this issue is relevant to the conversation, however the opinions I shared in that issue have not changed. @vocab is a broadly useful feature, and that has not changed through the disclosure of this vulnerability; what has become apparent is that JSON-LD is broken with regard to how this feature works. Simply removing @vocab from the base context doesn't fix this issue, it would be a band aid. What needs to be fixed is 1) JSON-LD, with regard to how @vocab works with @protected, and 2) more generally, the @context entry needs to be integrity protected to prevent manipulation.

(edited to put backticks around @words)

tplooker avatar Jun 23 '24 01:06 tplooker

Just to add some additional colour here @PatStLouis, I don't believe the recommendation of putting @vocab in the base context was a "bad recommendation". In actual reality it was also necessitated to fix an even worse issue with data integrity, as documented here: https://github.com/digitalbazaar/jsonld.js/issues/199, which lay around unpatched since 2017 until we discovered it and started contributing a fix in 2021: https://github.com/digitalbazaar/jsonld.js/pull/452. Personally I believe removing @vocab from the core context will likely re-introduce this issue for JSON-LD processors that aren't handling these relative IRIs correctly.

Furthermore, if others in the WG knew about this issue, specifically that @vocab didn't work with @protected and chose not to disclose it when discussing this proposal, then that is even more troublesome.

tplooker avatar Jun 23 '24 01:06 tplooker

@tplooker wrote:

setting aside your apparent labelling of multiple community members who have participated in this community for several years as "incompetent".

@tplooker wrote:

Furthermore, if others in the WG knew about this issue, specifically that @vocab didn't work with @protected and chose not to disclose it when discussing this proposal, then that is even more troublesome.

Please stop insinuating that people are acting in bad faith.

Now might be a good time to remind everyone in this thread that W3C operates under a Code of Ethics and Professional Conduct that outlines unacceptable behaviour. Everyone engaging in this thread is expected to heed that advice in order to have a productive discussion that can bring this issue to a close.

(edited to put backticks around @words)

msporny avatar Jun 23 '24 12:06 msporny

From an implementer perspective maybe adding an example that "should fail" could be a good thing. Something like at https://github.com/w3c/vc-data-integrity/issues/272#issuecomment-2184084631 .

As an implementation "case experience", I implemented in .NET something that produces a proof like the university credential at https://www.w3.org/community/reports/credentials/CG-FINAL-di-eddsa-2020-20220724/#example-6 and then also verifies it. It felt a bit tedious to find out what to canonicalize, hash and sign to get a similar result. The code is more or less private still, but now that https://github.com/dotnetrdf/dotnetrdf/releases/tag/v3.2.0 and its canonicalization support are publicly released, I might make something more public too. I still feel I need to go through this thread with more thought so I completely understand the issue at hand.

veikkoeeva avatar Jun 23 '24 13:06 veikkoeeva

@veikkoeeva wrote:

From an implementer perspective maybe adding an example that "should fail" could be a good thing.

Yes, that is already the plan for the test suite in order to make sure that no conformant implementations can get through without ensuring that they refuse to generate a proof for something that drops terms, and/or, depending on the outcome of this thread, use @vocab to expand a term.

That's a fairly easy thing that this WG could do to ensure that this sort of implementation mistake isn't made by implementers. Again, we'll need to see how this thread resolves to see what actions we can take with spec language and test suites to further clarify the protections that we expect implementations to perform by default.

msporny avatar Jun 23 '24 15:06 msporny

@OR13

It's never too late to change what's in a context (joke).

I'm not suggesting a change; my goal is to understand why this recommendation was suggested in the first place, and why removing it is now listed as a remediation step for a security concern raised by the very same parties who suggested it.

This working group cannot prevent anyone else from adding a context that includes a vocab.

Correct, @vocab is a great feature for some use cases. I enjoy the feature for learning about jsonld, development and prototyping until I publish a proper context. I wouldn't use it in a production system (or at least I haven't found a use case that requires it).

Many protocols have features that can be insecure depending on how you use them. This doesn't make the protocol inherently flawed.

You are reporting an architectural flaw, that was "solved for" by making it explicit in the base context, but it's not fixed by removing it from that context.

Apologies if you misunderstood my statement, but my intention was not to report an architectural flaw.

@tplooker

@PatStLouis I agree this issue is relevant to the conversation, however the opinions I shared in that issue have not changed. @vocab is a broadly useful feature, that has not changed through disclosure of this vulnerability, what has become apparent is that JSON-LD is broken with regard to how this feature works. Simply removing @vocab from the vocabulary doesn't fix this issue it would be a band aid, what needs to be fixed is 1) JSON-LD with regard to how @vocab works with @protected and 2) more generally the @context entry needs to be integrity protected to prevent manipulation.

Yes, @vocab is a useful feature, but should it always be present? Nothing is stopping someone from using it; it's a feature (but it shouldn't be default behaviour). I would argue, in the same way, that the decision to add @vocab to the base context of the VCDM 2.0 is a band aid solution in itself, derived from a need for an easier development process.

The Data Integrity spec provides hashes for its context entries that verifiers can leverage while caching the content. AFAIK this is already a thing.

Just to add some additional colour here @PatStLouis, I don't believe the recommendation of putting @vocab in the base vocabulary was a "bad recommendation". In actual reality it was also necessitated to fix an even worse issue with data integrity as documented here digitalbazaar/jsonld.js#199 which lay around since 2017 un-patched, until we started a contribution for a fix in 2021, when we discovered it digitalbazaar/jsonld.js#452. Personally I believe removing @vocab from the core context will likely re-introduce this issue for JSON-LD processors that aren't handling these relative IRI's correctly.

Thank you for pointing out these issues; I enjoy looking back at historical data from before my time in the space. As pointed out earlier, some of the parties that made that recommendation are now recommending removing it as a remediation to a security concern that they raised.

The use cases listed for this recommendation were for development purposes, as described in #953. Furthermore, the private claims section of the JWT RFC reads as follows:

Private Claim Names are subject to collision and should be used with caution.

Enabling this by default does not sound like a good recommendation to me.

It's easy to set up a context file; it takes 5 minutes and a GitHub account. If you are doing development, you can just include an @vocab object in your base context for the short term, so why the recommendation to make it part of the VCDM 2.0 context?

Regardless this was already discussed by the group and the decision has been made.

OWASP defines a class of vulnerabilities as Security Misconfigurations, and this is where I would see this landing. While valid, it's ultimately the implementer's responsibility to properly configure their system, and sufficient information is provided in order for them to do so. If I expose an unsecured SSH service to the internet and then claim that SSH is insecure because I can gain unauthorized access to my server, that doesn't hold up, since the security flaw is not in the protocol itself but in my security configuration. Yes, it's a vulnerability; no, it shouldn't be addressed by the underlying protocol.

In conclusion, I find this disclosure valuable, as I got to learn a bit more about JSON-LD, and it provides a great resource for demonstrating to implementers how to properly verify credentials and to issuers how to properly design a VC.

(edited to put backticks around @words)

PatStLouis avatar Jun 23 '24 19:06 PatStLouis

The OWASP defines a class of vulnerabilities as Security Misconfigurations. This is where I would see this landing in. While valid, it's ultimately the implementers responsibility to properly configure their system, and sufficient information is provided in order for them to do so. If I expose an unsecured SSH service to the internet, then claim that SSH is unsecured because I can gain unauthorized access to my server, that doesn't align since the security flaw is not in the protocol in itself be in my security configuration. Yes it's a vulnerability, no it shouldn't be addressed by the underlying protocol.

I would actually classify those attacks as "Data Integrity Signature Wrapping" (DISW) attacks. They share many similarities with XML Signature Wrapping Attacks (XSW) that occurred in the past. Also, note that it is possible to use XML Signatures securely if appropriate mitigations are implemented correctly. The same holds true for DI. The question is where we would add requirements for those additional mitigations for Data Integrity Proofs (DI).

The VCDM uses relatedResource to protect the integrity of specific external resources, such as @context values referenced by URIs. While DI is primarily used with the W3C VCDM 2.0, other JSON-LD data models might be secured by DI in the future, such as ZCaps, so we cannot just rely on mitigations defined in the W3C VCDM 2.0. For this reason, I believe this mechanism for protecting the integrity of the @context definitions is actually the responsibility of DI itself, since those @context definitions are part of the canonicalization and signature creation/verification. It would mitigate DISW attacks by making @context definition integrity checking part of the signature verification process. In that case, a similar mechanism to relatedResource has to be defined in the DI specification, and making it mandatory would help verifiers and issuers avoid skipping certain checks when issuing and verifying DIs.
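For reference, a sketch of what that could look like at the credential level today using the VCDM 2.0 relatedResource property (the non-base context URL and the digestSRI value are placeholders, not real values); a DI-level mechanism, as suggested above, would move this kind of digest check into proof creation and verification itself:

// Credential that pins the hash of the external @context it depends on.
const credential = {
  "@context": [
    "https://www.w3.org/ns/credentials/v2",
    "https://foo.com/ns/dl/v1"
  ],
  "type": ["VerifiableCredential", "DrivingLicense"],
  "relatedResource": [{
    "id": "https://foo.com/ns/dl/v1",
    "digestSRI": "sha384-<base64-digest-of-the-context-document>"
  }],
  "credentialSubject": { "canDrive": true }
};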

awoie avatar Jun 24 '24 11:06 awoie

we are speaking about this pseudo-code

  if (!ACCEPTED_CONTEXTS.includesAll(VC.@context)) {
     terminate
  }

It's not that simple if the goal is to retain the open-world data model and extensibility model that the W3C VCDM promises. There might be instances where a verifier does not recognize all of the values in a credential's @context array. Consider the following simplified VC examples:

Example: VC using a base data model for all driving licenses

{
  "@context": [
    "https://www.w3.org/ns/credentials/v2",
    "https://www.w3id.org/dl/v1"
  ],
  "type": [ "VerifiableCredential", "DrivingLicense" ],
  "credentialSubject": {
    "canDrive": true
  }
}

Example: VC issued by DMV of Foo

{
  "@context": [
    "https://www.w3.org/ns/credentials/v2",
    "https://www.w3id.org/dl/v1",
    "https://foo.com/ns/dl/v1"
  ],
  "type": [ "VerifiableCredential", "DrivingLicense", "FooLicense" ],
  "credentialSubject": {
    "canDrive": true,
    "foo": true
  }
}

Example: VC issued by DMV of Bar

{
  "@context": [
    "https://www.w3.org/ns/credentials/v2",
    "https://www.w3id.org/dl/v1",
    "https://bar.com/ns/dl/v1"
  ],
  "type": [ "VerifiableCredential", "DrivingLicense", "BarLicense" ],
  "credentialSubject": {
    "canDrive": true,
    "bar": true
  }
}

When crossing realms, verifiers in the realms of Foo and Bar may have agreed on using the base data model but not on the specific properties unique to Foo and Bar. Verifiers in the realm of Foo are primarily interested in the base properties of the DrivingLicense and occasionally in the specific properties of the FooLicense. The same situation applies to the realm of Bar, but with a focus on their respective properties.

Adopting the ACCEPTED_CONTEXTS approach would require Foo, Bar, and all other realms to continually distribute and update their individual context definitions. This approach just does not scale very well, and it sacrifices the open-world data model since all @context URLs and/or definitions have to be statically configured.

awoie avatar Jun 24 '24 11:06 awoie

we are speaking about this pseudo-code

  if (!ACCEPTED_CONTEXTS.includesAll(VC.@context)) {
     terminate
  }

It's not that simple if the goal is to retain the open-world data model and extensibility model that the W3C VCDM promises. There might be instances where a verifier does not recognize all values in the ACCEPTED_CONTEXTS array. Consider the following simplified VC examples: [...]

Great examples! Thanks!

Some context on why I think a test for what "should not happen" matters, plus a less-mentioned issue of having good examples.

Related to https://github.com/w3c/vc-data-integrity/issues/272#issuecomment-2184999640: I'm not completely alien to this sort of work, and indeed, when I implemented the "first pass sketch" of the code, I struggled a bit with implications of this sort since I'm not so familiar with JSON-LD. So I thought to "get back to it with better time" and just not release anything before things are clearer (plus the library change not being public, though there's something already in the tests about this).

Some part of that was: if I have a document like the one at https://www.w3.org/community/reports/credentials/CG-FINAL-di-eddsa-2020-20220724/#example-6, how do I pick apart the pieces for canonicalization, turning into bytes, hashing, signing and so on? For this "sketch" I was quite happy to get the same results with the keys as the example document, but I know I paid only passing thought to these sorts of things, partially because there's been related discussion earlier.

I mention this example piece since I think good examples are perhaps more important than has been implied here. I naturally also think that tests for what should not happen are important, and maybe some notes of that sort could be added to an example or two as well. They're already something I've (we've) been codifying in some tests. It's also a great way to document things.

veikkoeeva avatar Jun 24 '24 12:06 veikkoeeva

@awoie a verifier should not guess what's inside a context, nor try to anticipate whether there is some agreement between context providers.

When crossing realms, verifiers in the realms of Foo and Bar may have agreed on using the base data model but not on the specific properties unique to Foo and Bar.

If a verifier recognizes both https://foo.com/ns/dl/v1 and https://bar.com/ns/dl/v1, then there is no issue. It simply means that the verifier accepts both DMV departments' vocabularies, no matter that there are shared parts. A situation in which a verifier accepts something just because it uses well-known terms is a risk, not the other way around.

An ability to understand well-known terms, e.g. those defined by schema.org, is a great feature, but not in the VC ecosystem, where we don't want to guess but to be sure.

This approach just does not scale very well and it sacrifices the open-world data model since all @context

It scales the same way the www does. No one prevents you from using other contexts, well-known terms, etc., and including them all in your context.

If there is a need, a good reason, to share parts between some parties, then the easiest, transparent, and scalable solution is this:

 "@context": [
    "https://www.w3.org/ns/credentials/v2",
    "https://www.w3id.org/dl/v1",
    "https://dmv-vocab/ns/dl/v1"
    "https://foo.com/ns/dl/v1"
  ],
 "@context": [
    "https://www.w3.org/ns/credentials/v2",
    "https://www.w3id.org/dl/v1",
    "https://dmv-vocab/ns/dl/v1"
    "https://bar.com/ns/dl/v1"
  ],

filip26 avatar Jun 24 '24 13:06 filip26

@filip26 wrote:

If a verfier recognizes both https://foo.com/ns/dl/v1 and https://bar.com/ns/dl/v1 then there is no issue. It simply means that a verifier accepts both DMV departments' vocabulary, no matter that there are shared parts. A situation in which a verifier accepts something just because it uses well known terms is a risk, not otherwise.

I didn't say it is not a solution. My point was that it is a solution which does not scale. A verifier from Foo might have never seen a @context from Bar but it shouldn't matter because they agreed on a common vocab defined by https://www.w3id.org/dl/v1. Forcing all verifiers or issuers from different realms to continuously reach out to each other to keep @context URLs and definitions up-to-date and well-known does not scale for a lot of use cases.

@filip26 wrote:

It scales the same way as www does. None prevents you using other contexts, well known terms, etc and include all in your context.

No, it doesn't, because the assumption behind ACCEPTED_CONTEXTS is that they are statically configured. The web is not static, and not all actors monitor each other continuously.

awoie avatar Jun 24 '24 13:06 awoie

@awoie

No, it doesn't because the assumption of the ACCEPTED_CONTEXTS is to statically configure them. The web is not static and not all actors monitor each other continuously.

It's up to an implementer how to allow a verifier to be configured. A static configuration has nothing to do with scalability. But I guess you meant that a verifier would not be able to accept a context which is not known - that's exactly what we want, and it does not mean that VCs do not scale, or that there cannot be an infinite number of different VC types, issuers, verifiers, etc.

filip26 avatar Jun 24 '24 13:06 filip26

@filip26 wrote:

It's up to an implementer how to allow to configure a verifier,. A static configuration has nothing to do with scalability. But I guess that you have meant that a verifier would not be able to accept a context which is not know -

My point on scalability refers to an increase in operational costs, not necessarily performance. Performance might be another point but I cannot comment on that.

@filip26 wrote:

that's exactly what we want, and it does not mean that VCs do not scale, that there cannot be infinite number of different VC types, issues, verifiers, etc.

If this is what we want, it sacrifices the open-world data model that the VCDM promises, as mentioned here.

awoie avatar Jun 24 '24 14:06 awoie

@awoie I'm sorry, I don't think we are on the same page, and I'll let others explain that it does not affect the scalability of the VC ecosystem nor the open-world data model.

filip26 avatar Jun 24 '24 14:06 filip26

@awoie I like your extensibility example a lot, since it's similar to the context in which I'm evaluating the impact of this.

My question is: if a verifier has no prior knowledge of foo or bar, why would they consider the extended data provided by those entities, and how would this data lead to an exploit in their system? The base information contained in the dl context is by design sufficient for verification needs by third parties.

Verifiers will know what information they want to verify, they are not blindly verifying abstract data.

As for the classification of this disclosure, while I can't really argue with your labeling, this is not a formal classification.

If we take two examples of vulnerabilities disclosed around XML Signature Wrapping Attacks:

  • CVE-2021-28091
  • CVE-2023-34923

Both of these affect specific software and map to two distinct CWEs:

  • Improper Verification of Cryptographic Signature
  • Incorrect Authorization

They are not addressed by a change to XML, but by a security mitigation in the affected software. This is an important distinction to make, and it loops back to a Security Misconfiguration.

It's hard for me to understand what exactly this disclosure tries to underline as the vulnerability:

  • Json-ld?
  • Verifiable Credentials?
  • Data integrity?
  • A specific cryptosuite?
  • A specific implementation of a cryptosuite?

It seems the target of the vulnerability is being shifted around depending on the questions asked/comments made.

PatStLouis avatar Jun 24 '24 14:06 PatStLouis

@awoie I like you extensibility example a lot since its similar to the context in which I'm evaluating the impact of this.

My question is; if a verifier has no prior knowledge of foo or bar, why would they consider the extended data provided by those entities and how would this data lead to an exploit in their system? The base information contained in the dl context is by design sufficient for verification needs by third parties.

@PatStLouis Yes, evaluating the base properties is sufficient for verification needs but the vulnerability explained in the disclosure allows attackers to:

In effect both vulnerabilities allow a malicious party to swap the key and value of arbitrary attributes in a credential without the signature being invalidated.

This might affect the base properties as well. A verifier from Foo cannot distinguish between modifications made by an attacker and @context values provided by legitimate actors from a different realm, e.g., Bar. The examples I provided did not show the attack; they just aimed to demonstrate that ACCEPTED_CONTEXTS does not scale, because in certain cases participants of different realms cannot or do not want to interact with each other continuously.

awoie avatar Jun 24 '24 14:06 awoie