did-resolution

Add security consideration about JSON-LD Context Integrity

ottomorac opened this issue 7 months ago • 6 comments

Add security consideration note about JSON-LD Context Integrity in order to address https://github.com/w3c/did-resolution/issues/53 .

I placed it under a new security considerations item called "JSON-LD Context Integrity".

ottomorac, May 08 '25 20:05

This was discussed during the #did meeting on 15 May 2025.

w3c/did-resolution#147

wip: raised by Otto regarding remote contexts....

wip: also about caching....

<Wip> This is the issue - w3c/did-resolution#147

markus_sabadello: Yeah I think the caching was mainly about the DID document caching....

markus_sabadello: maybe we call it something else....

wip: where can we fit it then?

markus_sabadello: in security considerations but make its own entry...

ottomorac: agree with title of JSON-LD Context Integrity?

manu: yes I think it sounds good...

<manu> This is what we say about context integrity in the VC spec: https://w3c.github.io/vc-data-model/#base-context

manu: want to make sure it aligns with VC spec wording as well...

manu: you may want to take a look at this....

manu: wondering if we should change the name of the caching property.... Dave Longley may have opinions on this....

manu: we should also probably mention that turning off the cache is an attack vector....

manu: actually I will open a new issue about the caching...


w3cbot, May 15 '25 15:05

Changes look good @ottomorac. Wondering if you want to bring any additional spec text across from the VCDM spec - https://w3c.github.io/vc-data-model/#base-context.

Specifically, identifying the hash value of the DID context seems useful. This could be done in a separate PR when we have finalised the context.

cc @msporny

wip-abramson, May 16 '25 13:05

Since this topic comes up in a number of places I figured I'd write a bit more here:

The general security requirement here is that any application, such as a DID resolver, must use a trusted copy of the context(s) that its logic is written against. One can think of these contexts as being known at "compile time" -- as opposed to contexts that the application may encounter at "runtime".

As an analogy, this is not too different from ensuring your application does not load some code dependency (a software library/module) over the internet from an untrusted location or without some appropriate integrity check. Calling into a library without these protections is unsafe, just as consuming a document whose properties are only supposedly defined in the way you expect is unsafe.

Of course, this is why document-based validation (e.g., JSON Schema checks in the JSON ecosystem) is typically performed on incoming documents. This allows your application to be sure that a document is "shaped" in the way its subsequent logic expects (e.g., the document has the property "foo" with a string value in it).

JSON-LD documents carry additional semantic information, allowing an application to also ensure that, for example, the property "foo" has an intended global meaning, not just a "shape" that happens to be compatible with your application. This helps to further reduce confusion and provide more confidence that the subsequent logic in your application will execute properly.

While it may seem obvious, it is important to note that in both cases the logic of your application was written prior to receiving any document. The logic of your application natively understands (expects) a particular document shape or "type" and particular semantics for its properties. This means this was baked in / written into the application when it was created -- so both the expected JSON schema and the JSON-LD context(s) were necessarily known at that time and the logic of your application is bound to them. One can think of this as an interface your application has. To use your application the inputs must match these constraints.

A reasonable expectation is then that your application only load its JSON schema and its natively understood JSON-LD contexts in a trusted way. The simplest way to do that is for your application to have local copies of these that are used when resolving them; it is common to publish applications with embedded JSON schemas and the same can be done with the JSON-LD contexts that the application is also bound to. This handles the "compile time" requirements.
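A minimal Python sketch of this "compile time" approach, assuming the application embeds its contexts in a dictionary keyed by URL. `EMBEDDED_CONTEXTS`, `load_context`, `contexts_of`, `load_all`, and `UnknownContextError` are hypothetical names, and the context body shown is a placeholder rather than the published DID context. The loader never goes to the network; a context the application was not built with is rejected rather than fetched.

```python
import copy

# Hypothetical embedded copies of the contexts this application's logic was
# written against. The body below is a placeholder, NOT the published DID context.
EMBEDDED_CONTEXTS: dict[str, dict] = {
    "https://www.w3.org/ns/did/v1": {"@context": {"id": "@id", "type": "@type"}},
}


class UnknownContextError(Exception):
    """Raised when a document references a context the application does not ship."""


def load_context(url: str) -> dict:
    """Resolve a context URL from the embedded copies only; never uses the network."""
    if url not in EMBEDDED_CONTEXTS:
        raise UnknownContextError(f"context not known at build time: {url}")
    # Hand out a deep copy so callers cannot mutate the trusted original.
    return copy.deepcopy(EMBEDDED_CONTEXTS[url])


def contexts_of(document: dict) -> list[str]:
    """Collect the context URLs a document declares (string or list form).

    Inline (object) contexts are skipped because they are not fetched;
    whether to accept them is a separate decision.
    """
    ctx = document.get("@context", [])
    if isinstance(ctx, str):
        ctx = [ctx]
    return [c for c in ctx if isinstance(c, str)]


def load_all(document: dict) -> list[dict]:
    """Load every declared context, failing on the first unknown one."""
    return [load_context(url) for url in contexts_of(document)]
```

With this loader, a document declaring `["https://www.w3.org/ns/did/v1", "https://attacker.example/ctx"]` fails on the second URL, which corresponds to the "reject them outright" option discussed below; the alternative is to transform such documents with a JSON-LD processor before application logic sees them.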

So, as for how your application handles other contexts that it only later encounters at runtime? You have a choice: it can either reject them outright, requiring others to make the inputs they feed you conform to your application's interface -- or you can accept arbitrary documents (that have arbitrary remote contexts and potentially even arbitrary shapes) and then use the JSON-LD API to transform the documents into the contexts and shapes that your application accepts. As always, some transformations might fail to produce anything acceptable anyway.

For interoperability purposes, the former is much simpler: Just tell data producers that they must use a particular context and document shape if they want their inputs to be commonly accepted by consumers. If someone wants to write an application that can consume many different types of documents (perhaps one that combines or does something else interesting with documents from many different ecosystems) they are of course welcome to do so and an interesting innovation may arise. But everyone can tell what global semantics were intended by authors who produce documents that are self-describing, e.g., documents that carry "context" information, even if those semantics aren't understood (and perhaps are rejected) by a particular consumer.

Finally, but still importantly, even if you loaded an arbitrary JSON schema (or arbitrary JSON-LD context) from the internet and its contents matched some content-integrity hash, your application logic wasn't previously written with the knowledge of this content. So it is not safe to directly consume a document that matched this schema based on that fact alone. It still needs to be transformed into a shape and context that your application's logic was written against. This means that a content-integrity hash on an unknown JSON schema or JSON-LD context does little for you. The hash only matters here when it is already familiar to you. Note that this is not specific to JSON schema, JSON-LD contexts, or any particular format but rather a general principle. Most people would not be happy to shout any random message they received into the public square just because its contents matched a hash.

Every message has a corresponding hash.
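To make the point about unfamiliar hashes concrete, here is a small Python sketch, assuming contexts are compared as raw bytes using SHA-256 (an assumption; `KNOWN_DIGESTS`, `self_consistent`, and `familiar` are hypothetical names). A context delivered together with its own hash always passes a self-consistency check; only a comparison against digests the application already knew says anything about whether its logic can safely consume the document.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical context the application's logic was written against.
KNOWN_CONTEXT_BYTES = b'{"@context": {"foo": "https://example.org/vocab#foo"}}'

# Digests computed from the copies shipped with the application ("compile time").
KNOWN_DIGESTS = frozenset({sha256_hex(KNOWN_CONTEXT_BYTES)})


def self_consistent(content: bytes, claimed_digest: str) -> bool:
    """True whenever content matches the digest sent along with it.

    Anyone can satisfy this check: every message has a corresponding hash.
    """
    return sha256_hex(content) == claimed_digest


def familiar(content: bytes) -> bool:
    """True only if content is a context the application already knows."""
    return sha256_hex(content) in KNOWN_DIGESTS


# An attacker-chosen context that redefines "foo", shipped with its own hash.
evil = b'{"@context": {"foo": "https://attacker.example/other#foo"}}'
print(self_consistent(evil, sha256_hex(evil)))  # True
print(familiar(evil))  # False
```

Because the digest covers raw bytes, any whitespace or key-order change in a context file also changes its digest, so a pinned copy has to be byte-identical to what is served.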

dlongley, May 16 '25 15:05

So I think we agreed to move this into a separate subsection under "Security Considerations" (not under "Caching"), and maybe call it "JSON-LD Context Integrity".

We could maybe also include some comments from @dlongley in this PR directly, such as:

The general security requirement here is that any application, such as a DID resolver, must use a trusted copy of the context(s)

Maybe also mention that a DID resolver should have a local "compile time" copy of the context file.

peacekeeper, May 22 '25 13:05

Thanks @dlongley and @peacekeeper for your comments.

I added the following verbiage to the end of my security consideration:

Instead a DID resolver SHOULD use a trusted copy of the context file (for instance a local copy of the JSON-LD file obtained prior to resolution).

I opted to use "SHOULD" because if we used "MUST" then that would require precisely defining what a "trusted copy" is. I personally struggle with that a bit, as it seems there are several ways of solving that (embedded contexts, obtaining a copy of the JSON-LD ahead of time, etc.). Let me know if you disagree with this approach or have other wording suggestions.

ottomorac, May 28 '25 18:05

This was discussed during the #did meeting on 29 May 2025.

w3c/did-resolution#147

Wip: Add to the security considerations section.

ottomorac: After some reviews of the comments, I added details around the JSON-LD context integrity. It's now its own security consideration. What I said is that I added verbiage that instead the DID resolver should use a trusted copy of the context file.
… I specifically opted to use SHOULD rather than MUST otherwise we get into details about what a trusted copy is.
… I struggled a bit with how to obtain the JSON-LD in the first place.
… Reviews are appreciated.

manu: I don't have an issue with the SHOULD. The confusing thing here is the second sentence, saying that a production resolver MUST NOT retrieve context files from remote locations. But then it MUST use a trusted copy as there's no other choice.

Ivan: It's a more general question. DID Resolution does not use JSON-LD. This is something we decided a few weeks ago.
… I guess you could say how does DID Resolution deal with these problems?

KevinDean: my concern with the prohibition of retrieving context files is that there may be additional context files that will not have direct relationship with the did document generation but if we limit the ability to retrieve remote files then we limit the ability of the resolver...

<Zakim> manu, you wanted to note we should be probably talking about apps that use results from a resolver.

dmitriz: On that topic, what it sounds like it's heading towards, and may resolve Kevin's concern, is we would have a registry of context files and corresponding hashes. e.g., Here's the DID context location and hash so that resolvers, if they are loading remotely, can ensure integrity.

<manu> https://w3c.github.io/vc-data-model/#contexts-vocabularies-types-and-credential-schemas

manu: I agree with both what Kevin and Dmitri are saying. I thought what we were doing with this issue is reuse the language in the VC Data Model.
… We had a very long discussion about this in the VC working group and came up with some text that made everyone equally grumbly but seemed to work.
… "You can load remote things but make sure the hash matches something you expected it to match."

<ottomorac> "It is strongly advised that all JSON-LD Context URLs used by an application use the same mechanism, or a functionally equivalent mechanism, to ensure end-to-end security. Implementations are expected to throw errors if a cryptographic hash value for a resource does not match the expected hash value."
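A minimal Python sketch of the "load remotely, but check the hash" approach quoted above, assuming each context URL the application relies on is pinned to a SHA-256 hex digest of the exact bytes served. The pin format, `fetch_pinned_context`, and `ContextIntegrityError` are hypothetical; the VC Data Model text does not prescribe this encoding. The fetch function is passed in so the caller controls transport and caching.

```python
import hashlib
from typing import Callable

# A fetcher takes a URL and returns the raw response bytes (e.g. an HTTP GET).
Fetcher = Callable[[str], bytes]


class ContextIntegrityError(Exception):
    """Raised when a context is unpinned or its bytes do not match the pinned hash."""


def fetch_pinned_context(url: str, pins: dict[str, str], fetch: Fetcher) -> bytes:
    """Fetch a context remotely, but only return it if its SHA-256 matches the pin."""
    expected = pins.get(url)
    if expected is None:
        # No expected hash: the application was not written against this context.
        raise ContextIntegrityError(f"no pinned hash for {url}")
    body = fetch(url)
    actual = hashlib.sha256(body).hexdigest()
    if actual != expected:
        # Throw rather than silently fall back, per the quoted guidance.
        raise ContextIntegrityError(
            f"hash mismatch for {url}: expected {expected}, got {actual}"
        )
    return body


# Usage with an in-memory stand-in for the network (hypothetical URL and body).
URL = "https://example.org/ctx/v1"
served = {URL: b'{"@context": {"foo": "https://example.org/vocab#foo"}}'}
pins = {URL: hashlib.sha256(served[URL]).hexdigest()}
body = fetch_pinned_context(URL, pins, served.__getitem__)
```

Raising on a missing pin, and not only on a mismatch, keeps contexts the application was never written against from being loaded merely because they are reachable.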

manu: We don't say MUST or SHOULD, except for MUST in the base context value.
… If we do that, we get around the MUST vs. SHOULD thing. I agree that we shouldn't prevent contexts being loaded remotely.
… +1 to what Kevin says. A resolver should have the ability to do validation that a document is well-formed. The guidance should not only be for resolvers but for downstream applications as well.
… I think what we might want to do is try to use almost the exact same language as we used in the VC Data Model spec. We didn't put MUST or SHOULD in there.
… Either because it was too late to change or we didn't want to offend those that didn't want to do JSON-LD validation at all.
… The DID Resolution specification doesn't have a base context. It's really about the DID documents that are returned.
… If there's any object that you process as JSON-LD, that's where the warning applies.
… Those are the things that need to make sure that they use a context with a cryptographic hash.

<Wip> w3c/did-resolution#53

manu: You can fetch from the Internet but make sure that the cryptographic hash matches what you expect it to be.

Ivan: Getting back to what Manu said and my original question. What this means is that the DID resolver in general, when it gets a DID document, is required to perform consistency checks on the DID document. Is that clearly described in the spec?

<dmitriz> I agree with Ivan (and manu) that it's not the Resolver's job; rather, it's the job of the consumers of resolvers.

Ivan: It's not just a conduit between the one that asked for the resolution and the one that provided the document.

manu: I don't know if that's true. I don't know if a resolver has to do any checking. Maybe we have a statement that the resolver can't return an invalid DID document.

<dmitriz> and KevinDean's earlier comment about basic structural validation -- that's not a context's job anyway. That's more for JSON Schemas

markus_sabadello: I don't think we have it in the DID Resolution specification. We have an InvalidDocument error but nothing in the spec requires it.
… I don't want the resolver to transform or manipulate the document. I think that's one of the reasons why validation is not required right now.

Ivan: If it is correct, and I believe Markus, then any context statement is an opaque key-value pair in JSON. So why do we even speak about security issues related to JSON-LD if resolution doesn't do anything with JSON-LD?

<Zakim> manu, you wanted to note that we shouldn't require JSON-LD processing.

Ivan: I have the impression that there is nothing to add here.

manu: I think it's that the resolver might do JSON-LD processing, which is why we have that text. Downstream applications should pay attention as well.

<dmitriz> (I'm still struggling to picture _why_ a Resolver would do any json-ld processing)

manu: We should absolutely not require JSON-LD processing. It's something they can do, but it's not required.


w3cbot, May 29 '25 16:05

Hi all, thanks for your suggestions. After the WG discussion feedback from today, I have added the verbiage used in the VC Data Model around hashing of JSON-LD contexts, with some adjustments so that it is written from the perspective of a DID resolver. Hope this works better.

ottomorac, May 30 '25 02:05

Merging after multiple approvals and suggestions applied.

peacekeeper, Jun 19 '25 08:06