Adding Semantic Annotations to JSON Schema

Open danielpeintner opened this issue 7 years ago • 38 comments


We (as part of the Web of Things working group) are looking into how to type values when exchanging data. For now we reference JSON Schema.

For example, the output type of a certain value is defined as follows:

"outputData": {"valueType": { "type": "number" }}

In the example above "valueType" is essentially pointing to JSON Schema. One might wonder why { "type": "number" } is nested in "valueType". The reason is that we also have the requirement to semantically annotate a type...

"inputData": { "valueType": { "type": "integer" }, "actuator:unit": "actuator:ms" }

This requirement is why we are getting in contact with you. We also discussed it at our last Face-to-Face with @handrews, who seems open to extending JSON Schema with the possibility of adding semantic annotations directly next to a type, etc.

A naive and very simple proposal from our side would be to allow one or more fields for semantic annotations next to each "type" in JSON Schema (something similar exists in SAWSDL, where the ‘modelReference’ attribute points to a semantic concept).

We believe that other specifications would also benefit from semantic annotations (see, for example, the Swagger OpenAPI Specification).

What do you think? Does it sound reasonable to look into that issue?


danielpeintner avatar Apr 21 '17 06:04 danielpeintner

@danielpeintner thanks for filing this issue! IIRC your larger document is a JSON-LD document. Is this still true?

My biggest initial confusion is how you see JSON Schema sitting alongside the type features of JSON-LD. This is making it more difficult for me to see how to use semantics and schema together.

What limitations in JSON-LD's type indication features led you to JSON Schema?

Once I understand that, it will be easier to talk about the best way to combine these things. Some possible options include:

  • officially describing a way to embed JSON Schema documents in other JSON documents (JSON-LD and OpenAPI being particularly obvious examples, but I've seen this in several places indicating that it might be a broad need)
  • defining JSON Schema vocabularies (in your case, the validation vocabularies) as semantic vocabularies that you could then use as part of a JSON-LD document in the natural way for that media type.
  • defining a JSON Schema vocabulary for semantics. This seems least likely, as it is already addressed by JSON-LD, and the success of JSON-LD makes starting a competing system unappealing. But if we wanted to go with your suggestion of adding a semantic field alongside the validation field "type", this is how we would go about it. We would want people to still be able to use our validation vocabulary without pulling in a larger semantic feature set, so semantics would be an add-on vocabulary like Hyper-Schema.

handrews avatar Apr 21 '17 17:04 handrews

Thanks for your reply!

What limitations in JSON-LD's type indication features led you to JSON Schema?

We use JSON-LD to describe the interactions (properties, actions and events) of a given "thing" along with some metadata.

For interactions we also need to describe the structure of the data that is expected as input or returned as a response. For simple types like strings, JSON-LD may be just fine. For composed types (e.g., a JSON object is expected with field "a" typed as int, "b" typed as float with a maximum value of 100, and an optional element "c" typed as String), we would like to use the expressive power of JSON Schema.

I hope this clarifies our thinking.

danielpeintner avatar May 02 '17 11:05 danielpeintner

@danielpeintner yes, that focus on objects is a great help! I had been thinking too much about the scalar types and have had trouble figuring out how to manage the overlap or why anyone would want to. But I've also done a bit more with RDF and OWL in the meantime, and can see how for more complex structures JSON [Hyper-]Schema would fit this use case better.

handrews avatar May 02 '17 14:05 handrews

Here is an example from robotics. To control their movements, robots are often equipped with an Inertial Measurement Unit (IMU), which combines an accelerometer and a gyroscope to control six degrees of freedom and reconstruct the whole movement in 3D. In applications involving robot swarms or industrial robots on a product line that must coordinate, devices might expose IMU data through a single Web resource.

The associated JSON Schema could look like this:

  {
    "type": "object",
    "properties": {
      "prop1": { "$ref": "#/definitions/def1" },
      "prop2": { "$ref": "#/definitions/def1" }
    },
    "definitions": {
      "def1": {
        "type": "array",
        "minItems": 3,
        "maxItems": 3,
        "items": { "type": "number" }
      }
    }
  }

Here, the data structure for acceleration and orientation is the same (a vector of 3 numbers). Annotation is required to distinguish between them and figure out, e.g., whether prop1 is acceleration or orientation. For now, what we have in mind is a simple annotation like this:

  "properties": {
    "prop1": {
      "$ref": "#/definitions/def1",
      "modelReference": "http://example.org/vocab#Acceleration"
    },
    "prop2": {
      "$ref": "#/definitions/def1",
      "modelReference": "http://example.org/vocab#Orientation"
    }
  }

vcharpenay avatar May 03 '17 08:05 vcharpenay

@vcharpenay Couldn't you keep JSON-LD declarations distinct from JSON Schema ones by relying on the JSON-LD context being referenced in an HTTP Link header (see https://www.w3.org/TR/json-ld/#interpreting-json-as-json-ld)?

GET http://example.org/imu/1 HTTP/1.1
Accept: application/json

HTTP/1.1 200 OK
Content-Type: application/json
Link: <http://example.org/schemas/imu.json>; rel="describedby"
Link: <http://example.org/contexts/imu.jsonld>; rel="http://www.w3.org/ns/json-ld#context"

{
  "prop1": [1, 2, 0],
  "prop2": [1.1, 2, 9]
}

And http://example.org/contexts/imu.jsonld being:

  { "@context": {
      "prop1": "http://example.org/vocab#Acceleration",
      "prop2": "http://example.org/vocab#Orientation"
  } }

This way, one can interpret the JSON instance in both the context of validation (with JSON Schema) and semantics (with JSON-LD) without having to mix validation and semantics in the same description.
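
To illustrate reading one instance in two ways, here is a minimal Python sketch. It is neither a JSON Schema validator nor a JSON-LD processor: `check_vector3` is a hand-written stand-in for the array constraints of the schema discussed above, and `expand_terms` does only a flat term-to-IRI lookup, which is far less than real JSON-LD expansion. Both function names are made up for this illustration.

```python
# Illustrative only: hand-rolled stand-ins, not real JSON Schema / JSON-LD tooling.

instance = {"prop1": [1, 2, 0], "prop2": [1.1, 2, 9]}

context = {
    "@context": {
        "prop1": "http://example.org/vocab#Acceleration",
        "prop2": "http://example.org/vocab#Orientation",
    }
}


def check_vector3(value):
    """Structural check: an array of exactly three numbers."""
    return (
        isinstance(value, list)
        and len(value) == 3
        and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in value
        )
    )


def expand_terms(doc, ctx):
    """Replace each key that the context maps with its IRI (flat lookup only)."""
    terms = ctx["@context"]
    return {terms.get(key, key): value for key, value in doc.items()}


# Validation view: structure only, no meaning attached to the keys.
is_valid = all(check_vector3(instance[name]) for name in ("prop1", "prop2"))

# Semantic view: same data, but the keys now identify concepts.
expanded = expand_terms(instance, context)
```

The two views never touch each other: the structural check does not look at the context, and the term lookup does not care whether the arrays have three items.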

Alternatively, we may allow @context keyword in JSON Schema and have it ignored for validation.

dlax avatar May 03 '17 09:05 dlax

@dlax pretty much exactly what I was going to ask. JSON Schema validators SHOULD ignore unrecognized keywords, so the only reason to note this in the spec would be to reserve the keyword for this usage and encourage such integrations.

handrews avatar May 03 '17 09:05 handrews

The Web of Things includes other protocols than HTTP like CoAP or WebSocket. CoAP does not specify a "Link" Option (although one could define in a non-standard way) and WebSocket does not even have the notion of header, I think.

The JSON-LD context could also be given in the payload. However, in that case, if the data is encoded in a binary format like EXI or CBOR, adding a context URI would significantly increase the size of the message. For data streams, it would also introduce unnecessary redundancy.

This is why we believe metadata like JSON-LD mappings should rather be in the Thing Description.

vcharpenay avatar May 03 '17 13:05 vcharpenay

Alternatively, we may allow @context keyword in JSON Schema and have it ignored for validation.

How JSON-LD mappings should be defined is still an open question. What you suggest here might be sufficient indeed. However, as @handrews says, including it explicitly in the spec would encourage its use, especially if it comes with good tooling.

vcharpenay avatar May 03 '17 13:05 vcharpenay

The Web of Things includes other protocols than HTTP like CoAP or WebSocket. CoAP does not specify a "Link" Option (although one could define in a non-standard way) and WebSocket does not even have the notion of header, I think.

Just wanted to mention that, in case the protocol lacks the link notion, you could include a Hyper-Schema link in your JSON Schema document:

  {
    "type": "object",
    "properties": {
      "prop1": { "$ref": "#/definitions/def1" },
      "prop2": { "$ref": "#/definitions/def1" }
    },
    "definitions": {
      "def1": {
        "type": "array",
        "minItems": 3,
        "maxItems": 3,
        "items": { "type": "number" }
      }
    },
    "links": [
      {
        "rel": "http://www.w3.org/ns/json-ld#context",
        "href": "http://example.com/contexts/imu.jsonld"
      }
    ]
  }

Might not be what you want, but I think it readily works.

Alternatively, we may allow @context keyword in JSON Schema and have it ignored for validation.

How JSON-LD mappings should be defined is still an open question. What you suggest here might be sufficient indeed. However, as @handrews says, including it explicitly in the spec would encourage its use, especially if it comes with good tooling.

I also support this. Also, @-properties may need to be special-cased so that validators ignore them, since (I think) validation would fail on instances with @-properties if additionalProperties is false in the JSON Schema.

dlax avatar May 03 '17 18:05 dlax

Based on this discussion so far, I would recommend allowing both approaches:

  1. being self-contained, where the semantic tagging is done directly in JSON Schema (see the sample from @vcharpenay above with the modelReference key).
  2. making references to an external JSON-LD document that makes the semantic declarations, as @dlax pointed out.

From the Web of Things perspective, the first approach makes sense when you have Things / Servients that are resource-constrained in terms of memory and processing capabilities (e.g., a simple temperature sensor). Processing steps such as downloading and performing semantic mapping can be omitted. The second approach makes sense for more complex scenarios, such as highly structured JSON content and/or more powerful Things. What do you think?

sebastiankb avatar May 13 '17 11:05 sebastiankb

Re-reading the comments, it seems to me that we have been circling the idea of including semantic tagging in JSON Schema documents, but I actually think this would be better suited to JSON instances since semantic tagging is orthogonal to validation in general.

About JSON-LD, what could be done in the JSON Schema specification is to have object member names starting with @ ignored during validation (even if the schema sets "additionalProperties": false). That would mean that the same JSON instance could be readily interpreted in both the context of validation through JSON Schema and semantics through JSON-LD. It seems to me that such JSON instances would be close to the self-contained document @sebastiankb seems to be calling for.
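
A minimal Python sketch of what such a rule could mean in practice. The `check_object` helper and its `ignore_at_members` flag are hypothetical: they model the proposal only and are not part of any JSON Schema draft. The helper looks at `properties` and `additionalProperties: false` and nothing else (no `patternProperties`, no sub-schema validation).

```python
def check_object(instance, schema, ignore_at_members=False):
    """Report members not allowed by `properties` + `additionalProperties: false`.

    Hypothetical helper: `ignore_at_members` models the proposed rule and is
    not defined by any JSON Schema draft.
    """
    allowed = set(schema.get("properties", {}))
    closed = schema.get("additionalProperties") is False
    errors = []
    for name in instance:
        if ignore_at_members and name.startswith("@"):
            continue  # proposed behaviour: JSON-LD keywords are skipped
        if closed and name not in allowed:
            errors.append(f"unexpected member: {name}")
    return errors


schema = {
    "type": "object",
    "properties": {"prop1": {}, "prop2": {}},
    "additionalProperties": False,
}

instance = {
    "@context": "http://example.org/contexts/imu.jsonld",
    "prop1": [1, 2, 0],
    "prop2": [1.1, 2, 9],
}

# Today's behaviour: the closed schema rejects the JSON-LD keyword.
strict_errors = check_object(instance, schema)

# With the proposed rule, the same instance passes.
relaxed_errors = check_object(instance, schema, ignore_at_members=True)
```

Without `"additionalProperties": false` in the schema, both calls return an empty list, so the special case only matters for closed object schemas.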

In fact, it's not clear to me if JSON-LD is suitable for your Web of Things applications or not. If it is (as I assumed from @vcharpenay's initial comment), the question is rather how to expose it in an "optimized" way on application side and how to have it play well with JSON Schema validation.

dlax avatar May 15 '17 12:05 dlax

but I actually think this would be better suited to JSON instances since semantic tagging is orthogonal to validation in general.

This does not really help us, since the Thing Description is a kind of advance information about what a Thing can offer: what kind of data is provided and how it is encoded, what kind of functions/actions are served, and what kind of protocol(s) are supported. Especially for the data exchange case, we need to understand what the structure looks like (-> JSON Schema) and the meaning of the content (e.g., ‘which key in the object represents the temperature value’).

what could be done in JSON Schema specification is to have object member names starting with @ ignored from validation

This sounds good to me. I will discuss in the breakout session which key term we should propose. Here are some first ideas:

  • @type: already used in JSON-LD
  • @modelReference: same concept as in SAWSDL
  • @semantics: makes the intended meaning quite clear

What we should check is how JSON-LD parsers handle this, since JSON-LD already reserves @ keys. It may cause a parsing error if an unknown one suddenly occurs. @vcharpenay, can you have a look at that?

sebastiankb avatar May 17 '17 23:05 sebastiankb

@vcharpenay wrote:

The Web of Things includes other protocols than HTTP like CoAP or WebSocket. CoAP does not specify a "Link" Option (although one could define in a non-standard way) and WebSocket does not even have the notion of header, I think.

Hyper-Schema allows defining links without needing to resort to headers. You do need the hyper-schema, of course, but it sidesteps the issue of protocol-dependent linking mechanisms entirely.

handrews avatar May 17 '17 23:05 handrews

@sebastiankb I think your view and @dlax's can be aligned by considering the Thing Description as both a JSON Schema document and an instance document (in general, JSON Schemas are also instances, such as when they are validated by their meta-schemas).

The Thing Description is an instance document that includes both JSON Schema snippets and semantic annotations. If I am understanding everything correctly, the semantics and schema work independently. You are not attaching semantics to the validation schema snippets, you are attaching semantics alongside of but independent of the structural validation. Is this correct?

handrews avatar May 17 '17 23:05 handrews

Mainly, we use the Thing Description to describe a Thing’s interaction model, which can consist of one or more properties (careful: the Web of Things uses the same term as JSON Schema, but with a different meaning), actions, and/or events. Each instance of an interaction defines an input and/or output. For this we would like to embed or refer to a JSON Schema definition to declare the payload data that is exchanged at runtime. At this point, we are no longer able to add semantics, since we would like to rely on pure JSON Schema declarations. This is OK when we only have to declare a single data value type. Typically, the semantics provided by the interaction definition are sufficient to understand what this single value is intended to mean (plus the type). It gets complicated when the input/output is based on an object / complex type with multiple entries. There it would be great to add semantic annotations.

In yesterday's breakout session in Osaka I gave an introduction and use case about this topic. Maybe it helps to understand why we want to have this extension. Please find here the slides

sebastiankb avatar May 19 '17 07:05 sebastiankb

The Thing Description is an instance document that includes both JSON Schema snippets and semantic annotations. If I am understanding everything correctly, the semantics and schema work independently. You are not attaching semantics to the validation schema snippets, you are attaching semantics alongside of but independent of the structural validation. Is this correct?

@handrews, we do want to attach semantics to the validation schema snippets. The Thing Description should be a document that would allow for both validation and semantic processing of JSON data. The former requires a JSON schema, the latter a JSON-LD context. We could of course provide them separately. But as @dlax put it:

the question is rather how to expose [the JSON-LD context] in an "optimized" way on application side and how to have it play well with JSON Schema validation.

This discussion is indeed about optimizing, in the sense of reducing the amount of information WoT developers have to provide. For instance: starting from the "extended" schema I gave for the IMU, @dlax could design an adequate JSON-LD context without further knowledge about my application. This means machines could do that transformation as well, saving me the time I would have spent on modeling the JSON-LD context.
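
As a rough illustration of that transformation, the following Python sketch derives a flat JSON-LD context from a schema annotated with modelReference. The function name `derive_context` is made up for this example, and the sketch assumes the annotation sits directly on each entry of the top-level "properties"; nested objects, arrays of objects and $ref resolution are not handled.

```python
def derive_context(schema, keyword="modelReference"):
    """Build a flat JSON-LD context from annotated top-level properties.

    Assumption: each annotated property carries `keyword` directly in its
    sub-schema; anything deeper is ignored by this sketch.
    """
    mapping = {}
    for name, subschema in schema.get("properties", {}).items():
        if isinstance(subschema, dict) and keyword in subschema:
            mapping[name] = subschema[keyword]
    return {"@context": mapping}


annotated_schema = {
    "type": "object",
    "properties": {
        "prop1": {
            "$ref": "#/definitions/def1",
            "modelReference": "http://example.org/vocab#Acceleration",
        },
        "prop2": {
            "$ref": "#/definitions/def1",
            "modelReference": "http://example.org/vocab#Orientation",
        },
    },
    "definitions": {
        "def1": {
            "type": "array",
            "minItems": 3,
            "maxItems": 3,
            "items": {"type": "number"},
        }
    },
}

context = derive_context(annotated_schema)
```

The result has the same shape as the separately hosted context document suggested earlier in the thread, which is the point being made: the annotated schema already contains everything needed to generate it.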

vcharpenay avatar May 20 '17 13:05 vcharpenay

@sebastiankb I think your view and @dlax's can be aligned by considering the Thing Description as both a JSON Schema document and an instance document

@handrews I've been thinking about this since you suggested it. I initially thought it was nice, but am now a bit skeptical. Suppose one wants to add a JSON-LD @context to a JSON Schema; the only meaningful way I could come up with is:

  {
    "type": "object",
    "properties": {
      "prop1": {
        "@context": "http://example.org/vocab#Acceleration",
        "type": "array",
        "items": { "type": "number" },
        "minItems": 3,
        "maxItems": 3
      },
      "prop2": {
        "@context": "http://example.org/vocab#Orientation",
        "type": "array",
        "items": { "type": "number" },
        "minItems": 3,
        "maxItems": 3
      }
    }
  }

But I can't see how this could be useful, because the @contexts are not meant to describe the JSON Schema document's members but the JSON instance's. Also, one has to carry the structure of the JSON Schema onto the JSON instance to map a context to its members; maybe it's not a big deal in practice, but it's a bit awkward (still this coupling between validation and semantics)...

Did you have something different in mind?

(In fact, I now wonder if the proposal of allowing @-members in JSON instances is actually a good idea...)

dlax avatar May 20 '17 17:05 dlax

@vcharpenay Here's another proposal to convey semantics in JSON Schema by making use of Hyper-Schema links:

  {
    "type": "object",
    "properties": {
      "prop1": {
        "type": "array",
        "items": { "type": "number" },
        "minItems": 3,
        "maxItems": 3,
        "links": [
          {
            "rel": "http://www.w3.org/ns/json-ld#context",
            "href": "http://example.org/vocab#Acceleration"
          }
        ]
      }
    }
  }

By having a link description object directly attached to a sub-schema (and not to the global schema), it seems pretty close to the modelReference initially suggested. The source resource of each link is a member of the JSON instance (i.e. prop1 in the example), so this is readily machine-resolvable. I think this is also pretty self-contained/optimized from the developer's point of view.

dlax avatar May 20 '17 17:05 dlax

The Thing Description should be a document that would allow for both validation and semantic processing of JSON data. The former requires a JSON schema, the latter a JSON-LD context. We could of course provide them separately.

@vcharpenay I think we're just using slightly different meanings of "attach" and "separately" here :-) Let me try to come at this a different way:

The only interaction I see here is that you are using JSON Schema (in addition to its usual validation functionality) to determine which parts of the instance should be processed by a given bit of JSON-LD. In your example, you are putting your annotations under specific properties within a JSON Schema "properties" object, indicating which annotations apply to which properties. You could theoretically do the same thing with array elements, or use constructs like "oneOf" to conditionally apply different annotations depending on the instance's run-time structure.
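
A small Python sketch of the "oneOf" case, where the annotation that applies depends on the instance's run-time structure. Everything here is illustrative: `matches` supports only "type" with the values "number" and "array", `annotation_for` is a made-up helper, and the Magnitude IRI is a hypothetical addition to the example vocabulary used in this thread.

```python
def matches(instance, subschema):
    """Tiny matcher supporting only `type` of "number" or "array"."""
    expected = subschema.get("type")
    if expected == "number":
        return isinstance(instance, (int, float)) and not isinstance(instance, bool)
    if expected == "array":
        return isinstance(instance, list)
    return expected is None  # no `type` keyword: anything matches


def annotation_for(instance, schema, keyword="modelReference"):
    """Return the annotation of the single matching `oneOf` branch, if any."""
    hits = [b for b in schema.get("oneOf", []) if matches(instance, b)]
    if len(hits) != 1:
        return None  # oneOf is satisfied only by exactly one matching branch
    return hits[0].get(keyword)


schema = {
    "oneOf": [
        {
            "type": "array",
            "modelReference": "http://example.org/vocab#Acceleration",
        },
        {
            "type": "number",
            # Hypothetical concept, not from the thread.
            "modelReference": "http://example.org/vocab#Magnitude",
        },
    ]
}

vector_annotation = annotation_for([1, 2, 0], schema)
scalar_annotation = annotation_for(9.8, schema)
no_annotation = annotation_for("fast", schema)
```

The schema decides which branch an instance falls into, and the annotation simply rides along with that branch; the semantic keyword itself never takes part in validation.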

This is also how JSON Hyper-Schema makes use of the JSON Schema validation keywords: http://json-schema.org/latest/json-schema-hypermedia.html#rfc.section.3.1

Is that correct? Are there any other interactions between JSON-LD and JSON Schema that are desired?

handrews avatar May 20 '17 22:05 handrews

@dlax I've tried to sell the WoT folks on JSON Hyper-Schema, but so far they aren't biting :-)

handrews avatar May 20 '17 22:05 handrews

@dlax I've tried to sell the WoT folks on JSON Hyper-Schema, but so far they aren't biting :-)

Hm, okay, didn't know that... On the other hand, from https://w3c.github.io/wot-thing-description/#interaction-patterns, it seems that they're already using hypermedia links in the Thing Description. So it's not obvious why semantics information couldn't be conveyed in the same manner.

dlax avatar May 21 '17 08:05 dlax

We've come to a (temporary) conclusion within our group and will try it in our next PlugFest (2017/07). I summarized it here: w3c/wot-thing-description#5.

We will most likely re-open this issue at a later time and invite you to a joint meeting for a detailed discussion. Are any of you, by chance, at the workshop organized by the IRTF T2T group in Prague (WISHI)?

vcharpenay avatar Jun 21 '17 16:06 vcharpenay

+1 To adding JSON-LD vocabulary for semantics. JSON Hyper-Schema vocabulary is for hypermedia linking.

In our integration use cases we need to point to competencies defined in different taxonomies and we need to point to, or include, signed credentials. Semantics in an instance document would enable validating that we have a "correct one of those things."

-Chris Pauley Member of HR Open Standards and part of the Credentialing Ecosystem Mapping Project http://connectingcredentials.org/

chrispauley avatar Jul 10 '17 18:07 chrispauley

I have a use case (simply wanting to pass extra information about how data should be presented on a web page, e.g. SELECT vs DATALIST, INPUT TYPE=TEXT vs TEXTAREA, etc.) and I am trying to see how I can use JSON Schema to do this in a standards-compliant way.

Do I use the JSON-LD context vocabulary to define additional terms, or is it better to create a superset of JSON Schema to meet my needs? Or has someone somewhere already buttoned down this particular use case in an accepted way that I have so far failed to find?

surruk51 avatar Jan 01 '19 05:01 surruk51

@surruk51 purely within JSON Schema (not JSON-LD) this is what the $vocabulary feature in the forthcoming draft is for. It allows declaring what sets of keywords are being used, and with what semantics, without that set of keywords necessarily needing to be in an RFC (although the ones in RFCs will be referenceable as vocabularies as well).

One particular use case is to be able to add a set of keywords to control web UI display. Another is to control code generation (e.g. is this $ref composition or inheritance?).

However, it will be a while before this is implemented at all, and then people will need to define and implement new vocabularies.

For this draft, vocabularies are just URIs associated with a specification (e.g. the vocabulary that will probably have a URI like https://json-schema.org/draft/2019-02/vocab/validation will be defined as "section 6 of the JSON Schema Validation IETF draft specification"; see json-schema-org/json-schema-spec#697 for details of specific vocabs).

The following draft will hopefully define a machine-readable form of this, but that's not going to fit in for now.

In theory, you could make a JSON Schema vocabulary that references JSON-LD and use them together in that way (with regrettable overlap of the term "vocabulary"). But I haven't really sorted out how that would work in practice.

handrews avatar Jan 02 '19 06:01 handrews

Would this mechanism be suitable for annotating schemas with type information for fake data generators? The ability to create a set of test data directly from a schema definition would be very very cool.

I was thinking of faking this (no pun intended), by adding a property syntax to the description keyword, to allow designation of the type information needed by faker (fake data generation library).

thawkins avatar Nov 05 '19 00:11 thawkins

@thawkins yes, it should be possible. Data generation and code generation are quite similar. In both cases, the validation constraint system can result in descriptions that are fine for validation but too ambiguous for proper generation (whether that's generation of code, UI, data, etc.) Additional vocabulary keywords can disambiguate those scenarios.

Note that the draft is no longer forthcoming but has in fact been published!

handrews avatar Nov 05 '19 00:11 handrews

This is one I was looking at, but it has added keywords ("faker") which pollute the schema definition, hence my search for annotations.

Another possible approach would be to use an escape keyword like "data-*" that would be ignored unless the processor understood the term. This is the same approach that HTML5 uses to add dynamic attributes to elements without breaking the HTML5 parser, i.e. I could use "data-faker".
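
A minimal Python sketch of how a processor might pick up such prefixed keywords while leaving everything else alone. The `collect_hints` name is made up, the "data-" prefix is the convention floated above rather than anything JSON Schema defines, and the walk only descends through "properties".

```python
def collect_hints(schema, prefix="data-", pointer=""):
    """Return (schema_location, keyword, value) for each prefixed keyword.

    Sketch only: descends through `properties` and does not escape property
    names the way a real JSON Pointer implementation would.
    """
    hints = []
    for keyword, value in schema.items():
        if keyword.startswith(prefix):
            hints.append((pointer, keyword, value))
    for name, subschema in schema.get("properties", {}).items():
        if isinstance(subschema, dict):
            hints.extend(
                collect_hints(subschema, prefix, f"{pointer}/properties/{name}")
            )
    return hints


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "data-faker": "name.findName"},
        "age": {"type": "integer"},
    },
}

hints = collect_hints(schema)
```

A validator that knows nothing about "data-faker" would skip it like any other unknown keyword, while a data generator could look up each collected hint and produce a matching value.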

thawkins avatar Nov 05 '19 02:11 thawkins

@thawkins as I understand it, the faker keyword would be added via a meta-schema containing a vocabulary definition: https://json-schema.org/draft/2019-09/json-schema-core.html#rfc.section.D.2.p.1

Something like:

  {
    "$schema": "https://json-schema.org/draft/2019-09/schema",
    "$id": "https://example.com/meta/faker-vocab",
    "$recursiveAnchor": true,
    "$vocabulary": {
      "https://example.com/vocab/faker-vocab": true
    },
    "type": ["object", "boolean"],
    "properties": {
      "minDate": {
        "type": "string"
      }
    }
  }

And then JSON Schemas using the faker keyword would need to include a $vocabulary object containing the identifier of that meta-schema:

  {
    "$schema": "https://json-schema.org/draft/2019-09/schema",
    "$id": "https://example.com/json-schema-faker-example",
    "$vocabulary": {
      "https://example.com/vocab/faker-vocab": true
    },
    "type": "object",
    "properties": {
      "name": {
        "type": "string",
        "faker": "name.findName"
      }
    }
  }

Others more in-the-know might have better info, though. 😸

BigBlueHat avatar Nov 06 '19 15:11 BigBlueHat

@BigBlueHat that's pretty much correct, with one exception: $vocabulary always goes in the meta-schema. You can, of course, put it in a regular schema, but it's ignored there.

This may seem odd, but think of it this way: Everything in a meta-schema tells you something about the schemas described by the meta-schema. Meta-schemas don't tell you anything about themselves (unless they are their own meta-schema, at which point it gets confusing so let's skip it for now).

$vocabulary says "the thing that this schema describes is using the semantics defined by this JSON Schema vocabulary." Analogous to type saying "the thing that this schema describes conforms to this type, assuming it passes validation."

Since "using a JSON Schema vocabulary" has no meaning for anything other than JSON Schema, putting it in a non-meta-schema doesn't do anything useful. (I suppose someone could define another media type that references JSON Schema vocabularies but let's not borrow trouble).

...anyway... this is why you see $vocabulary in single-vocabulary meta-schemas like https://json-schema.org/draft/2019-09/meta/applicator and also see it re-stated in the general-purpose multi-vocabulary meta-schema https://json-schema.org/draft/2019-09/schema

So @thawkins would want to make a version of the multi-vocabulary meta-schema that adds the faker meta-schema to its allOf, and adds the faker vocabulary to its $vocabulary. You'll probably want the $vocabulary value in the multi-vocabulary meta-schema to be false, indicating that implementations that don't understand it can ignore it.

Now, all of this vocabulary stuff is meant to make it possible to re-use vocabularies by writing some sort of plugin. If your implementation is the only one that's going to encounter this vocabulary, it's still possible to just use extra keywords without doing any of this. The correct behavior for any implementation is still to ignore unknown keywords. (The true value for $vocabulary gives schema authors a way to tell an implementation to fail if it doesn't recognize the vocabulary, which wasn't ever possible before.)
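
The true/false behaviour can be sketched in a few lines of Python. This is an illustration of the rule described above, not code from any implementation: `check_vocabularies`, `UnsupportedVocabulary` and the `SUPPORTED` set are made-up names, and the faker vocabulary URI is the hypothetical one used earlier in this thread.

```python
class UnsupportedVocabulary(Exception):
    """Raised when a meta-schema requires a vocabulary we do not implement."""


def check_vocabularies(meta_schema, supported):
    """Return the vocabularies to apply; fail on required unknown ones."""
    active = []
    for uri, required in meta_schema.get("$vocabulary", {}).items():
        if uri in supported:
            active.append(uri)
        elif required:
            # true: the schema author asked us to refuse rather than guess
            raise UnsupportedVocabulary(uri)
        # false and unknown: ignore, keywords are treated as unrecognized
    return active


# Vocabularies this imaginary implementation understands.
SUPPORTED = {"https://json-schema.org/draft/2019-09/vocab/core"}

lenient_meta = {
    "$vocabulary": {
        "https://json-schema.org/draft/2019-09/vocab/core": True,
        "https://example.com/vocab/faker-vocab": False,
    }
}

strict_meta = {
    "$vocabulary": {
        "https://json-schema.org/draft/2019-09/vocab/core": True,
        "https://example.com/vocab/faker-vocab": True,
    }
}

active = check_vocabularies(lenient_meta, SUPPORTED)
```

Calling `check_vocabularies(strict_meta, SUPPORTED)` raises `UnsupportedVocabulary`, which is the "fail if you don't recognize it" behaviour; with the false value the faker keywords simply fall back to being ignored as unknown.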

handrews avatar Nov 08 '19 05:11 handrews