
Restriction of processing $vocabulary to meta-schemas is unnecessary and confusing

Open: handrews opened this issue 3 years ago • 7 comments

I've noticed that people often think of $vocabulary as a very strange kind of keyword, and specifically ask why it is in the meta-schema and not the schema. The answer is that, like all keywords in a schema, it describes the instance. $vocabulary is only meaningful when the instance is a schema (and therefore $vocabulary is being processed in a meta-schema).

But it's not harmful, except for performance, to "process" it in a normal schema. It has the same semantics, meaning that it indicates the JSON Schema keywords that could be used in the instance... which only makes sense if the instance is a schema.

But $vocabulary is essentially an annotation. It is applied to the instance, and the application (the same schema validator that was already running) uses it to load vocabulary support if needed. Annotating a non-schema instance with $vocabulary does nothing.
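To make the "annotation" reading concrete, here is a minimal Python sketch. It assumes a hypothetical implementation in which evaluating a meta-schema against a schema simply reports the meta-schema's $vocabulary value, and the calling application (here, the same implementation) decides whether to load anything. `KNOWN_VOCABULARIES`, `vocabulary_annotation`, `load_vocabularies` and the `https://example.com/vocab/unknown` URI are made up for illustration; only the two 2020-12 vocabulary URIs are real.

```python
# Hypothetical: the vocabularies this sketch claims to support.
KNOWN_VOCABULARIES = {
    "https://json-schema.org/draft/2020-12/vocab/core",
    "https://json-schema.org/draft/2020-12/vocab/validation",
}


def vocabulary_annotation(meta_schema):
    """Return the $vocabulary value that `meta_schema` attaches to its instance.

    Read purely as an annotation: it describes the instance and nothing else.
    """
    if isinstance(meta_schema, dict):
        return meta_schema.get("$vocabulary")
    return None


def load_vocabularies(annotation, instance_is_schema):
    """Application-side handling of the annotation (hypothetical helper).

    Returns the vocabulary URIs to enable when the instance is later used as a
    schema. A plain-JSON instance never uses them, so nothing is loaded.
    """
    if annotation is None or not instance_is_schema:
        return set()
    missing = [uri for uri, required in annotation.items()
               if required and uri not in KNOWN_VOCABULARIES]
    if missing:
        # A required vocabulary that cannot be supported must not be ignored.
        raise ValueError(f"unsupported required vocabularies: {missing}")
    # Unknown optional vocabularies (value false) are silently skipped.
    return {uri for uri in annotation if uri in KNOWN_VOCABULARIES}


meta_schema = {
    "$vocabulary": {
        "https://json-schema.org/draft/2020-12/vocab/core": True,
        "https://json-schema.org/draft/2020-12/vocab/validation": True,
        "https://example.com/vocab/unknown": False,  # hypothetical, optional
    }
}
annotation = vocabulary_annotation(meta_schema)
print(sorted(load_vocabularies(annotation, instance_is_schema=True)))
print(load_vocabularies(annotation, instance_is_schema=False))  # set()
```

The same annotation is produced either way; only the application's decision about what to do with it depends on whether the instance is itself a schema.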

I think we can replace the last paragraph of §8.1.2:

> The "$vocabulary" keyword MUST be ignored in schema documents that are not being processed as a meta-schema. This allows validating a meta-schema M against its own meta-schema M' without requiring the validator to understand the vocabularies declared by M.

with something about $vocabulary behaving as an annotation, which would then allow us to remove §9.1.3 ("Detecting a Meta-Schema") completely. Alternatively, we could just relax the requirement from a MUST to a SHOULD or even a MAY, as some optimizations may well be possible.

But I never really liked making meta-schema processing special, and having taken a break and come back to look at this with fresh eyes, I don't think it's needed at all. Dropping it might also cut down on confusion about the nature and placement of $vocabulary. It truly is an annotation, which is further processed by the application that called the validator; that application just happens to be the JSON Schema implementation itself. It may be worth adding a note that it's entirely reasonable to just look in a referenced meta-schema for $vocabulary in order to load features if validation against the meta-schema is turned off.
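A sketch of that last point, assuming a hypothetical in-memory registry of known meta-schemas keyed by URI: the implementation follows the schema's `$schema` to the meta-schema and reads its `$vocabulary` without validating the schema against it. `REGISTRY`, `vocabularies_from_meta_schema` and the `https://example.com/my-meta` meta-schema are illustrative, and returning `None` to mean "fall back to the implementation's defaults" is also an assumption.

```python
# Hypothetical registry of meta-schemas the implementation already has, keyed by URI.
REGISTRY = {
    "https://example.com/my-meta": {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "$id": "https://example.com/my-meta",
        "$vocabulary": {
            "https://json-schema.org/draft/2020-12/vocab/core": True,
            "https://json-schema.org/draft/2020-12/vocab/format-assertion": True,
        },
    },
}


def vocabularies_from_meta_schema(schema, registry):
    """Read $vocabulary from the meta-schema named by `schema`'s $schema.

    The schema is NOT validated against the meta-schema; the meta-schema is
    only looked up and read. None means "nothing found, use defaults".
    """
    meta = registry.get(schema.get("$schema"))
    if meta is None:
        return None
    return meta.get("$vocabulary")


schema = {"$schema": "https://example.com/my-meta", "type": "string", "format": "email"}
print(vocabularies_from_meta_schema(schema, REGISTRY))
```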

handrews, May 08 '21

I believe the initial restriction was to keep things simple, and also to avoid requiring implementations to look at anything other than $schema to process the schema.

That being said, I'm not against making such a change, as I'm pretty convinced there are many practical use cases.

IMHO, we should...

  1. Allow the use of $vocabulary in normal schemas (not limited to meta-schemas)
  2. Define that use in schemas is supplementary to that in meta-schemas, in that the results are merged
  3. Define that the "subject" schema (for lack of a better phrase; I'm sure there is one) can only additionally require previously optional vocabulary support, and cannot make required support optional. We don't want required vocabularies turned optional.

The simplest use case that comes to mind is a single schema needing the format vocabulary to be required, in order to convey the schema author's requirements.
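The three rules above, applied to the format use case, could be sketched like this. It is only one possible reading: the comment does not say what happens when a schema names a vocabulary its meta-schema never mentions, so this sketch assumes such entries are simply added. `merge_vocabularies` and the meta-schema contents are hypothetical; the two URIs are the 2020-12 core and format-assertion vocabularies.

```python
CORE = "https://json-schema.org/draft/2020-12/vocab/core"
FORMAT_ASSERTION = "https://json-schema.org/draft/2020-12/vocab/format-assertion"


def merge_vocabularies(meta_vocab, schema_vocab):
    """Merge a schema's own $vocabulary into its meta-schema's (hypothetical rule).

    The schema may raise a vocabulary from optional (False) to required (True),
    or add one the meta-schema does not mention, but may never make a
    vocabulary that the meta-schema requires optional.
    """
    merged = dict(meta_vocab)  # do not mutate the meta-schema's declaration
    for uri, required in schema_vocab.items():
        if merged.get(uri) is True and not required:
            raise ValueError(f"cannot make a required vocabulary optional: {uri}")
        merged[uri] = merged.get(uri, False) or required
    return merged


# Hypothetical meta-schema: format assertion is available but optional.
meta_vocab = {CORE: True, FORMAT_ASSERTION: False}
# A single schema whose author needs "format" to actually be checked.
schema_vocab = {FORMAT_ASSERTION: True}
print(merge_vocabularies(meta_vocab, schema_vocab))
```

Raising an error on a downgrade, rather than silently keeping the stricter value, is a design choice of this sketch; silently keeping "required" would satisfy rule 3 just as well.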

Relequestual, May 10 '21

Note: @Relequestual suggested that I hide his comment just above this one; I'll add more here soon to clarify the topic I'm trying to address. That said, its off-topicness illustrates my point that $vocabulary appears more confusing than it is, so in that sense it was also on-topic 😅

handrews, May 10 '21

I think this ends up just being a clarification. The statement that the "$vocabulary" keyword MUST be ignored in schema documents that are not being processed as a meta-schema is not testable AFAIK (paging @karenetheridge), so simply observing that it would naturally have no effect is a clarification.

We cannot state that it MUST be treated as an annotation in the patch release (and that's worth further thought anyway), so I think I'll just put in a CREF noting that that is the current direction of thought, which we can formalize one way or the other in the next non-patch draft.

If anyone thinks this should not go in, simply object here or on the PR and that's enough to bump it out of the patch release.

handrews, Jun 01 '21

> The "$vocabulary" keyword MUST be ignored in schema documents that are not being processed as a meta-schema. is not testable AFAIK

We can test for this by using an unknown $vocabulary URI in a schema and checking that validation can still proceed successfully.
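A sketch of such a test, written as a Python dict in the description/schema/tests/data/valid shape that JSON-Schema-Test-Suite cases use, and run against a toy evaluator that only understands `type`. The vocabulary URI is deliberately nonexistent, and `validate`, `check_vocabularies`, `TEST_CASE` and the `processing_as_meta_schema` flag are hypothetical; a real implementation would run such a case through its own test-suite harness.

```python
# Hypothetical: vocabularies this toy evaluator claims to support.
KNOWN_VOCABULARIES = {
    "https://json-schema.org/draft/2020-12/vocab/core",
    "https://json-schema.org/draft/2020-12/vocab/validation",
}
TYPES = {"string": str, "object": dict, "array": list}


def check_vocabularies(meta_schema):
    """Refuse unsupported required vocabularies; only called for meta-schemas."""
    for uri, required in meta_schema.get("$vocabulary", {}).items():
        if required and uri not in KNOWN_VOCABULARIES:
            raise ValueError(f"unsupported required vocabulary: {uri}")


def validate(schema, instance, processing_as_meta_schema=False):
    """Toy evaluator that understands only the "type" keyword (hypothetical)."""
    if processing_as_meta_schema:
        check_vocabularies(schema)
    expected = schema.get("type")
    return expected is None or isinstance(instance, TYPES[expected])


TEST_CASE = {
    "description": "unknown $vocabulary in a schema not used as a meta-schema is ignored",
    "schema": {
        "$vocabulary": {"https://example.com/vocab/does-not-exist": True},
        "type": "string",
    },
    "tests": [
        {"description": "a string is valid", "data": "hello", "valid": True},
        {"description": "a number is invalid", "data": 42, "valid": False},
    ],
}

results = [
    (test["description"], validate(TEST_CASE["schema"], test["data"]) == test["valid"])
    for test in TEST_CASE["tests"]
]
for description, passed in results:
    print(f"{description}: {'passed' if passed else 'FAILED'}")
```

If the same schema were processed as a meta-schema, the unknown required vocabulary would cause `validate` to raise instead, which is the contrast the test relies on.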

karenetheridge, Jun 02 '21

@karenetheridge yeah, that's definitely testable. Having thought about it more, I think I can work this out so that the "$vocabulary behaves mostly like an annotation" approach produces the same testable requirement. For now, it will have to be "behaves mostly like" because requiring it to actually be collected as an annotation would be a conformance change.

handrews, Jun 03 '21

> For now, it will have to be "behaves mostly like" because requiring it to actually be collected as an annotation would be a conformance change.

Agreed.

But when we get there, I would propose having $schema generate an annotation rather than $vocabulary, because it's $schema that appears in schemas (as opposed to meta-schemas). I think that might also satisfy @jdesrosiers's desire for keyword-source information in validation results: all the vocabulary information can be found at the URI indicated by that $schema annotation, and it will appear in validation results whenever the meta-schema happens to be altered. (I'm only writing this here so it's not forgotten; I am not attempting to derail the conversation or to change the spec at a point when changes are not being considered.)

karenetheridge, Jun 03 '21

@karenetheridge $schema is the only keyword that applies to the schema that contains it, rather than to the instance. $schema says nothing about the instance at all. It is essentially a self-annotation on the schema (an annotation rather than a reference because it is not automatically followed). $vocabulary, however, applies to the instance.

Given:

  • plain JSON Instance (I)
  • Schema (S)
  • Meta-Schema (MS)

Those keywords operate as follows:

  • $schema in S annotates S with the URI for MS, which can then be applied to S if desired.
  • $vocabulary in S annotates I with the vocabulary semantics it could use... but I doesn't use them because it's plain JSON
  • $schema in MS annotates MS with the URI for whatever its own meta-schema is (possibly itself, but maybe not)
  • $vocabulary in MS annotates S with the available vocabulary semantics, which it does use when it is in turn applied to I because S is a JSON Schema.

So $schema and $vocabulary aren't operating on the same target, so there's not really a concept of "we should annotate with one vs the other." They don't annotate the same thing.

handrews, Jun 06 '21

Closing in favor of #1281 where I think I did a much better job explaining this.

handrews, Aug 22 '22