json-schema-spec
🧹 Clarification: Define In-Place Applicators' Subschemas and Results
Specification section
10.2. Keywords for Applying Subschemas in Place These keywords apply subschemas to the same location in the instance as the parent schema is being applied. They allow combining or modifying the subschema results in various ways.
[10.2.*.*] [...] An instance validates successfully against this keyword if it validates successfully against [some subset of] schema[(s)] defined by this keyword's value.
What is unclear?
- It doesn't define which applicator schema(s) are applied to the parent as subschemas.
- It doesn't define which results are gathered
The spec says the applicator applies subschemas - but not all of the applicator's contained schemas are actually subschemas of the instance. Many of them are only candidate subschemas as in the oneOf applicator, explicitly not subschemas as in the not applicator, or conditional as in the if/then/else.
The spec says the applicators allow combining or modifying "subschema" results, but each applicator only defines how it determines the boolean "validity". Most of the applicators don't state anything about how they combine any other results like annotations or evaluated properties. For example, should the annotations of a valid schema within a not applicator be gathered? What about an invalid one? How about the properties of every valid schema within an anyOf applicator being considered by unevaluatedProperties, like with annotations?
Proposal
Here, I use "results" to mean all annotations, "evaluated" items/properties per unevaluated* and additional* keywords, and possibly other results like validation errors.
The spec should clarify, either for each applicator individually or for all applicators globally:
- The applicator provides results from all subschemas - even if they are invalid.
- The applicator does not provide results from non-subschemas. It uses their results to determine its own, not to pass through.
- All valid, evaluated, applicator schemas are subschemas.
  - For example, a valid schema in a not applicator is a subschema and can contribute results, even though it causes the applicator to be invalid.
  - The anyOf docs already clarify that all schemas need to be evaluated when collecting full results, but it might need to be added whether this should include "evaluated" items/properties, or annotations only.
- Invalid or unevaluated applicator schemas might still be subschemas, depending on the applicator's defined logic.
  - For example, a valid else schema following a valid if schema is not evaluated and does not produce results.
  - For example, each allOf schema is required to be a subschema, even if it is invalid, and must provide results.
This would help quite a bit for unevaluated* keywords, which seem to struggle from how to determine evaluated properties, especially from subschemas, per #1604 .
Do you think this work might require an [Architectural Decision Record (ADR)]? (significant or noteworthy)
No
I guess the most simple summary is that I think that all subschemas applied by applicators should behave consistently with the current behavior of the $ref applicator.
The Difference
It seems that in 2020-12, results are gathered from a failing #/$ref as if they were from the schema #/ itself. But results from subschemas are not.
Let's compare using a car that fails validation (if it's a car at all - since it fails to validate as one):
{"make": "Mercedes-Benz", "model": "G63 AMG 6x6", "wheels": 6}
Note: My results are produced in Python with jsonschema==4.23.0. If these deviate from the spec, and have caused my misinterpretation of the spec, I will migrate my issue over there.
The $ref Applicator
Let's include the car subschema using $ref. The unevaluatedProperties will shine some light on what annotations (if any) are gathered from the failing car subschema.
{
"$ref": "#/$defs/car",
"unevaluatedProperties": false,
"$defs": {
"car": {
"required": ["make", "model", "wheels"],
"properties": {
"make": {"type": "string"},
"model": {"type": "string"},
"wheels": {"type": "integer", "minimum": 2, "maximum": 4}
}
}
}
}
produces one error:
ValidationError: 6 is greater than the maximum of 4
Failed validating 'maximum' in schema['properties']['wheels']:
{'type': 'integer', 'minimum': 2, 'maximum': 4}
On instance['wheels']:
6
Note that the "make" and "model" are not unevaluated, meaning annotation results are gathered from the failing subschema when specified by the $ref applicator.
Also, the reported "path" is #/properties/wheels, not #/$ref/properties/wheels or #/$defs/car/properties/wheels, but I'm not sure what to make of that.
Other Applicators
Let's see the same example, this time using an allOf as the applicator instead of $ref:
{
"unevaluatedProperties": false,
"allOf": [
{
"required": ["make", "model", "wheels"],
"properties": {
"make": {"type": "string"},
"model": {"type": "string"},
"wheels": {"type": "integer", "minimum": 2, "maximum": 4}
}
}
]
}
It now produces two errors:
ValidationError: Unevaluated properties are not allowed ('make', 'model', 'wheels' were unexpected)
Failed validating 'unevaluatedProperties' in schema:
{'unevaluatedProperties': False,
'allOf': [{'required': ['make', 'model', 'wheels'],
'properties': {'make': {'type': 'string'},
'model': {'type': 'string'},
'wheels': {'type': 'integer',
'minimum': 2,
'maximum': 4}}}]}
On instance:
{'make': 'Mercedes-Benz', 'model': 'G63 AMG 6x6', 'wheels': 6}
ValidationError: 6 is greater than the maximum of 4
Failed validating 'maximum' in schema['allOf'][0]['properties']['wheels']:
{'type': 'integer', 'minimum': 2, 'maximum': 4}
On instance['wheels']:
6
The subschema behavior seems to differ between $ref and the other applicator, in a meaningful way! Now the "make" and "model" are unevaluated, meaning annotation results are not gathered from the failing subschema when specified by the allOf applicator.
- Is my validator not following the $ref spec by producing these results? If not...
- Is it defined anywhere that $ref is a special case that behaves differently from other applicators?
- Should $ref behave differently from other applicators? If not, which should change?
First of all, thank you for providing feedback on the spec! We need it and we appreciate you taking the time.
not all of the applicator's contained schemas are actually subschemas of the instance.
That's not actually how we use the term subschema. A subschema is any schema within a schema resource. The term is independent of the instance that's being validated.
Most of the applicators don't state anything about how they combine any other results like annotations or evaluated properties.
Evaluated properties are defined as annotations. There's no difference in how they propagate compared to annotations.
Annotations propagate the same way for every keyword. They either get dropped because the schema failed validation or they get aggregated with the annotations from all the passing schemas. This should be covered in the 7.7 Annotations section, but as I'm reading it now, it seems to be lacking.
So,
For example, should the annotations of a valid schema within a not applicator be gathered?
Yes
What about an invalid one?
No
How about the properties of every valid schema within an anyOf applicator being considered by unevaluatedProperties, like with annotations?
Yes. That part is covered in the last paragraph of 7.7 about short circuiting.
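To illustrate why short-circuiting matters here, the following is a minimal sketch, not a real validator. The branch representation (a dict of property name to Python type) and the helper names `evaluate_branch` and `any_of` are hypothetical and exist only for this example. It shows that stopping at the first passing anyOf branch loses evaluated-property annotations that a full evaluation would have collected.

```python
def evaluate_branch(branch, instance):
    """Toy subschema: `branch` maps property names to required Python types.

    Returns (valid, evaluated_names). Annotations are dropped on failure.
    """
    present = {name for name in branch if name in instance}
    valid = all(isinstance(instance[name], branch[name]) for name in present)
    return valid, (present if valid else set())


def any_of(branches, instance, short_circuit):
    """Combine branches like anyOf, optionally stopping at the first pass."""
    valid, evaluated = False, set()
    for branch in branches:
        ok, names = evaluate_branch(branch, instance)
        if ok:
            valid = True
            evaluated |= names  # aggregate annotations from passing branches
            if short_circuit:
                break  # fine for validity alone, lossy for annotations
    return valid, evaluated


branches = [{"foo": int}, {"bar": bool}]  # hypothetical stand-ins for schemas
instance = {"foo": 42, "bar": True}

full = any_of(branches, instance, short_circuit=False)
fast = any_of(branches, instance, short_circuit=True)
print(full)  # both "foo" and "bar" are evaluated
print(fast)  # only "foo" is evaluated, so "bar" would look unevaluated
```

Both calls agree on validity. Only the collected annotations differ, which is what a sibling unevaluatedProperties would depend on.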
The subschema behavior seems to differ between $ref and the other applicator, in a meaningful way! Now the "make" and "model" are unevaluated, meaning annotation results are not gathered from the failing subschema when specified by the allOf applicator.
This is an issue with the validator you're using. Validators are allowed to present errors however they like, so it's not technically wrong, but I doubt it was the intention for those two examples to produce different results.
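As a rough illustration of the rule described in this reply (annotations are dropped from failing schemas and aggregated from passing ones), here is a toy evaluator. It is a sketch under stated assumptions, not the jsonschema library and not a conforming implementation: it supports only a handful of keywords, resolves only local `#/...` references, and the function name `evaluate` and its return shape are made up for this example. Under that single rule, the $ref and allOf versions of the car schema report the same errors.

```python
TYPES = {"string": str, "integer": int, "number": (int, float)}


def evaluate(schema, instance, root):
    """Return (valid, evaluated_property_names, errors) for a tiny keyword subset."""
    errors, evaluated = [], set()

    # In-place applicators: "allOf" and "$ref" (local JSON pointers only).
    subschemas = list(schema.get("allOf", []))
    if "$ref" in schema:
        target = root
        for part in schema["$ref"].lstrip("#/").split("/"):
            target = target[part]
        subschemas.append(target)
    for sub in subschemas:
        ok, names, sub_errors = evaluate(sub, instance, root)
        errors += sub_errors
        if ok:  # annotations survive only from passing subschemas
            evaluated |= names

    # Toy type check: bool counts as an integer here, unlike in JSON Schema.
    if "type" in schema and not isinstance(instance, TYPES[schema["type"]]):
        errors.append(f"{instance!r} is not of type {schema['type']}")
    if isinstance(instance, (int, float)):
        if "minimum" in schema and instance < schema["minimum"]:
            errors.append(f"{instance!r} is less than the minimum of {schema['minimum']}")
        if "maximum" in schema and instance > schema["maximum"]:
            errors.append(f"{instance!r} is greater than the maximum of {schema['maximum']}")
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"{name!r} is a required property")
        for name, sub in schema.get("properties", {}).items():
            if name in instance:
                errors += evaluate(sub, instance[name], root)[2]
                evaluated.add(name)
        if schema.get("unevaluatedProperties") is False:
            extra = sorted(set(instance) - evaluated)
            if extra:
                errors.append(f"unevaluated properties: {extra}")

    valid = not errors
    return valid, (evaluated if valid else set()), errors


CAR = {
    "required": ["make", "model", "wheels"],
    "properties": {
        "make": {"type": "string"},
        "model": {"type": "string"},
        "wheels": {"type": "integer", "minimum": 2, "maximum": 4},
    },
}
via_ref = {"$ref": "#/$defs/car", "unevaluatedProperties": False, "$defs": {"car": CAR}}
via_all_of = {"allOf": [CAR], "unevaluatedProperties": False}
instance = {"make": "Mercedes-Benz", "model": "G63 AMG 6x6", "wheels": 6}

ref_errors = evaluate(via_ref, instance, via_ref)[2]
all_of_errors = evaluate(via_all_of, instance, via_all_of)[2]
print(ref_errors == all_of_errors)
```

In this sketch both schemas yield the maximum error plus an unevaluated-properties error, because the failing car subschema contributes no annotations in either case.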
Evaluated properties are defined as annotations.
Ah - I realized this after I posted, thanks.
This is an issue with the validator you're using.
Cool, I reported the issue to them!
I do still think it would make sense to propagate annotations from failing subschemas if the subschema is a required part of the parent, e.g. properties defined in $ref would still have their title and description even if the $ref fails overall. In my examples above, vehicle {"make": "Mercedes-Benz", "model": "G63 AMG 6x6", "wheels": 6} could still have a title and description for the make and model. This has an effect for validators that display annotations, e.g. VSCode provides annotations as tooltips even if the instance fails.
I guess it comes down to what failing validation means for the instance. If it means "it isn't a car", then the car annotations aren't applicable. If it means "the car has errors (is malformed, invalid, etc.)", the car annotations are applicable and describe what the properties should be, which might differ from what they actually are. I would think the 6-wheel Mercedes is a car with too many wheels, not that it isn't a car at all.
I think I understand the proposal now. I think it makes a lot of sense for annotation collection as a whole, not just for the unevaluated keywords. But, I'd have to think about it more to decide if it's feasible to implement. What I don't want is for every keyword to have its own rules about how annotations are propagated. I'm not sure there's a generic way to know if a subschema evaluation is "required" or droppable.
I'm not sure there's a generic way to know if a subschema evaluation is "required" or droppable.
Some thinking out loud ...
If the keyword validates as true, then we know its failing subschemas aren't required and we can drop them.
If the keyword validates as false, then ???
{
"type": "object",
"oneOf": [
{
"properties": {
"foo": { "type": "number" }
}
},
{
"properties": {
"bar": { "type": "boolean" }
}
}
],
"unevaluatedProperties": false
}
{ "foo": "42" } -- foo is evaluated
{ "foo": "42", "bar": true } -- foo is unevaluated
{ "foo": "42", "bar": null } -- Either foo or bar is unevaluated. Which means both foo and bar should be reported as unevaluated.
{ "foo": 42, "bar": true } -- If the intention is bar, then foo is unevaluated. If the intention is foo, then bar is unevaluated. Therefore, both should be reported as unevaluated.
That means if the keyword validates as false, then drop annotations from all subschemas.
So, I think that works as a generic algorithm. If valid, drop annotations from failing subschemas. If invalid, drop annotations from all subschemas. Can you think of any cases where this wouldn't hold up?
EDIT: I was rushing and should have held off posting this. In hindsight, this is just what we have now and doesn't help anything. I'll follow up later.
My previous comment was mostly nonsense, but I think the example is useful.
{
"type": "object",
"oneOf": [
{
"properties": {
"foo": { "type": "number" }
}
},
{
"properties": {
"bar": { "type": "boolean" }
}
}
],
"unevaluatedProperties": false
}
I think an equivalent way to express the proposal is to say that evaluated properties are never dropped. This is the problem.
{ "foo": "42", "bar": true } -- Passes, but foo should be unevaluated.
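The concern can be sketched in a few lines of Python. This is only an illustration with made-up helper names (`evaluate_branch`, `one_of_unevaluated`) and a hypothetical `keep_failed` flag standing in for "evaluated properties are never dropped"; it handles just the properties and type keywords used in the schema above.

```python
TYPE_CHECKS = {
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
}


def evaluate_branch(schema, instance):
    """Return (valid, matched_property_names) for a properties-only subschema."""
    props = schema["properties"]
    names = {name for name in props if name in instance}
    valid = all(TYPE_CHECKS[props[name]["type"]](instance[name]) for name in names)
    return valid, names


def one_of_unevaluated(branches, instance, keep_failed):
    """Return (oneOf validity, property names left for unevaluatedProperties)."""
    results = [evaluate_branch(branch, instance) for branch in branches]
    valid = sum(ok for ok, _ in results) == 1
    evaluated = set()
    for ok, names in results:
        # Current rule: keep annotations only from passing branches of a
        # passing keyword. keep_failed=True models "never dropped".
        if keep_failed or (valid and ok):
            evaluated |= names
    return valid, sorted(set(instance) - evaluated)


branches = [
    {"properties": {"foo": {"type": "number"}}},
    {"properties": {"bar": {"type": "boolean"}}},
]
instance = {"foo": "42", "bar": True}

current = one_of_unevaluated(branches, instance, keep_failed=False)
never_dropped = one_of_unevaluated(branches, instance, keep_failed=True)
print(current)        # oneOf passes, "foo" is left unevaluated
print(never_dropped)  # oneOf passes, nothing is left unevaluated
```

With `keep_failed=True` the string-valued foo counts as evaluated even though the only branch that mentions it failed, so unevaluatedProperties: false would no longer reject it.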
That's not actually how we use the term subschema.
Oops! I'll use the term "applicable" to mean a subschema with relevant results regardless of validity. This is what I originally meant by "subschema" and "candidate subschema".
What I don't want is for every keyword to have its own rules about how annotations are propagated.
I guess I was thinking they would, depending on the logic of the applicator - similar to how applicators may combine other results per their self-defined logic. For example the $ref subschema is always applicable whether or not it's valid.
For example, I like to put a set of common property definitions in a "core" schema and use "$ref": "<$id-of-core>" in all of the schemas that use those terms. I use allOf as an array of $refs when there are multiple definition "namespaces" I want to combine. This comes up a lot when sourcing a schema from a PDF/Word doc, where terms and definitions are in one or more sections or appendices, separate from their usage. I don't know any other way to share definitions without these keywords.
I'm not sure there's a generic way to know if a subschema evaluation is "required" or droppable.
That's a good point. I'll think aloud with you...
If the keyword validates as true, then we know its failing subschemas aren't required and we can drop them.
Yep! That makes it easy. What about for unevaluated subschemas of passing keywords? Currently anyOf should evaluate them to collect annotations, else and then should not. This makes sense when considering the semantics of the keyword, but doesn't seem possible with keyword-agnostic annotation collection.
If the keyword validates as false, then ???
I can't think of a good way to do this without knowing the meaning of the keyword.
Can you think of any cases where this wouldn't hold up?
For failing keywords where every subschema is required, failing subschemas are still applicable and I want to keep their annotations.
For failing keywords where every subschema is allowed, but might be allowed (or required) to fail depending on the results of siblings, it's impossible to know which failing subschemas should have been applicable, or which passing subschemas should not have been applicable. In your example, foo isn't required and doesn't have to be a number (as long as bar is a boolean), and bar isn't required and doesn't have to be a boolean (as long as foo is a number). In this case, I would say the passing subschemas are applicable and the failing subschemas are not - the instance is whatever the passing subschema describes, even if it should not be.
For failing keywords where each subschema is required to fail, I would say its subschemas, valid or not, are explicitly not applicable. On the other hand, to be consistent with above, we could say the subschema is applicable and the error is that it should not be.
How about subschemas are "applicable", depending on their result (rows) and requirement (columns), per the following:
| | Must Fail | May Pass or Fail | Must Pass |
|---|---|---|---|
| Passes | โ | โ | โ |
| Fails | โ | โ | โ |
| Unevaluated | โ* | โ* | โ* |
| *โ: See above about unevaluated subschemas of passing keywords |
And each applicator keyword handles subschemas per one of these columns, like:
- Must Fail: not
- May Pass or Fail: anyOf, oneOf, if, etc.
- Must Pass: $ref, allOf, then or else, $dynamicRef, etc.
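One possible way to encode this grouping is sketched below. The keyword-to-column mapping comes from the list above; the decision logic is only a reading of the surrounding prose (required subschemas stay applicable even when they fail, optional ones only when they pass, unevaluated ones never), and the function name `is_applicable` and the `must_fail_passing_applies` switch are hypothetical. The switch exists because the "Must Fail" case is left open in the discussion.

```python
REQUIREMENT = {
    "not": "must_fail",
    "anyOf": "may_pass_or_fail",
    "oneOf": "may_pass_or_fail",
    "if": "may_pass_or_fail",
    "$ref": "must_pass",
    "allOf": "must_pass",
    "then": "must_pass",
    "else": "must_pass",
    "$dynamicRef": "must_pass",
}


def is_applicable(keyword, outcome, must_fail_passing_applies=True):
    """Decide whether a subschema's results should be kept.

    `outcome` is one of "passes", "fails", or "unevaluated".
    This is an assumed interpretation of the proposal, not spec behavior.
    """
    if outcome == "unevaluated":
        return False  # nothing was evaluated, so there is nothing to keep
    requirement = REQUIREMENT[keyword]
    if requirement == "must_pass":
        return True  # required subschemas apply whether they pass or fail
    if requirement == "may_pass_or_fail":
        return outcome == "passes"
    # "must_fail": the thread is undecided, so the caller picks a policy.
    return outcome == "passes" and must_fail_passing_applies


print(is_applicable("allOf", "fails"))
print(is_applicable("oneOf", "fails"))
print(is_applicable("not", "passes"))
```

A per-keyword lookup like this is exactly the kind of keyword-specific rule the earlier reply hoped to avoid, so it is shown only to make the trade-off concrete.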
I'm on the fence about general rules for valid subschemas under invalid applicators. I think my general approach is that the instance is whatever the passing subschema describes, even if the applicator declares that it should not be. That means the subschema is applicable, and annotations should be collected, even though the applicator fails.
For example, a landlord allows some kinds of animals as pets, but adds some rules for disallowed pets:
{
"anyOf": [{"$ref": "#/$defs/cat"}, {"$ref": "#/$defs/dog"}, {"$ref": "#/$defs/bird"}],
"not": {
"title": "A disallowed pet",
"anyOf": [
{
"title": "A large dog over 50lb.",
"$ref": "#/$defs/dog",
"properties": {
"weight": {"description": "A weight that qualifies the dog as 'large'", "minimum": 50}
}
}
]
},
"$defs": {
"animal": {"properties": {"weight": {"$comment": "in lbs", "type": "number", "exclusiveMinimum": 0}}, "required": ["weight"]},
"cat": {"title": "A cat", "required": ["whiskers"]},
"dog": {"title": "A dog", "required": ["tail"]},
"bird": {"title": "A bird", "required": ["wings"]}
}
}
And the instance:
{"tail": "wagging", "weight": 69}
The instance fails validation because it breaks the rule about maximum dog weight. But it is an instance of the subschema - the instance is "A large dog over 50lb", and the weight is "A weight that qualifies the dog as 'large'". The error "weight 69 is greater than 50" is also particularly useful, within the more broad error "{"anyOf": <rules>}".
I think refactoring the schema would not help, but I'm happy to be corrected. For example, if #/$defs/dog defined a valid pet dog as less than 50lbs, the #/not/anyOf would go away but the error would become even more abstract. "Not an allowed pet" is a less helpful error than "an allowed species that is a disallowed weight".