json-schema-spec
✨ Proposal: Add `undefinedProperties` and `undefinedItems` Keywords
Describe the inspiration for your proposal
I want to enforce that "nothing goes unseen" by the schema: for example, to safeguard against misspelled property names, to ensure that extension properties follow a naming convention, to prevent instances from adding bespoke properties that would subvert interoperability requirements, or otherwise to enforce that the instance is 100% covered by the schema.
additional* items/properties don't meet the need because our schemas are large and use $ref and/or allOf to aggregate component subschemas from a schema registry.
unevaluated* items/properties don't meet the need because properties are considered "unevaluated" even if they are expected, required, and valid, if they are within a failing subschema. This causes a great many spurious validation errors, often many layers removed from the actual invalid property/item. See also #1604, which this new keyword would resolve.
Describe the proposal
A new keyword pair for undefined* properties and items might work alongside the unevaluated* and additional* keywords.
additional* and unevaluated* keywords distinguish if a property/item successfully validates. undefined* rather distinguishes if the property/item has an applicable definition. See also #1605, which this new keyword would resolve.
A property/item can be called "defined" if it produces a validation result or annotations in the schema being evaluated or any applicable sub-schema.
- The schema being evaluated is of course applicable to itself.
- A subschema of the schema being evaluated is always applicable if it is required by its applicator, whether or not the subschema or the applicator are valid.
  - For example, `allOf` and `$ref` require every subschema to be valid.
  - If `if`, then `then` is "required", else `else` is. The "requirement" is external to the `then`/`else` applicator.
  - No `oneOf`/`anyOf` subschema is required, because the applicator could still be valid even if the subschema were invalid, as long as another subschema were valid.
  - The `if` subschema is not required, since `if` is valid even with a falsy subschema.
  - `not` is the opposite of required.
- An evaluated subschema of the schema being evaluated is applicable, even if not required by its applicator, if it is valid.
  - All of the valid `anyOf` subschemas are applicable.
  - The `if` subschema is applicable if it's valid.
  - I think, but I'm not sure, that a valid `not` schema is applicable. That's why `not` complains - it shouldn't be applicable, but it is.
  - `then` or `else` subschemas aren't evaluated if not selected by `if`, so they aren't applicable even if they would be valid.
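To make the rules above concrete, here is a rough Python sketch (not part of the proposal, and not how any existing validator works) of how "defined" properties might be collected. It covers only a small keyword subset: boolean schemas, `type`, `required`, `properties`, `allOf`, `anyOf`, and `if`/`then`/`else`. The helper names `type_ok`, `valid`, and `defined_properties` are hypothetical; `$ref`, `oneOf`, and `not` are deliberately left out.

```python
def type_ok(name, value):
    # Tiny subset of "type"; bool is excluded from "integer" on purpose.
    if name == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    return isinstance(value, {"string": str, "object": dict}[name])


def valid(schema, inst):
    """Validate `inst` against the keyword subset described above."""
    if isinstance(schema, bool):
        return schema
    if "type" in schema and not type_ok(schema["type"], inst):
        return False
    if isinstance(inst, dict):
        if any(k not in inst for k in schema.get("required", [])):
            return False
        for name, sub in schema.get("properties", {}).items():
            if name in inst and not valid(sub, inst[name]):
                return False
    if not all(valid(s, inst) for s in schema.get("allOf", [])):
        return False
    if "anyOf" in schema and not any(valid(s, inst) for s in schema["anyOf"]):
        return False
    if "if" in schema:
        branch = "then" if valid(schema["if"], inst) else "else"
        if branch in schema and not valid(schema[branch], inst):
            return False
    return True


def defined_properties(schema, inst):
    """Names in `inst` that have an applicable definition (hypothetical rule)."""
    if isinstance(schema, bool) or not isinstance(inst, dict):
        return set()
    found = {k for k in schema.get("properties", {}) if k in inst}
    # Required subschemas are applicable whether or not they validate.
    for sub in schema.get("allOf", []):
        found |= defined_properties(sub, inst)
    # Optional subschemas are applicable only when they validate.
    for sub in schema.get("anyOf", []):
        if valid(sub, inst):
            found |= defined_properties(sub, inst)
    if "if" in schema:
        passed = valid(schema["if"], inst)
        if passed:
            found |= defined_properties(schema["if"], inst)
        branch = "then" if passed else "else"  # the selected branch is "required"
        if branch in schema:
            found |= defined_properties(schema[branch], inst)
    return found


schema = {
    "allOf": [{"properties": {"foo": {"type": "integer"}}}],
    "anyOf": [{"properties": {"bar": {"type": "integer"}}}, True],
}
inst = {"foo": "x", "bar": "y", "baz": 1}
# foo fails its type check but sits under allOf, so it still counts as defined.
print(sorted(set(inst) - defined_properties(schema, inst)))  # ['bar', 'baz']
```

Under this reading, `{"undefinedProperties": false}` would report `bar` and `baz` only, leaving the `foo` type error to be reported once by the subschema that owns it.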
Describe alternatives you've considered
A boolean flag like closed or final might have similar effect to {"undefinedProperties": false}, but consistency between undefined*, unevaluated*, and additional* items/properties makes more sense.
Additional context
Annotations are currently gathered for evaluated items/properties - I think because this is the closest existing method to determining if the annotations are applicable. This change might help to collect more applicable annotations by including annotations from any defined property/item. But this might not be desired, depending on if a failed validation is considered "not an instance" vs. "a malformed instance" of the schema.
Just a clarification:
> additional* and unevaluated* keywords distinguish if a property/item successfully validates.
additional* does actually work the way you expect. If a sibling properties keyword defines the property at all, additionalProperties doesn't evaluate it, even if that particular property failed its validation. That the property is ignored by additionalProperties tends not to matter, though, because the entire subschema fails anyway...
The key difference that you're wanting is that "defined" behavior, like additional*, but acted through applicators, like unevaluated*.
I think this is a useful proposal. Thanks for writing it up.
Can you share an example or two illustrating how undefinedProperties would behave differently from unevaluatedProperties? I'm hoping this proposal addresses the problem I identify in https://github.com/json-schema-org/json-schema-spec/issues/1172#issuecomment-1062540587.
Is there any difference in the true/false validation result from unevaluatedProperties? If the difference is only in how errors end up getting reported, it may not need to be a new keyword.
@jdesrosiers good point, the difference only occurs on failing instances - so this might not warrant a new keyword. That being said, when there are 100s of errors, it becomes quite frustrating.
This would address the problem from your comment. Given the schema:
allOf:
  - allOf:
      - properties:
          foo: true
  - false
unevaluatedProperties: false
The property foo is required by the composition rules of allOf, so I would consider it "defined". However, the outer allOf would fail and drop annotations, so it would not consider foo "evaluated".
undefinedProperties would therefore say foo is "defined" by the schema and not give an error if foo were present.
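A small Python sketch of that contrast, under stated assumptions: `valid`, `evaluated`, and `defined` are hypothetical helpers that understand only boolean schemas, `properties`, and `allOf`, and the nesting of the schema literal is my reading of the example above. `evaluated` mimics annotation dropping (a failing schema contributes nothing), while `defined` keeps walking through `allOf` regardless of the result.

```python
def valid(schema, inst):
    """Validate against boolean schemas, properties, and allOf only."""
    if isinstance(schema, bool):
        return schema
    if isinstance(inst, dict):
        for name, sub in schema.get("properties", {}).items():
            if name in inst and not valid(sub, inst[name]):
                return False
    return all(valid(s, inst) for s in schema.get("allOf", []))


def evaluated(schema, inst):
    """Annotation-style: anything under a failing schema is dropped."""
    if isinstance(schema, bool) or not valid(schema, inst):
        return set()
    found = {k for k in schema.get("properties", {}) if k in inst}
    for sub in schema.get("allOf", []):
        found |= evaluated(sub, inst)
    return found


def defined(schema, inst):
    """Definition-style: allOf requires its subschemas, so keep them all."""
    if isinstance(schema, bool):
        return set()
    found = {k for k in schema.get("properties", {}) if k in inst}
    for sub in schema.get("allOf", []):
        found |= defined(sub, inst)
    return found


# Assumed nesting: the outer allOf holds the inner allOf and `false`.
schema = {"allOf": [{"allOf": [{"properties": {"foo": True}}]}, False]}
instance = {"foo": 1}

print(evaluated(schema, instance))  # set()
print(defined(schema, instance))    # {'foo'}
```

Either way the instance is invalid because of the `false` subschema; the only difference is whether `foo` gets reported a second time.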
@hackowitz-af how would undefinedProperties work in this case?
{
  "anyOf": [
    { "properties": { "foo": { "type": "integer" } } },
    { "properties": { "bar": true } }
  ],
  "undefinedProperties": false
}

{
  "foo": "string",
  "bar": "other string"
}
The instance fails /anyOf/0 but passes /anyOf/1, so the anyOf is satisfied. Does undefinedProperties consider foo to be defined?
@gregsdennis "foo" is not defined, because anyOf does not require the failing subschema; the meaning of anyOf expects that it might not be that schema.
It's more clear with this example:
{
  "$id": "pet",
  "anyOf": [
    {"$id": "cat", "properties": {"whiskers": true}},
    {"$id": "fish", "properties": {"gills": true}},
    {"$id": "bird", "properties": {"wings": true, "beak": true}, "required": ["wings", "beak"]}
  ],
  "undefinedProperties": false
}
A pet cat {"whiskers": 8} neither has to be a fish nor is one. Fish attributes aren't applicable and aren't defined on pet cats. "whiskers" are defined, "wings" and "gills" are undefined.
A pet catfish {"whiskers": 6, "gills": ["left", "right"]} is allowed by anyOf, and is both a cat and a fish (at least the schema says so). "whiskers" and "gills" are defined, "wings" are undefined.
A pet flying fish {"gills": ["left", "right"], "wings": "basically"} is not a bird, and didn't have to be one, so it does not define "wings" or "beak", and raises an error for undefined wings.
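The three pets can be checked with a short Python sketch. It is specialised to this one schema shape (an `anyOf` of subschemas using only `properties` with `true` values and `required`), and `matches` and `undefined_properties` are hypothetical helper names, not an existing API.

```python
PET = {
    "anyOf": [
        {"$id": "cat", "properties": {"whiskers": True}},
        {"$id": "fish", "properties": {"gills": True}},
        {"$id": "bird", "properties": {"wings": True, "beak": True},
         "required": ["wings", "beak"]},
    ],
}


def matches(sub, pet):
    """True if `pet` satisfies `sub` (required + boolean property schemas)."""
    if any(name not in pet for name in sub.get("required", [])):
        return False
    return all(s is True for name, s in sub.get("properties", {}).items()
               if name in pet)


def undefined_properties(schema, pet):
    """Properties of `pet` not defined by any *valid* anyOf subschema."""
    defined = set()
    for sub in schema["anyOf"]:
        # anyOf requires no particular subschema, so only valid ones apply.
        if matches(sub, pet):
            defined |= sub.get("properties", {}).keys() & pet.keys()
    return set(pet) - defined


cat = {"whiskers": 8}
catfish = {"whiskers": 6, "gills": ["left", "right"]}
flying_fish = {"gills": ["left", "right"], "wings": "basically"}

print(undefined_properties(PET, cat))          # set()
print(undefined_properties(PET, catfish))      # set()
print(undefined_properties(PET, flying_fish))  # {'wings'}
```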
@jdesrosiers is right - this is only different from unevaluatedProperties on failing instances, so it might not be worthwhile.
In these cases, the difference is only that a property is defined if it is required (e.g. within allOf, $ref, etc.), even if it is not evaluated.
We may just need to add a special requirement to unevaluated* so that its error messages don't duplicate errors already reported by other keywords. Not sure how that would work, but we can take a look.