✨ Proposal: Add `undefinedProperties` and `undefinedItems` Keywords

Open hackowitz-af opened this issue 6 months ago • 7 comments

Describe the inspiration for your proposal

I want to enforce that "nothing goes unseen" by the schema - for example, to safeguard against misspelled property names, to ensure that extension properties follow a naming convention, to prevent instances from adding bespoke properties that would subvert interoperability requirements, or otherwise to enforce that the instance is 100% covered by the schema.

additional* items/properties don't meet the need because they only consider keywords in the same schema object, while our schemas are large and use $ref and/or allOf to aggregate component subschemas from a schema registry.

unevaluated* items/properties don't meet the need because properties within a failing subschema are considered "unevaluated" even if they are expected, required, and valid. This causes a great many spurious validation failures, often many layers removed from the actual invalid property/item. See also #1604, which this new keyword would resolve.

Describe the proposal

A new keyword pair for undefined* properties and items might work alongside the unevaluated* and additional* keywords.

additional* and unevaluated* keywords distinguish if a property/item successfully validates. undefined* rather distinguishes if the property/item has an applicable definition. See also #1605, which this new keyword would resolve.

A property/item can be called "defined" if it produces a validation result or annotations in the schema being evaluated or any applicable sub-schema.

  • The schema being evaluated is of course applicable to itself.
  • A subschema of the schema being evaluated is always applicable if it is required by its applicator, whether or not the subschema or the applicator is valid.
    • For example, allOf and $ref require every subschema to be valid.
    • If the if subschema is valid, the then subschema is "required"; otherwise the else subschema is. The "requirement" is external to the then/else applicator.
    • No oneOf/anyOf subschema is required, because the applicator could still be valid even if the subschema were invalid, as long as another subschema were valid.
    • The if subschema is not required, since if is valid even with a falsy subschema.
    • not is the opposite of required.
  • An evaluated subschema of the schema being evaluated is applicable, even if not required by its applicator, if it is valid.
    • All of the valid anyOf subschemas are applicable.
    • The if subschema is applicable if it's valid.
    • I think, but I'm not sure, that a valid not schema is applicable. That's why not complains - it shouldn't be applicable, but it is.
    • then or else subschemas aren't evaluated if not selected by if, so they aren't applicable even if they would be valid.
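
The applicability rules above can be sketched in Python. This is only an illustration of one possible reading of the proposal, not an implementation of any existing validator: the function names (is_valid, defined_properties, undefined_properties) are made up here, only properties is treated as "defining" a property, and $ref, oneOf, not, and the *Items keywords are left out.

```python
TYPES = {"string": str, "integer": int, "object": dict, "array": list}


def is_valid(schema, instance):
    """Validate against a small keyword subset (no $ref, no items)."""
    if isinstance(schema, bool):
        return schema
    # Note: Python bools pass as integers here; acceptable for a sketch.
    if "type" in schema and not isinstance(instance, TYPES[schema["type"]]):
        return False
    if isinstance(instance, dict):
        if any(name not in instance for name in schema.get("required", [])):
            return False
        for name, sub in schema.get("properties", {}).items():
            if name in instance and not is_valid(sub, instance[name]):
                return False
    if not all(is_valid(sub, instance) for sub in schema.get("allOf", [])):
        return False
    if "anyOf" in schema and not any(is_valid(s, instance) for s in schema["anyOf"]):
        return False
    if "if" in schema:
        branch = "then" if is_valid(schema["if"], instance) else "else"
        if branch in schema and not is_valid(schema[branch], instance):
            return False
    return True


def defined_properties(schema, instance):
    """Names of instance properties that an applicable subschema defines."""
    if isinstance(schema, bool) or not isinstance(instance, dict):
        return set()
    # The schema being evaluated is applicable to itself.
    defined = {name for name in schema.get("properties", {}) if name in instance}
    # allOf requires every subschema, so each is applicable even when it fails.
    for sub in schema.get("allOf", []):
        defined |= defined_properties(sub, instance)
    # anyOf subschemas are optional: applicable only when valid.
    for sub in schema.get("anyOf", []):
        if is_valid(sub, instance):
            defined |= defined_properties(sub, instance)
    if "if" in schema:
        condition = is_valid(schema["if"], instance)
        if condition:  # the if subschema is applicable only when valid
            defined |= defined_properties(schema["if"], instance)
        branch = "then" if condition else "else"  # the selected branch is required
        if branch in schema:
            defined |= defined_properties(schema[branch], instance)
    return defined


def undefined_properties(schema, instance):
    """Properties that {"undefinedProperties": false} would reject (assumed)."""
    return set(instance) - defined_properties(schema, instance)


schema = {
    "allOf": [{"properties": {"foo": {"type": "integer"}}}],
    "anyOf": [{"properties": {"bar": {"type": "integer"}}}, {"properties": {"baz": True}}],
}
instance = {"foo": "oops", "bar": "oops", "baz": 1}
print(is_valid(schema, instance))              # False
print(undefined_properties(schema, instance))  # {'bar'}
```

In this sketch foo counts as defined even though it fails, because allOf requires its subschema, while bar is undefined because the only anyOf branch that mentions it is invalid and not required.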

Describe alternatives you've considered

A boolean flag like closed or final might have similar effect to {"undefinedProperties": false}, but consistency between undefined*, unevaluated*, and additional* items/properties makes more sense.

Additional context

Annotations are currently gathered for evaluated items/properties - I think because this is the closest existing method to determining if the annotations are applicable. This change might help to collect more applicable annotations by including annotations from any defined property/item. But this might not be desired, depending on if a failed validation is considered "not an instance" vs. "a malformed instance" of the schema.

hackowitz-af avatar Jun 06 '25 00:06 hackowitz-af

Just a clarification:

additional* and unevaluated* keywords distinguish if a property/item successfully validates.

additional* does actually work the way you expect. If a sibling properties keyword defines the property at all, additionalProperties doesn't evaluate it, even if that particular property failed its validation. That the property is ignored by additionalProperties tends not to matter, though, because the entire subschema fails anyway...
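
A rough Python sketch of that selection rule, for illustration only: the helper name additional_names is hypothetical, and it only looks at sibling properties and patternProperties without validating anything.

```python
import re


def additional_names(schema, instance):
    """Property names that additionalProperties would be applied to."""
    named = set(schema.get("properties", {}))
    patterns = list(schema.get("patternProperties", {}))
    return {
        name
        for name in instance
        if name not in named and not any(re.search(p, name) for p in patterns)
    }


schema = {"properties": {"foo": {"type": "integer"}}, "additionalProperties": False}
instance = {"foo": "not an integer", "bar": 1}

# foo fails its own subschema, but it is still named by the sibling
# properties keyword, so additionalProperties never looks at it.
print(additional_names(schema, instance))  # {'bar'}
```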

What you're after is that "defined" behavior, as in additional*, but reaching through applicators, as unevaluated* does.

I think this is a useful proposal. Thanks for writing it up.

gregsdennis avatar Jun 06 '25 04:06 gregsdennis

Can you share an example or two illustrating how undefinedProperties would behave differently from unevaluatedProperties? I'm hoping this proposal addresses the problem I identify in https://github.com/json-schema-org/json-schema-spec/issues/1172#issuecomment-1062540587.

Is there any difference in the true/false validation result from unevaluatedProperties? If the difference is only in how errors end up getting reported, it may not need to be a new keyword.

jdesrosiers avatar Jun 06 '25 18:06 jdesrosiers

@jdesrosiers good point, the difference only occurs on failing instances - so this might not warrant a new keyword. That being said, when there are 100s of errors, it becomes quite frustrating.

This would address the problem from your comment. Given the schema:

allOf:
  - allOf:
    - properties:
        foo: true
    - false
unevaluatedProperties: false

The property foo is required by the composition rules of allOf, so I would consider it "defined". However, the outer allOf would fail and drop annotations, so it would not consider foo "evaluated".

undefinedProperties would therefore say foo is "defined" by the schema and not give an error if foo were present.
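
To make the contrast concrete, here is a small Python sketch limited to boolean schemas, properties, and allOf. The helpers evaluate and defined are hypothetical and only model the two ideas as described in this thread: annotations are dropped when a subschema fails, whereas "defined" ignores validity for subschemas that allOf requires.

```python
def evaluate(schema, instance):
    """Return (valid, evaluated property names); names are dropped on failure."""
    if isinstance(schema, bool):
        return schema, set()
    valid, seen = True, set()
    if isinstance(instance, dict):
        for name, sub in schema.get("properties", {}).items():
            if name in instance:
                ok, _ = evaluate(sub, instance[name])
                valid = valid and ok
                seen.add(name)
    for sub in schema.get("allOf", []):
        ok, names = evaluate(sub, instance)
        valid = valid and ok
        seen |= names
    # A failing schema contributes no annotations to its parent.
    return valid, (seen if valid else set())


def defined(schema, instance):
    """Property names defined by this schema or any allOf-required subschema."""
    if isinstance(schema, bool) or not isinstance(instance, dict):
        return set()
    names = {n for n in schema.get("properties", {}) if n in instance}
    for sub in schema.get("allOf", []):
        names |= defined(sub, instance)
    return names


# The schema from this comment, without the unevaluatedProperties keyword itself.
schema = {"allOf": [{"allOf": [{"properties": {"foo": True}}, False]}]}
instance = {"foo": 1}

valid, evaluated = evaluate(schema, instance)
print(valid)                                # False
print(set(instance) - evaluated)            # {'foo'}  (unevaluated)
print(set(instance) - defined(schema, instance))  # set()  (nothing undefined)
```

The instance fails either way because of the false subschema; the only difference is whether foo is additionally reported.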

hackowitz-af avatar Jun 10 '25 14:06 hackowitz-af

@hackowitz-af how would undefinedProperties work in this case?

{
  "anyOf": [
    { "properties": { "foo": { "type": "integer" } } },
    { "properties": { "bar": true } }
  ],
  "undefinedProperties": false
}

{
  "foo": "string",
  "bar": "other string"
}

The instance fails /anyOf/0 but passes /anyOf/1, so the anyOf is satisfied. Does undefinedProperties consider foo to be defined?

gregsdennis avatar Jun 10 '25 20:06 gregsdennis

@gregsdennis "foo" is not defined, because anyOf does not require the failing subschema; the meaning of anyOf expects that it might not be that schema.

It's more clear with this example:

{
  "$id": "pet",
  "anyOf": [
    {"$id": "cat", "properties": {"whiskers": true}},
    {"$id": "fish", "properties": {"gills": true}},
    {"$id": "bird", "properties": {"wings": true, "beak": true}, "required": ["wings", "beak"]}
  ],
  "undefinedProperties": false
}

A pet cat {"whiskers": 8} neither has to be a fish nor is one. Fish attributes aren't applicable and aren't defined on pet cats. "whiskers" are defined, "wings" and "gills" are undefined.

A pet catfish {"whiskers": 6, "gills": ["left", "right"]} is allowed by anyOf, and is both a cat and a fish (at least the schema says so). "whiskers" and "gills" are defined, "wings" are undefined.

A pet flying fish {"gills": ["left", "right"], "wings": "basically"} is not a bird, and didn't have to be one, so it does not define "wings" or "beak", and raises an error for undefined wings.
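
The three pets can be checked with a short Python sketch. It is an assumption-laden illustration, not a real validator: is_valid and undefined_properties are hypothetical helpers that only understand required, properties, and anyOf, and undefinedProperties is modelled as "instance properties not named by the schema itself or by any valid anyOf branch".

```python
def is_valid(schema, instance):
    """Validate against required, properties, and anyOf only."""
    if isinstance(schema, bool):
        return schema
    if any(name not in instance for name in schema.get("required", [])):
        return False
    for name, sub in schema.get("properties", {}).items():
        if name in instance and not is_valid(sub, instance[name]):
            return False
    if "anyOf" in schema and not any(is_valid(s, instance) for s in schema["anyOf"]):
        return False
    return True


def undefined_properties(schema, instance):
    """Instance properties not defined by the schema or a valid anyOf branch."""
    defined = set(schema.get("properties", {}))
    for sub in schema.get("anyOf", []):
        # anyOf branches are optional, so only valid ones are applicable.
        if isinstance(sub, dict) and is_valid(sub, instance):
            defined |= set(sub.get("properties", {}))
    return set(instance) - defined


pet = {
    "$id": "pet",
    "anyOf": [
        {"$id": "cat", "properties": {"whiskers": True}},
        {"$id": "fish", "properties": {"gills": True}},
        {"$id": "bird", "properties": {"wings": True, "beak": True},
         "required": ["wings", "beak"]},
    ],
}

cat = {"whiskers": 8}
catfish = {"whiskers": 6, "gills": ["left", "right"]}
flying_fish = {"gills": ["left", "right"], "wings": "basically"}

print(undefined_properties(pet, cat))          # set()
print(undefined_properties(pet, catfish))      # set()
print(undefined_properties(pet, flying_fish))  # {'wings'}
```

One detail the sketch surfaces: properties on its own does not require a property to be present, so {"whiskers": 8} also validates against the "fish" branch. That does not change the results here, since only properties actually present in the instance can be reported as undefined.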

hackowitz-af avatar Jun 12 '25 21:06 hackowitz-af

@jdesrosiers is right - this is only different from unevaluatedProperties on failing instances, so it might not be worthwhile.

In these cases, the difference is only that a property is defined if it is required (e.g. within allOf, $ref, etc.), even if it is not evaluated.

hackowitz-af avatar Jun 12 '25 21:06 hackowitz-af

We may just need to add a special requirement to unevaluated* so that its error messaging doesn't duplicate messages from other keywords. Not sure how that would work, but we can take a look.

gregsdennis avatar Jun 12 '25 21:06 gregsdennis