jsonschema icon indicating copy to clipboard operation
jsonschema copied to clipboard

Enhance jsonschema.exceptions.best_match to prefer deeper errors

Open palemieux opened this issue 5 years ago • 9 comments

Given the schema below, the following instance fails with a single error:

{'schema': 'C', 'kind': {'schema': 'D', 'foo': 2}}: {'schema': 'C', 'kind': {'schema': 'D', 'foo': 2}} is not valid under any of the given schemas

While it is true that the instance does not match either definitions B or C, could the validator indicate that the instance matched until it reached /kind/foo. Otherwise, schema validation will always fail on the top level anyOf, which is not very helpful.

Perhaps this is already possible and I missed it, or there is a way to design the schema differently?

Sample instance

{
  "schema": "C",
  "kind": {
    "schema": "D" ,
    "foo": 2
  }
}

Schema

{
  "$schema": "http://json-schema.org/schema#",
  "$ref": "#/definitions/A",
  "definitions": {
    "A": {
      "anyOf": [
        {
          "$ref": "#/definitions/B"
        },
        {
          "$ref": "#/definitions/C"
        }
      ]
    },
    "B": {
      "type": "object",
      "properties": {
        "schema": { "const": "B" },
        "name": { "type" : "string" },
        "value": { "type" : "number" }
      },
      "required": [ "schema" ],
      "additionalProperties": false
    },
    "C": {
      "type": "object",
      "properties": {
        "schema": { "const": "C" },
        "name": { "type" : "string" },
        "kind": { "$ref": "#/definitions/D" }
      },
      "required": [ "schema" ],
      "additionalProperties": false
    },
    "D": {
      "type": "object",
      "properties": {
        "schema": { "const": "D" },
        "foo": { "type" : "string" }
      },
      "required": [ "schema" ],
      "additionalProperties": false
    }
  }
}

palemieux avatar Jun 26 '20 17:06 palemieux

Hi. For that schema design, this is essentially what best_match is for.

Julian avatar Jun 26 '20 17:06 Julian

best_match does not seem to work in this case.

palemieux avatar Jun 26 '20 18:06 palemieux

Please don't say things "don't work" in the context of an issue tracker. This is the case even if in fact things don't work (i.e. where there's an obvious bug), let alone in case they do or may.

You should include what you tried, what it did, and what you expected it to do.

Here though is what your example does:

>>> jsonschema.validate(instance, schema)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/julian/.dotfiles/.local/share/virtualenvs/jsonschema/lib/python3.8/site-packages/jsonschema/validators.py", line 934, in validate
    raise error
jsonschema.exceptions.ValidationError: 2 is not of type 'string'

Failed validating 'type' in schema[1]['properties']['kind']['properties']['foo']:
    {'type': 'string'}

On instance['kind']['foo']:
    2

2 is not of type 'string'

Failed validating 'type' in schema[1]['properties']['kind']['properties']['foo']:
    {'type': 'string'}

On instance['kind']['foo']:
    2

so, not the anyOf exception you mentioned. Or via the validator API, the same:

>>> jsonschema.exceptions.best_match(jsonschema.Draft7Validator(schema).iter_errors(instance))
<ValidationError: "2 is not of type 'string'">

(Please also don't take the above as being rude. It's not meant to be).

Julian avatar Jun 26 '20 18:06 Julian

Thanks for the feedback.

More detailed example below.

Configuration

"jsonschema": {
            "hashes": [
                "sha256:4e5b3cf8216f577bee9ce139cbe72eca3ea4f292ec60928ff24758ce626cd163",
                "sha256:c8a85b28d377cc7737e46e2d9f2b4f44ee3c0e1deac6bf46ddefc7187d30797a"
            ],
            "index": "pypi",
            "version": "==3.2.0"
        }

Schema

{
  "$schema": "http://json-schema.org/schema#",
  "$ref": "#/definitions/A",
  "definitions": {
    "A": {
      "anyOf": [
        {
          "$ref": "#/definitions/B"
        },
        {
          "$ref": "#/definitions/C"
        }
      ]
    },
    "B": {
      "type": "object",
      "properties": {
        "schema": { "const": "B" },
        "name": { "type" : "string" },
        "foo": { "type" : "number" },
        "child" :  { "$ref": "#/definitions/A" }
      },
      "required": [ "schema" ],
      "additionalProperties": false
    },
    "C": {
      "type": "object",
      "properties": {
        "schema": { "const": "C" },
        "name": { "type" : "string" },
        "foo": { "type" : "string" },
        "child" :  { "$ref": "#/definitions/A" }
      },
      "required": [ "schema" ],
      "additionalProperties": false
    }
  }
}

Instance

{
  "schema": "C",
  "child" : {
    "schema": "B",
    "child" : {
      "schema": "C",
      "foo" : 2,
      "child" : {
          "schema": "B",
          "foo" : 2
        }
    }
  }
}

Code

import jsonschema
import json
import argparse

parser = argparse.ArgumentParser(description = 'Validate file against JSON Schema')

parser.add_argument(
    'schema',
    type = argparse.FileType('r'),
    help = 'Path of the Schema'
  )

parser.add_argument(
    'file',
    type = argparse.FileType('r'),
    help = 'Path of the file to validate'
  )

args = parser.parse_args()

schema = json.load(args.schema)

doc = json.load(args.file)

validator = jsonschema.Draft7Validator(schema)

print(jsonschema.exceptions.best_match(validator.iter_errors(doc)))

Actual output

'B' was expected

Failed validating 'const' in schema[0]['properties']['schema']:
    {'const': 'B'}

On instance['schema']:
    'C'

Potential output

2 is not of type 'string'

Failed validating 'type' in schema[0]['properties']['foo']:
    {'type': 'string'}

On instance['foo'] at "/child/child"

palemieux avatar Jun 26 '20 21:06 palemieux

I haven't gotten a chance to have a look at your last message, but in that new example it seems more likely there's something we may be able to improve (unclear; best_match is always a bit of a heuristic, and one has to be careful that making one example simpler doesn't make another more complex)

But I think from what I see it's possible best_match could potentially also decide to show errors that are deeper instead of shallower or something (it does something like that now, which is why I'd have to see why it didn't do so in that specific case)

Julian avatar Jul 02 '20 13:07 Julian

Hi @palemieux. Since validator ofoneOf and anyOf, the error instance with longer ValidationError.path will be considered as better matches in best_match. I added one more layer of schema and Instance as follows, then got the result you want.

Schema

{
   "$schema": "http://json-schema.org/schema#",
   "$ref": "#/definitions/A",
   "definitions": {
       "A": {
           "anyOf": [
               {
                   "$ref": "#/definitions/B"
               },
               {
                   "$ref": "#/definitions/C"
               }
           ]
       },
       "B": {
           "type": "object",
           "properties": {
               "schema": {"const": "B"},
               "name": {"type": "string"},
               "foooo": {
                   "type": "object",
                   "properties": {
                       "foo": {"type": "number"}
                   }
                },
               "child": {"$ref": "#/definitions/A"}
           },
           "required": ["schema"],
           "additionalProperties": false
       },
       "C": {
           "type": "object",
           "properties": {
               "schema": {"const": "C"},
               "name": {"type": "string"},
               "foooo": {
                   "type": "object",
                   "properties": {
                       "foo": {"type": "string"}
                   }
                },
               "child": {"$ref": "#/definitions/A"}
           },
           "required": ["schema"],
           "additionalProperties": false
       }
   }
}

Instance

{
  "schema": "C",
  "child": {
    "schema": "B",
    "child": {
      "schema": "C",
      "foooo": {
          "foo": 2
      },
      "child": {
          "schema": "B",
          "foooo": {
            "foo": 2
          },
      }
    }
  }
}

Result

2 is not of type 'string'

Failed validating 'type' in schema[1]['properties']['foooo']['properties']['foo']:
    {'type': 'string'}

On instance['foooo']['foo']:
    2

Hi @Julian. Longer ValidationError.path as the best match, I am not fully sure if best_match follows this rule. Is the longer error means a deeper error?

willson-chen avatar Jul 06 '20 07:07 willson-chen

Longer ValidationError.path as the best match

+1, unless we can think of circumstances where this fails.

palemieux avatar Jul 06 '20 14:07 palemieux

Hi, My understanding from this thread concerns the enhancement of best_match. As such I would like to provide the example of when an error is from the same "level" the object type should (imo) be used to return the best match. Maybe this should be its own thread, due to the title containing "prefer deeper errors", my underlying thinking being that when the error can't be determined just due to its level the type should be used as an indicator, I happy to create this as another issue, if this is not relevant to this thread.

Example :

import jsonschema
import json

schema = json.loads("""
{
    "properties" : {
        "value" : {
            "anyOf" : [
                {"type" : "array"},
                {"type":"string", "const" : "check"}
            ]
        }
    }
}
""")

entity = json.loads("""
{"value" : "chec"}
""")

v = jsonschema.Draft7Validator(schema)

print(jsonschema.exceptions.best_match(v.iter_errors(entity)))

Returns

'chec' is not of type 'array'

Failed validating 'type' in schema[0]:
    {'type': 'array'}

On instance:
    'chec'

I would expect the "better" match (my opinion here) to have the same type to the one that is compared with the value, for issues at the same level. So in this case, returning the value not matching the const.

bliiben avatar Jul 29 '20 09:07 bliiben

@bliiben I've definitely had that thought before (and forget what conclusion I came to) -- but let's take that to another issue (around whether best_match should prefer the same type given all else equal, as you say).

Julian avatar Aug 18 '20 17:08 Julian

A mostly simplified version of this example is:

schema = {
    "anyOf": [
        {"$ref": "#/definitions/B"},
        {"$ref": "#/definitions/C"}
    ],
    "definitions": {
        "B": {
            "properties": {
                "schema": { "const": "B" },
                "child" :  { "$ref": "#" }
            },
        },
        "C": {
            "properties": {
                "schema": { "const": "C" },
                "foo": { "type" : "string" },
                "child" :  { "$ref": "#" }
            },
        }
    }
}

instance = {
    "schema": "B",
    "child" : {
        "schema": "B",
        "child" : {"schema": "C", "foo" : 2}
    }
}

from pprint import pformat, pprint
from textwrap import indent
from jsonschema import Draft7Validator, exceptions

Draft7Validator.check_schema(schema)
validator = Draft7Validator(schema)

errors = list(validator.iter_errors(instance))
best = exceptions.best_match(errors)

pprint(
    [
        (each.message, "#/" + "/".join(each.absolute_path), "#/" + "/".join(map(str, each.absolute_schema_path)))
        for each in best.context[2].context[0].context
    ]
)

The issue is essentially that neither error really is deeper. They're both at the same level -- specifically the above produces:

[("'B' was expected",
  '#/child/child/schema',
  '#/anyOf/1/properties/child/anyOf/0/properties/child/anyOf/0/properties/schema/const'),
 ("2 is not of type 'string'",
  '#/child/child/foo',
  '#/anyOf/1/properties/child/anyOf/0/properties/child/anyOf/1/properties/foo/type')]

i.e. they're at the same level both in the instance and schema.

I'm inclined to close this as much as I agree the other message would be better for this case, I can't think of a heuristic that actually would get us there. I'm open to suggestions, but what's in the title here isn't it, and we already do indeed take depth into account.

Julian avatar Oct 31 '23 21:10 Julian