Enhance jsonschema.exceptions.best_match to prefer deeper errors
Given the schema below, the following instance fails with a single error:
{'schema': 'C', 'kind': {'schema': 'D', 'foo': 2}}: {'schema': 'C', 'kind': {'schema': 'D', 'foo': 2}} is not valid under any of the given schemas
While it is true that the instance does not match either definitions B or C, could the validator indicate that the instance matched until it reached /kind/foo. Otherwise, schema validation will always fail on the top level anyOf, which is not very helpful.
Perhaps this is already possible and I missed it, or there is a way to design the schema differently?
Sample instance
{
"schema": "C",
"kind": {
"schema": "D" ,
"foo": 2
}
}
Schema
{
"$schema": "http://json-schema.org/schema#",
"$ref": "#/definitions/A",
"definitions": {
"A": {
"anyOf": [
{
"$ref": "#/definitions/B"
},
{
"$ref": "#/definitions/C"
}
]
},
"B": {
"type": "object",
"properties": {
"schema": { "const": "B" },
"name": { "type" : "string" },
"value": { "type" : "number" }
},
"required": [ "schema" ],
"additionalProperties": false
},
"C": {
"type": "object",
"properties": {
"schema": { "const": "C" },
"name": { "type" : "string" },
"kind": { "$ref": "#/definitions/D" }
},
"required": [ "schema" ],
"additionalProperties": false
},
"D": {
"type": "object",
"properties": {
"schema": { "const": "D" },
"foo": { "type" : "string" }
},
"required": [ "schema" ],
"additionalProperties": false
}
}
}
Hi. For that schema design, this is essentially what best_match is for.
best_match does not seem to work in this case.
Please don't say things "don't work" in the context of an issue tracker. This is the case even if in fact things don't work (i.e. where there's an obvious bug), let alone in case they do or may.
You should include what you tried, what it did, and what you expected it to do.
Here though is what your example does:
>>> jsonschema.validate(instance, schema)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/julian/.dotfiles/.local/share/virtualenvs/jsonschema/lib/python3.8/site-packages/jsonschema/validators.py", line 934, in validate
raise error
jsonschema.exceptions.ValidationError: 2 is not of type 'string'
Failed validating 'type' in schema[1]['properties']['kind']['properties']['foo']:
{'type': 'string'}
On instance['kind']['foo']:
2
2 is not of type 'string'
Failed validating 'type' in schema[1]['properties']['kind']['properties']['foo']:
{'type': 'string'}
On instance['kind']['foo']:
2
so, not the anyOf exception you mentioned. Or via the validator API, the same:
>>> jsonschema.exceptions.best_match(jsonschema.Draft7Validator(schema).iter_errors(instance))
<ValidationError: "2 is not of type 'string'">
(Please also don't take the above as being rude. It's not meant to be).
Thanks for the feedback.
More detailed example below.
Configuration
"jsonschema": {
"hashes": [
"sha256:4e5b3cf8216f577bee9ce139cbe72eca3ea4f292ec60928ff24758ce626cd163",
"sha256:c8a85b28d377cc7737e46e2d9f2b4f44ee3c0e1deac6bf46ddefc7187d30797a"
],
"index": "pypi",
"version": "==3.2.0"
}
Schema
{
"$schema": "http://json-schema.org/schema#",
"$ref": "#/definitions/A",
"definitions": {
"A": {
"anyOf": [
{
"$ref": "#/definitions/B"
},
{
"$ref": "#/definitions/C"
}
]
},
"B": {
"type": "object",
"properties": {
"schema": { "const": "B" },
"name": { "type" : "string" },
"foo": { "type" : "number" },
"child" : { "$ref": "#/definitions/A" }
},
"required": [ "schema" ],
"additionalProperties": false
},
"C": {
"type": "object",
"properties": {
"schema": { "const": "C" },
"name": { "type" : "string" },
"foo": { "type" : "string" },
"child" : { "$ref": "#/definitions/A" }
},
"required": [ "schema" ],
"additionalProperties": false
}
}
}
Instance
{
"schema": "C",
"child" : {
"schema": "B",
"child" : {
"schema": "C",
"foo" : 2,
"child" : {
"schema": "B",
"foo" : 2
}
}
}
}
Code
import jsonschema
import json
import argparse
parser = argparse.ArgumentParser(description = 'Validate file against JSON Schema')
parser.add_argument(
'schema',
type = argparse.FileType('r'),
help = 'Path of the Schema'
)
parser.add_argument(
'file',
type = argparse.FileType('r'),
help = 'Path of the file to validate'
)
args = parser.parse_args()
schema = json.load(args.schema)
doc = json.load(args.file)
validator = jsonschema.Draft7Validator(schema)
print(jsonschema.exceptions.best_match(validator.iter_errors(doc)))
Actual output
'B' was expected
Failed validating 'const' in schema[0]['properties']['schema']:
{'const': 'B'}
On instance['schema']:
'C'
Potential output
2 is not of type 'string'
Failed validating 'type' in schema[0]['properties']['foo']:
{'type': 'string'}
On instance['foo'] at "/child/child"
I haven't gotten a chance to have a look at your last message, but in that new example it seems more likely there's something we may be able to improve (unclear; best_match is always a bit of a heuristic, and one has to be careful that making one example simpler doesn't make another more complex)
But I think from what I see it's possible best_match could potentially also decide to show errors that are deeper instead of shallower or something (it does something like that now, which is why I'd have to see why it didn't do so in that specific case)
Hi @palemieux. Since validator ofoneOf and anyOf, the error instance with longer ValidationError.path will be considered as better matches in best_match. I added one more layer of schema and Instance as follows, then got the result you want.
Schema
{
"$schema": "http://json-schema.org/schema#",
"$ref": "#/definitions/A",
"definitions": {
"A": {
"anyOf": [
{
"$ref": "#/definitions/B"
},
{
"$ref": "#/definitions/C"
}
]
},
"B": {
"type": "object",
"properties": {
"schema": {"const": "B"},
"name": {"type": "string"},
"foooo": {
"type": "object",
"properties": {
"foo": {"type": "number"}
}
},
"child": {"$ref": "#/definitions/A"}
},
"required": ["schema"],
"additionalProperties": false
},
"C": {
"type": "object",
"properties": {
"schema": {"const": "C"},
"name": {"type": "string"},
"foooo": {
"type": "object",
"properties": {
"foo": {"type": "string"}
}
},
"child": {"$ref": "#/definitions/A"}
},
"required": ["schema"],
"additionalProperties": false
}
}
}
Instance
{
"schema": "C",
"child": {
"schema": "B",
"child": {
"schema": "C",
"foooo": {
"foo": 2
},
"child": {
"schema": "B",
"foooo": {
"foo": 2
},
}
}
}
}
Result
2 is not of type 'string'
Failed validating 'type' in schema[1]['properties']['foooo']['properties']['foo']:
{'type': 'string'}
On instance['foooo']['foo']:
2
Hi @Julian. Longer ValidationError.path as the best match, I am not fully sure if best_match follows this rule. Is the longer error means a deeper error?
Longer ValidationError.path as the best match
+1, unless we can think of circumstances where this fails.
Hi, My understanding from this thread concerns the enhancement of best_match. As such I would like to provide the example of when an error is from the same "level" the object type should (imo) be used to return the best match. Maybe this should be its own thread, due to the title containing "prefer deeper errors", my underlying thinking being that when the error can't be determined just due to its level the type should be used as an indicator, I happy to create this as another issue, if this is not relevant to this thread.
Example :
import jsonschema
import json
schema = json.loads("""
{
"properties" : {
"value" : {
"anyOf" : [
{"type" : "array"},
{"type":"string", "const" : "check"}
]
}
}
}
""")
entity = json.loads("""
{"value" : "chec"}
""")
v = jsonschema.Draft7Validator(schema)
print(jsonschema.exceptions.best_match(v.iter_errors(entity)))
Returns
'chec' is not of type 'array'
Failed validating 'type' in schema[0]:
{'type': 'array'}
On instance:
'chec'
I would expect the "better" match (my opinion here) to have the same type to the one that is compared with the value, for issues at the same level. So in this case, returning the value not matching the const.
@bliiben I've definitely had that thought before (and forget what conclusion I came to) -- but let's take that to another issue (around whether best_match should prefer the same type given all else equal, as you say).
A mostly simplified version of this example is:
schema = {
"anyOf": [
{"$ref": "#/definitions/B"},
{"$ref": "#/definitions/C"}
],
"definitions": {
"B": {
"properties": {
"schema": { "const": "B" },
"child" : { "$ref": "#" }
},
},
"C": {
"properties": {
"schema": { "const": "C" },
"foo": { "type" : "string" },
"child" : { "$ref": "#" }
},
}
}
}
instance = {
"schema": "B",
"child" : {
"schema": "B",
"child" : {"schema": "C", "foo" : 2}
}
}
from pprint import pformat, pprint
from textwrap import indent
from jsonschema import Draft7Validator, exceptions
Draft7Validator.check_schema(schema)
validator = Draft7Validator(schema)
errors = list(validator.iter_errors(instance))
best = exceptions.best_match(errors)
pprint(
[
(each.message, "#/" + "/".join(each.absolute_path), "#/" + "/".join(map(str, each.absolute_schema_path)))
for each in best.context[2].context[0].context
]
)
The issue is essentially that neither error really is deeper. They're both at the same level -- specifically the above produces:
[("'B' was expected",
'#/child/child/schema',
'#/anyOf/1/properties/child/anyOf/0/properties/child/anyOf/0/properties/schema/const'),
("2 is not of type 'string'",
'#/child/child/foo',
'#/anyOf/1/properties/child/anyOf/0/properties/child/anyOf/1/properties/foo/type')]
i.e. they're at the same level both in the instance and schema.
I'm inclined to close this as much as I agree the other message would be better for this case, I can't think of a heuristic that actually would get us there. I'm open to suggestions, but what's in the title here isn't it, and we already do indeed take depth into account.