
Mitigate undesired side effect of new `best_match` behaviour with alternative proposal

Open ilia1243 opened this issue 1 year ago • 4 comments

Following up on #1250: the fix seems to have an undesired side effect.

from jsonschema import Draft202012Validator as Validator, exceptions

schema = {'oneOf': [
    {'properties': {'run': {'type': 'string'}}, 'required': ['run']},
    {'properties': {'uses': {'type': 'string'}}, 'required': ['uses']},
]}
instance = {'uses': 1, 'run': 1}

error = exceptions.best_match(Validator(schema).iter_errors(instance))
print(schema, "\n\n", error)

After the fix it produces:

 1 is not of type 'string'
On instance['run']:
    1

Conceptually, it is not clear why `run` has priority over `uses` (technically, I understand that it wins because of alphabetical order).

This is based on a real example: https://raw.githubusercontent.com/SchemaStore/schemastore/master/src/schemas/json/github-workflow.json

As an alternative proposal, the relevance function could be left unchanged. Instead, `best_match` could distinguish errors that have different `error.path` values within the same subschema and choose the first of them (possibly one that does not have the minimal relevance overall, but the maximal one for that particular subschema, provided that it is still minimal across the different subschemas).
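One possible reading of this proposal, sketched in plain Python without touching jsonschema itself: pick one representative error per subschema, compare subschemas only through those representatives, and keep the parent error when no subschema clearly wins. The `Candidate` class, the `pick_message` function and the use of path depth as the only relevance measure are all assumptions made for illustration; this is not the code from the pull request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a nested validation error."""
    branch: int   # index of the oneOf/anyOf subschema that produced it
    path: tuple   # location of the failure inside the instance
    message: str


def pick_message(parent_message, candidates):
    """Return the message to report for a failed oneOf/anyOf (sketch only)."""
    if not candidates:
        return parent_message

    # One representative per branch: its deepest error; the first wins on ties.
    representatives = {}
    for candidate in candidates:
        current = representatives.get(candidate.branch)
        if current is None or len(candidate.path) > len(current.path):
            representatives[candidate.branch] = candidate

    # Compare branches only through their representatives.
    deepest = max(len(rep.path) for rep in representatives.values())
    winners = [rep for rep in representatives.values() if len(rep.path) == deepest]

    # A single deepest branch is a clear winner; otherwise keep the parent error.
    return winners[0].message if len(winners) == 1 else parent_message


parent = "{'uses': 1, 'run': 1} is not valid under any of the given schemas"
tied = [
    Candidate(0, ("run",), "1 is not of type 'string'"),
    Candidate(1, ("uses",), "1 is not of type 'string'"),
]
print(pick_message(parent, tied))
```

For the `run`/`uses` instance above, the two branches tie, so the sketch falls back to the parent `oneOf` message instead of picking `run` by alphabetical order.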

ilia1243 avatar May 15 '24 12:05 ilia1243

I opened a pull request with a possible improvement to `best_match`.

ilia1243 avatar May 19 '24 19:05 ilia1243

Hi, I think I am running into the same thing, but I wanted to share my example just in case.

from jsonschema import validate
from jsonschema.exceptions import ValidationError

schema = {
    "type": "object",
    "properties": {
        "tuple_or_tuple_array": {
            "anyOf": [
                {
                    "type": "array",
                    "items": {
                        "$ref": "#/$defs/int_tuple",
                    },
                },
                {
                    "$ref": "#/$defs/int_tuple",
                },
            ]
        },
    },
    "$defs": {
        "int_tuple": {
            "type": "array",
            "items": {
                "type": "integer",
            },
            "minItems": 2,
            "maxItems": 2,
        },
    },
}


def try_validate(instance):
    try:
        validate(instance, schema)
    except ValidationError as e:
        print(str(e))
    else:
        print("No error.\n")
    finally:
        print("-" * 80)


# Case 1
control = {"tuple_or_tuple_array": [0, 0]}
try_validate(control)

# Case 2
control2 = {"tuple_or_tuple_array": [[0, 0], [0, 0]]}
try_validate(control2)

# Case 3
misleading_error = {"tuple_or_tuple_array": [0, "not an int"]}
try_validate(misleading_error)

# Case 4
expected_error = {"tuple_or_tuple_array": ["not an int", 0]}
try_validate(expected_error)

Below is the output with jsonschema == 4.22.0.

No error.

--------------------------------------------------------------------------------
No error.

--------------------------------------------------------------------------------
0 is not of type 'array'

Failed validating 'type' in schema[0]['items']:
    {'items': {'type': 'integer'},
     'maxItems': 2,
     'minItems': 2,
     'type': 'array'}

On instance[0]:
    0
--------------------------------------------------------------------------------
['not an int', 0] is not valid under any of the given schemas

Failed validating 'anyOf' in schema['properties']['tuple_or_tuple_array']:
    {'anyOf': [{'items': {'$ref': '#/$defs/int_tuple'}, 'type': 'array'},
               {'$ref': '#/$defs/int_tuple'}]}

On instance['tuple_or_tuple_array']:
    ['not an int', 0]
--------------------------------------------------------------------------------

The 3rd case here has a misleading error message. As shown in case 4, if the values in the tuple are swapped, the error message is more accurate.

nfriedl1 avatar Jun 13 '24 23:06 nfriedl1

A helpful thing to check would be whether your case is solved by @ilia1243's fix in #1258 (which I am unfortunately still behind on looking into). It would be very helpful to know whether you are more satisfied with the error after that change.

Julian avatar Jun 14 '24 15:06 Julian

The mentioned fix does solve the problem reported by @nfriedl1. For cases 3 and 4, it shows:

[0, 'not an int'] is not valid under any of the given schemas

and

['not an int', 0] is not valid under any of the given schemas

ilia1243 avatar Jun 20 '24 07:06 ilia1243