jsonschema-rs

Add `context` field to `ValidationError`, to improve oneOf and anyOf validation errors

Open jpmckinney opened this issue 1 year ago • 2 comments

When no subschema is valid, python-jsonschema sets a context attribute on the ValidationError instances it produces for oneOf and anyOf validation errors, as documented and as implemented in the code.

This context attribute contains all the validation errors from the subschemas. This is very useful, because the default error message is not sufficiently informative: "[frequently a very large JSON blob] is not valid under any of the schemas listed in the 'oneOf' keyword".

Instead, an application could report the specific errors under each of the subschemas (or, it could have some way to determine which subschema was most relevant, and only report its errors). It might then end up reporting a single, specific validation error within the subschema, which is much more actionable. For example, "'name' is not a string".

Adding this sort of context attribute (I have no opinion on naming) would allow applications to improve the error message.
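To make the request concrete, here is a minimal sketch using a toy validator, not jsonschema-rs or python-jsonschema internals. The `ValidationError` class, the `context` field name, the `person` and `identifier` subschemas, and the `most_specific` heuristic are all hypothetical; they only illustrate how sub-errors could be attached to a oneOf error and then used to pick a more actionable message.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationError:
    """Toy error type; `context` is the hypothetical field proposed here."""
    message: str
    path: tuple = ()
    context: list = field(default_factory=list)


def person(instance):
    """Toy subschema: an object whose 'name' must be a string."""
    if not isinstance(instance, dict):
        return [ValidationError(f"{instance!r} is not an object")]
    name = instance.get("name")
    if not isinstance(name, str):
        return [ValidationError(f"{name!r} is not a string", path=("name",))]
    return []


def identifier(instance):
    """Toy subschema: a bare integer id."""
    if isinstance(instance, int) and not isinstance(instance, bool):
        return []
    return [ValidationError(f"{instance!r} is not an integer")]


def one_of(instance, subschemas):
    """Return None if exactly one subschema matches, else an error with context."""
    results = [sub(instance) for sub in subschemas]
    matches = sum(1 for errors in results if not errors)
    if matches == 1:
        return None
    error = ValidationError(
        f"{instance!r} is not valid under exactly one schema in 'oneOf'"
    )
    # Flatten the sub-errors so the caller can inspect them.
    error.context = [e for errors in results for e in errors]
    return error


def most_specific(error):
    """Assumed heuristic: prefer the sub-error with the deepest path."""
    return max(error.context, key=lambda e: len(e.path), default=error)


error = one_of({"name": 1}, [person, identifier])
print(most_specific(error).message)
```

An application could just as well print every entry in `context`, grouped by subschema; the deepest-path rule is only one possible way to guess which subschema the user was aiming for.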

jpmckinney • Dec 22 '24 06:12

This is super useful indeed, and I would be happy to have a mechanism to improve error messages for such cases (personally I saw a lot of opaque error messages of this sort while validating Open API with jsonschema).

The biggest challenge is avoiding the performance hit of collecting all the related errors. The first idea that comes to mind is to re-validate on demand with this "collection" logic enabled, so that users who don't need this context won't be affected. I'm not sure about it, though.
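A rough sketch of the "re-validate on demand" idea, written as a stdlib-only toy rather than against the real codebase. `TypeSchema`, `OneOfError`, and `validate_one_of` are hypothetical names; the point is only that the normal path runs boolean checks, and the sub-errors are computed the first time `context` is read.

```python
class TypeSchema:
    """Toy subschema that accepts a single Python type."""

    def __init__(self, expected):
        self.expected = expected
        self.iter_errors_calls = 0  # counts how often the slow path ran

    def is_valid(self, instance):
        return isinstance(instance, self.expected)

    def iter_errors(self, instance):
        self.iter_errors_calls += 1
        if not self.is_valid(instance):
            yield f"{instance!r} is not of type {self.expected.__name__}"


class OneOfError(Exception):
    """Toy error whose sub-errors are computed only when `context` is read."""

    def __init__(self, instance, subschemas):
        super().__init__(
            f"{instance!r} is not valid under exactly one schema in 'oneOf'"
        )
        self._instance = instance
        self._subschemas = subschemas
        self._context = None

    @property
    def context(self):
        # Re-validate on demand and cache the result.
        if self._context is None:
            self._context = [
                message
                for schema in self._subschemas
                for message in schema.iter_errors(self._instance)
            ]
        return self._context


def validate_one_of(instance, subschemas):
    """Fast path: boolean checks only, no error collection."""
    if sum(schema.is_valid(instance) for schema in subschemas) != 1:
        raise OneOfError(instance, subschemas)


schemas = [TypeSchema(str), TypeSchema(int)]
try:
    validate_one_of(1.5, schemas)
except OneOfError as exc:
    error = exc

calls_before = schemas[0].iter_errors_calls  # slow path has not run yet
print(error.context)
```

The trade-off in this sketch is that the error has to keep references to the instance and the subschemas so that it can re-run them later.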

Stranger6667 • Dec 22 '24 13:12

Could there be an argument passed to validator_for (and other initializers) that either:

  1. gets passed to OneOfValidator via the ctx argument, such that it uses iter_errors instead of is_valid when checking subschemas (this means some extra if-statements in the oneOf validator)
  2. gets passed to get_for_draft, such that a different oneOf validator is chosen, that uses iter_errors (no extra instructions from if-statements, but probably some code duplication)
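As a sketch of the second option, assuming nothing about the real internals: a hypothetical `build_one_of` factory with a hypothetical `collect_context` flag picks one of two oneOf implementations once, at build time, so the default one has no extra branch. `MultipleOf`, `OneOfFast`, and `OneOfWithContext` are toy classes made up for this example.

```python
class MultipleOf:
    """Toy subschema: an integer divisible by `factor`."""

    def __init__(self, factor):
        self.factor = factor

    def is_valid(self, instance):
        return instance % self.factor == 0

    def iter_errors(self, instance):
        if not self.is_valid(instance):
            yield f"{instance} is not a multiple of {self.factor}"


class OneOfFast:
    """Default variant: boolean checks only, so the error carries no context."""

    def __init__(self, subschemas):
        self.subschemas = subschemas

    def iter_errors(self, instance):
        if sum(s.is_valid(instance) for s in self.subschemas) != 1:
            yield {
                "message": f"{instance} is not valid under exactly one schema in 'oneOf'",
                "context": [],
            }


class OneOfWithContext(OneOfFast):
    """Opt-in variant: also collects the errors of every failing subschema."""

    def iter_errors(self, instance):
        for error in super().iter_errors(instance):
            error["context"] = [
                message
                for s in self.subschemas
                for message in s.iter_errors(instance)
            ]
            yield error


def build_one_of(subschemas, collect_context=False):
    """Choose the implementation once, when the validator is built."""
    cls = OneOfWithContext if collect_context else OneOfFast
    return cls(subschemas)


subschemas = [MultipleOf(3), MultipleOf(5)]
fast = list(build_one_of(subschemas).iter_errors(7))
detailed = list(build_one_of(subschemas, collect_context=True).iter_errors(7))
print(fast[0]["context"])
print(detailed[0]["context"])
```

Subclassing is one way to limit the duplication mentioned above; the first option would instead keep a single class and check the flag inside `iter_errors`.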

jpmckinney • Dec 22 '24 18:12

Hello! I recently incorporated this into our product to give users better feedback on configuration mistakes. Better oneOf support would be really helpful because we use it a lot to maintain backwards compatibility in our configuration.

In our specific case we are less concerned about performance for this because we only do the JSON schema validation if reading the config file fails. As in, we have a translation layer that succeeds or fails and only if it fails do we validate it against the schema to get a better error message.

Let me know if there is any way to help testing or otherwise support this ticket.

Thank you!

orbitz • Jul 13 '25 12:07

I don't know anything about the codebase, but if there is a performance concern, I wonder whether the best option is to put the cost on the user: the default error stays as it currently is, but it can be re-evaluated in an expanded form if the user wants that.

orbitz • Jul 13 '25 12:07

Hi @orbitz

@kprzybyla submitted #772. Please take a look to see if it would be sufficient for you. It looks really nice to me, but having more details about your use case would really help :)

Re: performance. I am fine with unconditionally putting it into the "error" branch, which should be hit less often. Right now, it would be problematic to re-run evaluation on demand, but having proper bytecode instead of an AST-like tree of nodes would make this easy. So I'll note this and postpone it until I have more bandwidth to work on the internal representation.

Stranger6667 • Jul 21 '25 15:07