jsonschema
Designating a dialect for custom metaschemas in 4.18
I'm working with a custom metaschema that is a superset of draft 4. Is there (or will there be) a way to select the draft 4 dialect when creating a Validator class? Currently the create method chooses the opaque dialect when it fails to recognize our metaschema's id:
https://github.com/python-jsonschema/jsonschema/blob/v4.18.0a1/jsonschema/validators.py#L187
and the referencing package doesn't appear to offer a sanctioned method for registering a new dialect id:
https://github.com/python-jsonschema/referencing/blob/v0.24.4/referencing/jsonschema.py#L544-L556
Is there another way to accomplish this that I'm missing? And if not, are you open to adding something like a default_specification argument to the create method?
By the way, thanks a lot for providing an alpha release, it's super helpful to be able to work through our issues ahead of time.
This will happen before the real release, probably this week! Thanks for indicating someone is paying attention :D
Expect an update in the next few days but it'll look basically like what you expect I hope!
(To be even more specific: no changes should be required on your part, though you can certainly choose to make some to get improved behavior, and there will also be another beta!)
Huzzah! I'll keep an eye out for it.
I want to clarify something that I didn't notice until now --
@eslavich are you specifically calling jsonschema.validators.create and not extend? I assumed (or misread) that you meant the latter (and were surprised that extending a validator didn't preserve its resolving behavior, which is why I labelled this a bug).
Are you instead talking about a totally unrelated validator/dialect you created with .create which you happened to define a keyword called $ref for?
I've been testing asdf with the new 4.18 and I think I might have a minimal example that illustrates why we need to define a specification (and the opaque specification results in resolution errors).
For asdf we define a meta-schema based on draft 4 (the details probably aren't important but I'm happy to supply as much as you'd like). Since this metaschema is not registered with referencing.jsonschema._SPECIFICATIONS, creating a validator with the metaschema results in an opaque specification and failures due to the inability to resolve references.
Here's a minimal example that was compatible with 4.17:
import jsonschema
meta_schema = {
    "id": "https://example.com/yaml-schema/draft-01",
    "$schema": "http://json-schema.org/draft-04/schema#",
    "allOf": [{"$ref": "http://json-schema.org/draft-04/schema"}],
}
s0 = {
    "id": "http://example.com/foo",
    "$schema": "http://example.com/yaml-schema/draft-01#",
}
s1 = {
    "id": "http://example.com/bar",
    "$schema": "http://example.com/yaml-schema/draft-01#",
    "allOf": [{"$ref": "foo"}]
}
by_id = {s['id']: s for s in (meta_schema, s0, s1)}
def retrieve(uri):
    return by_id[uri]
handlers = {'http': retrieve}
resolver = jsonschema.validators.RefResolver(
    "", {}, cache_remote=False, handlers=handlers)
Validator = jsonschema.validators.create(
    meta_schema=meta_schema,
    type_checker=jsonschema.validators.Draft4Validator.TYPE_CHECKER,
    validators=jsonschema.validators.Draft4Validator.VALIDATORS,
    id_of=jsonschema.validators.Draft4Validator.ID_OF,
    format_checker=jsonschema.validators.Draft4Validator.FORMAT_CHECKER,
)
validator = Validator(s1, resolver=resolver)
validator.validate({})
When run with 4.17.3 this executes with no error. When run with 4.18.0 this shows the expected DeprecationWarning for RefResolver and errors out as follows:
/Users/bgraham/projects/230314_jsonschema_ref_resolver/tests/ref_resolution/03_ref.py:28: DeprecationWarning: jsonschema.RefResolver is deprecated as of v4.18.0, in favor of the https://github.com/python-jsonschema/referencing library, which provides more compliant referencing behavior as well as more flexible APIs for customization. A future release will remove RefResolver. Please file a feature request (on referencing) if you are missing an API for the kind of customization you need.
resolver = jsonschema.validators.RefResolver(
Traceback (most recent call last):
File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 1082, in resolve_from_url
document = self.store[url]
File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/_utils.py", line 20, in __getitem__
return self.store[self.normalize(uri)]
KeyError: 'foo'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 1085, in resolve_from_url
document = self.resolve_remote(url)
File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 1189, in resolve_remote
with urlopen(uri) as url:
File "/Users/bgraham/.pyenv/versions/3.10.6/lib/python3.10/urllib/request.py", line 216, in urlopen
return opener.open(url, data, timeout)
File "/Users/bgraham/.pyenv/versions/3.10.6/lib/python3.10/urllib/request.py", line 503, in open
req = Request(fullurl, data)
File "/Users/bgraham/.pyenv/versions/3.10.6/lib/python3.10/urllib/request.py", line 322, in __init__
self.full_url = url
File "/Users/bgraham/.pyenv/versions/3.10.6/lib/python3.10/urllib/request.py", line 348, in full_url
self._parse()
File "/Users/bgraham/.pyenv/versions/3.10.6/lib/python3.10/urllib/request.py", line 377, in _parse
raise ValueError("unknown url type: %r" % self.full_url)
ValueError: unknown url type: 'foo'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/bgraham/projects/230314_jsonschema_ref_resolver/tests/ref_resolution/03_ref.py", line 38, in <module>
validator.validate({})
File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 420, in validate
for error in self.iter_errors(*args, **kwargs):
File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 354, in iter_errors
for error in errors:
File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/_validators.py", line 335, in allOf
yield from validator.descend(instance, subschema, schema_path=index)
File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 402, in descend
for error in errors:
File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/_validators.py", line 284, in ref
yield from validator._validate_reference(ref=ref, instance=instance)
File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 447, in _validate_reference
scope, resolved = resolve(ref)
File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 1071, in resolve
return url, self._remote_cache(url)
File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 1087, in resolve_from_url
raise exceptions._RefResolutionError(exc)
jsonschema.exceptions._RefResolutionError: unknown url type: 'foo'
If I modify the example to use referencing (am I doing this right?):
import jsonschema
import referencing
meta_schema = {
    "id": "https://example.com/yaml-schema/draft-01",
    "$schema": "http://json-schema.org/draft-04/schema#",
    "allOf": [{"$ref": "http://json-schema.org/draft-04/schema"}],
}
s0 = {
    "id": "http://example.com/foo",
    "$schema": "http://example.com/yaml-schema/draft-01#",
}
s1 = {
    "id": "http://example.com/bar",
    "$schema": "http://example.com/yaml-schema/draft-01#",
    "allOf": [{"$ref": "foo"}]
}
by_id = {s['id']: s for s in (meta_schema, s0, s1)}
def retrieve(uri):
    if uri in by_id:
        return referencing.Resource(by_id[uri], referencing.jsonschema.DRAFT4)
    raise referencing.exceptions.NoSuchResource(uri)
registry = referencing.Registry(retrieve=retrieve)
Validator = jsonschema.validators.create(
    meta_schema=meta_schema,
    type_checker=jsonschema.validators.Draft4Validator.TYPE_CHECKER,
    validators=jsonschema.validators.Draft4Validator.VALIDATORS,
    id_of=jsonschema.validators.Draft4Validator.ID_OF,
    format_checker=jsonschema.validators.Draft4Validator.FORMAT_CHECKER,
)
validator = Validator(s1, registry=registry)
validator.validate({})
The example fails with the following traceback:
Traceback (most recent call last):
File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 432, in _validate_reference
resolved = self._resolver.lookup(ref)
File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/referencing/_core.py", line 588, in lookup
raise exceptions.Unresolvable(ref=ref) from None
referencing.exceptions.Unresolvable: foo
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/bgraham/projects/230314_jsonschema_ref_resolver/tests/ref_resolution/02_ref.py", line 39, in <module>
validator.validate({})
File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 420, in validate
for error in self.iter_errors(*args, **kwargs):
File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 354, in iter_errors
for error in errors:
File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/_validators.py", line 335, in allOf
yield from validator.descend(instance, subschema, schema_path=index)
File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 402, in descend
for error in errors:
File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/_validators.py", line 284, in ref
yield from validator._validate_reference(ref=ref, instance=instance)
File "/Users/bgraham/.pyenv/versions/jsonschema_ref_resolver/lib/python3.10/site-packages/jsonschema/validators.py", line 434, in _validate_reference
raise exceptions._WrappedReferencingError(err)
jsonschema.exceptions._WrappedReferencingError: Unresolvable: foo
Thanks, that's definitely helpful, I'll have a look more carefully in the morning, but just to be sure, why are you calling create and not extend there if you're simply trying to add some stuff to draft4?
Thanks for the quick response and for the good question. I have not tried swapping 'create' for 'extend'. A simple swap (and removing the id_of and meta_schema arguments) in asdf does not appear to work but I'm not quite sure why yet.
Is there a way to define a meta schema with extend?
When I first added the API I mistakenly didn't add one, assuming that one generally wasn't going to change the metaschema. If this is what's preventing you from using it, I'm happy to add an argument for it. Otherwise I wasn't planning to, because the entire API may eventually need deprecating, unfortunately, due to the new "Vocabulary System" in newer drafts of JSON Schema (which means that there's now some concept of groups of validators). But yeah if it's useful I can add it if it turns out there's some other reason your quick experiment didn't work.
At first glance, what you shared looks like a bug (at least inasmuch as the behavior for RefResolver certainly should not change), but I'll need to do some more diagnosis. Thanks again for the feedback, I definitely do want to make this work in a way that requires no hacks for you guys and is indisputably better than before.
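As a point of comparison for the create-versus-extend question, here is a minimal sketch of extending Draft4Validator with one extra keyword. The keyword name `even` and the function `is_even` are hypothetical, invented purely for illustration; the sketch assumes a jsonschema version that provides `jsonschema.validators.extend`.

```python
from jsonschema import Draft4Validator, ValidationError
from jsonschema.validators import extend


def is_even(validator, value, instance, schema):
    """Hypothetical keyword: when set to true, integers must be even."""
    if value and isinstance(instance, int) and instance % 2:
        yield ValidationError(f"{instance!r} is not even")


# extend() copies the base class and adds (or overrides) keyword callables.
EvenValidator = extend(Draft4Validator, validators={"even": is_even})

odd_errors = list(EvenValidator({"even": True}).iter_errors(3))
even_errors = list(EvenValidator({"even": True}).iter_errors(4))
print(len(odd_errors), len(even_errors))

# The metaschema is inherited from the base class rather than replaced,
# which is the limitation discussed in this exchange.
print(EvenValidator.META_SCHEMA == Draft4Validator.META_SCHEMA)
```

Because the extended class keeps the Draft 4 metaschema, this route does not by itself cover a dialect with its own `$schema` URI.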
I'm currently using the suggestion in https://github.com/python-jsonschema/jsonschema/issues/994#issuecomment-1295924746
I don't think that's the same question, though I'm not 100% sure.
The discussion there is about users (perhaps like yourself) who are trying to change what it means to be "Draft 4".
This question here is about an explicit new draft which extends another -- i.e. a user who is properly specifying a different $schema URI -- I could be wrong though of course.
@braingram this should be at least partially addressed with a bugfix in v4.18.1 (out in a few minutes).
Can you let me know how much progress that gets you?
Appreciated!
EDIT: To be clear, "this" is your example comment more so than the title of the issue (being able to create referencing.Specification objects and pass them in).
@Julian Thanks for the update!
I pulled down 4.18.1 and tested the two examples.
The one using the deprecated RefResolver now works on 4.18.1.
However the second example using referencing.Registry fails with the same error. Is this expected because the Resource is created with the Draft4 specification?
It has more to do with the original title of this issue, which I'm not yet sure of the best way to solve (or rather, I know a good way, but it involves touching this API even though it's unlikely to be a good long-term solution for other reasons, so I'm not sure yet whether there's some other one).
Specifically to explain the issue --
You have the schema
{
    "id": "http://example.com/bar",
    "$schema": "http://example.com/yaml-schema/draft-01#",
    "allOf": [{"$ref": "foo"}]
}
That $schema is saying "I am a schema that belongs to some version of JSON Schema identified by that URI http://example.com/yaml-schema/draft-01#", which of course you have invented.
jsonschema (the library) does not know what the referencing semantics are meant to be in your invented specification. You of course want them to be "draft 4 semantics, probably with some additional keywords or something" (otherwise you'd just use draft 4), but there's no way for the library to know that; it needs to be told that's the behavior you mean to have. So the example is failing quite simply because your schema's id keyword is saying "my ID is http://example.com/bar" but the library doesn't know that your specification uses the id keyword to represent schema identifiers.
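The effect of a missing base URI can be seen with the standard library alone. This sketch is not what jsonschema runs internally; it only shows, with `urllib.parse.urljoin`, why a relative reference like "foo" resolves to something useful only when the schema's identifier is recognized as the base.

```python
from urllib.parse import urljoin

ref = "foo"

# If the schema's "id" is understood, it acts as the base URI.
with_base = urljoin("http://example.com/bar", ref)

# If the "id" is ignored, there is no base URI to resolve against,
# so the reference stays relative.
without_base = urljoin("", ref)

print(with_base)     # http://example.com/foo
print(without_base)  # foo
```

The bare "foo" in the second case matches the value reported in the tracebacks earlier in the thread.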
One way of doing that would be, as you said, to have jsonschema.validators.create(...) take a specification argument and you'd need to provide a referencing.Specification object which implements the behavior you want (possibly by simply using referencing.jsonschema.DRAFT4 if you literally wanted exactly the draft 4 behavior with no changes).
But as I say, adding that argument is not completely trivial, if only because in the current API you can provide an id_of function for specifying how schemas identify themselves, which it turns out is not enough information to know how a version of JSON Schema defines its referencing semantics. In particular (and this gets complicated to explain), part of what makes referencing so tedious is that each version defines which keywords contain subresources. So it's not enough to know id_of(yourversion) is "http://example.com/bar" -- the library also needs to know which keywords in your version may contain schemas, because those schemas may themselves contain subresources.
And now basically that means that the id_of argument needs deprecating, because referencing.Specification is really what's needed to define all of this.
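The subresource point can be illustrated with a small standard-library sketch. It is not the referencing implementation: the keyword tuples below are a simplified, assumed subset of Draft 4 (for example, `items` and `dependencies` are left out), nested ids are not resolved against their parent, and `x-custom` is a made-up keyword.

```python
# Simplified, assumed subset of Draft 4 keywords that hold subschemas.
IN_VALUE = ("not", "additionalProperties", "additionalItems")
IN_ARRAY = ("allOf", "anyOf", "oneOf")
IN_MAPPING = ("properties", "patternProperties", "definitions")


def subschemas(schema):
    """Yield the direct subschemas reachable through known keywords."""
    for key in IN_VALUE:
        value = schema.get(key)
        if isinstance(value, dict):
            yield value
    for key in IN_ARRAY:
        for value in schema.get(key, []):
            if isinstance(value, dict):
                yield value
    for key in IN_MAPPING:
        for value in schema.get(key, {}).values():
            if isinstance(value, dict):
                yield value


def collect_ids(schema):
    """Collect "id" values from a schema and its known subschemas."""
    found = []
    if isinstance(schema.get("id"), str):
        found.append(schema["id"])
    for sub in subschemas(schema):
        found.extend(collect_ids(sub))
    return found


schema = {
    "id": "http://example.com/bar",
    "definitions": {"inner": {"id": "http://example.com/inner"}},
    # Not a known subschema keyword, so its "id" is never discovered.
    "x-custom": {"id": "http://example.com/hidden"},
}
print(collect_ids(schema))
```

An id_of function alone corresponds to the first three lines of `collect_ids`; knowing where to recurse is the extra information a full specification supplies.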
So, tl;dr: yes, I'm aware the second example doesn't work yet. If that's what's blocking you, I will consider adding the specification argument to jsonschema.validators.create, but I want to think a bit harder about whether there's another possible solution, and I'd have to add it carefully, in a way that doesn't make it really easy to define conflicting referencing.Specification and id_of arguments.