jsonschema
RefResolver.resolve_fragment() breaks nested relative references
First off, thanks for this great package! Depending on the outcome of this issue, I'm hoping to contribute back in a follow-up.
I am trying to validate the following JSON, available at http://127.0.0.1:5000/about/json/schema. Note: this JSON is itself a JSON Schema, and it contains the schema with which to validate itself (#/definitions/response/schema). This may be a little confusing.
{
"definitions":{
"request":{
},
"response":{
"schema":{
"oneOf":[
{
"type":"object",
"properties":{
"errors":{
"type":"array",
"items":{
"$ref":"#/definitions/data/error"
}
}
},
"required":[
"errors"
]
},
{
"$ref":"http://127.0.0.1:5000/about/json/external-schema/aHR0cDovL2pzb24tc2NoZW1hLm9yZy9kcmFmdC0wNC9zY2hlbWE%3D",
"description":"A JSON Schema."
}
]
},
"openapi":{
"$ref":"http://127.0.0.1:5000/about/json/external-schema/aHR0cDovL3N3YWdnZXIuaW8vdjIvc2NoZW1hLmpzb24%3D"
}
},
"data":{
"error":{
"type":"object",
"properties":{
"code":{
"type":"string"
},
"title":{
"type":"string"
}
},
"required":[
"code",
"title"
]
}
}
},
"$schema":"http://127.0.0.1:5000/about/json/schema#/definitions/response/schema"
}
I am performing this validation using the following Python code:
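from typing import Dict, Optional

import jsonschema
from jsonschema import RefResolver


# Method of a larger class (the class itself is not shown).
def validate(self, data, schema: Optional[Dict] = None):
    # Resolver with an empty base URI and an empty referrer document.
    reference_resolver = RefResolver('', {})
    if schema is None:
        message = 'The JSON must be an object with a "$schema" key.'
        if not isinstance(data, dict):
            raise ValueError('The JSON is not an object: %s' % message)
        if '$schema' not in data:
            raise KeyError('No "$schema" key found: %s' % message)
        # Fetch the document named by "$schema" and follow its fragment.
        _, schema = reference_resolver.resolve(data['$schema'])
    assert schema is not None
    jsonschema.validate(data, schema)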
RefResolver.resolve() returns the following resolved schema. As you can see, the contained references remain intact, but their targets are no longer part of the resulting document (below), so when validating, the reference cannot be resolved, and validation fails.
{
"oneOf":[
{
"type":"object",
"properties":{
"errors":{
"type":"array",
"items":{
"$ref":"#/definitions/data/error"
}
}
},
"required":[
"errors"
]
},
{
"$ref":"http://127.0.0.1:5000/about/json/external-schema/aHR0cDovL2pzb24tc2NoZW1hLm9yZy9kcmFmdC0wNC9zY2hlbWE%3D",
"description":"A JSON Schema."
}
]
}
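The effect can be reproduced without a server. The following is a minimal sketch under stated assumptions, not jsonschema's actual implementation: resolve_pointer is a hypothetical helper that walks a JSON pointer, and the document is a trimmed version of the schema above. It suggests that once the fragment has been extracted, the inner "#/definitions/data/error" reference has nothing left to point into.

```python
def resolve_pointer(document, pointer):
    """Hypothetical helper: walk a JSON pointer such as '#/a/b' through a document."""
    fragment = pointer.lstrip("#")
    parts = fragment.split("/")[1:] if fragment else []
    node = document
    for part in parts:
        # Undo JSON pointer escaping (~1 is '/', ~0 is '~').
        part = part.replace("~1", "/").replace("~0", "~")
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


# Trimmed version of the schema from the report.
document = {
    "definitions": {
        "response": {
            "schema": {
                "type": "object",
                "properties": {
                    "errors": {
                        "type": "array",
                        "items": {"$ref": "#/definitions/data/error"},
                    }
                },
            }
        },
        "data": {"error": {"type": "object", "required": ["code", "title"]}},
    }
}

# Extract the subschema, as resolving the "$schema" fragment does.
subschema = resolve_pointer(document, "#/definitions/response/schema")
inner_ref = subschema["properties"]["errors"]["items"]["$ref"]

# Against the full document the inner reference resolves fine.
target = resolve_pointer(document, inner_ref)

# Against the extracted subschema alone it cannot: "definitions" is gone.
try:
    resolve_pointer(subschema, inner_ref)
    resolved_in_isolation = True
except KeyError:
    resolved_in_isolation = False

print(target["required"], resolved_in_isolation)
```

The subschema only validates correctly if the resolver used during validation still knows the full document it came from.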
I am not sure if the problem lies with my use of references, or if this is simply something for which no Python support has been added. I also went through the other issues about references and don't think I found a duplicate, but I'd love to know for sure if there is another thread with more information on this feature, or if I am accidentally doing something wrong myself :)
Currently, the RefResolver doesn't handle $id keywords during reference resolution, or support dereferencing through $ref keywords. In theory this should be simple to solve:
- Upon encountering $id keywords in the document, push them onto the current resolution scope, and undo upon resolution success. With the current API, a "with resolver.resolving(ref):" block would then push the $ref scope to the context during subsequent operations.
- When indexing a JSON document by some index / key, look for $ref keywords, and follow them, effectively replacing the current resolution document and splicing the ref path with the contents of "$ref".
In practice this will probably need some cleaning up as the existing design is quite simple (and tries to separate parts of ref resolution that perhaps belong together)
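The two steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not a patch for RefResolver: SketchResolver, in_scope and lookup are hypothetical names, only same-document "#/..." references are followed, and cyclic references are not handled.

```python
from contextlib import contextmanager
from urllib.parse import urljoin


class SketchResolver:
    """Hypothetical resolver: a scope stack plus a $ref-following pointer walk."""

    def __init__(self, base_uri, document):
        self.document = document
        self.scopes = [base_uri]  # stack of resolution scopes

    @contextmanager
    def in_scope(self, uri_reference):
        # Push the new scope, resolved against the current one; undo on exit.
        self.scopes.append(urljoin(self.scopes[-1], uri_reference))
        try:
            yield self.scopes[-1]
        finally:
            self.scopes.pop()

    def lookup(self, pointer):
        """Walk a '#/...' pointer, hopping through any '$ref' met on the way."""
        fragment = pointer.lstrip("#")
        parts = fragment.split("/")[1:] if fragment else []
        node = self.document
        for part in parts:
            # Before indexing, replace a {"$ref": ...} node with its target.
            # Assumption: references are same-document and acyclic.
            while isinstance(node, dict) and "$ref" in node:
                node = self.lookup(node["$ref"])
            node = node[int(part)] if isinstance(node, list) else node[part]
        return node


document = {
    "definitions": {
        "address": {"properties": {"state": {"enum": ["AK", "WY"]}}},
        "alias": {"$ref": "#/definitions/address"},
    }
}
resolver = SketchResolver("https://example.com/schema.json", document)

# The pointer passes *through* "alias", which is itself a reference.
state = resolver.lookup("#/definitions/alias/properties/state")

# An "$id" met during traversal would be pushed like this and popped afterwards.
with resolver.in_scope("/definitions/address") as scope:
    inner_scope = scope

print(state, inner_scope, resolver.scopes)
```

In this sketch the scope stack and the pointer walk are independent of each other; a real implementation would need the walk to consult the current scope in order to follow references into other documents.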
@Julian @agoose77 @bartfeenstra Do you all have any fresh thoughts on this? We are using this with our generated schema, which has definitions with $refs to other definitions, and that is causing the same break. We are able to remedy it by inlining those $refs, but this makes schemas unnecessarily large and difficult to read. I'm willing to take a stab at adding the functionality to resolve arbitrarily nested $refs but wanted to see if you all had ideas first.
@Julian This seems to be fixed on master branch...when are you planning on cutting a release for 3.0.0?
nvm I'm dumb, could just grab pre-release! Great work!! Love jsonSchema ❤️
I could be wrong, but from glancing over the implementation, nothing has changed w.r.t ref handling & id scoping to fix this yet (as far as I can see).
Nothing should have changed in this area yeah, so if this is working now, would definitely like to hear it.
In general for these sorts of things I need to stress that it'd be super helpful for the reporter to work on minimizing their example.
There's a lot here that could be left out while still presenting the underlying issue. (I know it's been a long while since this was filed, so apologies for not giving that feedback sooner, but every additional unnecessary detail means more time I or someone else investigating would need to spend, which makes it less likely I'll actually have time to do so :)
@Julian makes sense, I'll see if I can simplify and anonymize our schema and JSON payload into something very concise that fails on 2.6.0 and works on 3.0.0
Here's an example of what I believe is the same issue.
instance:
{
"multi_address": [
{
"address1": "123 Main",
"city": "foo",
"state": "AK",
"zipcode": "12345"
}
],
"single_address": {
"address1": "123 Main",
"city": "foo",
"state": "AK",
"zipcode": "12345"
}
}
schema:
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://example.com/schema.json",
"type": "object",
"additionalProperties": false,
"definitions": {
"address": {
"$id": "/definitions/address",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"properties": {
"address1": { "type": "string" },
"address2": { "type": "string" },
"city": { "type": "string" },
"state": { "type": "string", "$ref": "#/definitions/state" },
"zipcode": { "type": "string" }
},
"required": ["address1", "city", "state", "zipcode"]
},
"state": {
"$id": "/definitions/state",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "string",
"enum": [
"AK",
"AL",
"AR",
"AS",
"AZ",
"WY"
]
}
},
"properties": {
"multi_address": {
"type": "array",
"items": { "$ref": "#/definitions/address" }
},
"single_address": { "$ref": "#/definitions/address" }
}
}
code:
#!/usr/bin/env python
import json
import sys
import pprint
from jsonschema.validators import validator_for
from jsonschema import FormatChecker, RefResolver

schema_file = sys.argv[1]
instance_file = sys.argv[2]

with open(schema_file) as fh:
    schema = json.load(fh)
with open(instance_file) as fh:
    instance = json.load(fh)

schema_store = {
    schema["$id"]: schema,
}
resolver = RefResolver.from_schema(schema, store=schema_store)
validator = validator_for(schema)(schema, format_checker=FormatChecker(), resolver=resolver)
validator.validate(instance)

errors = []
for err in validator.iter_errors(instance=instance):
    errors.append(err)
pprint.pprint(errors)
which raises:
jsonschema.exceptions.RefResolutionError: Unresolvable JSON pointer: 'definitions/state'
same error using the jsonschema cli:
$ jsonschema schema.json -i example.json
UPDATE:
Dereferencing the schema manually before passing it in to the validator succeeds:
#!/usr/bin/env python
import json
import sys
import pprint
import jsonref
from jsonschema.validators import validator_for
from jsonschema import FormatChecker

schema_file = sys.argv[1]
instance_file = sys.argv[2]

with open(schema_file) as fh:
    schema = json.load(fh)
with open(instance_file) as fh:
    instance = json.load(fh)

# Inline every $ref before handing the schema to the validator.
deref_schema = jsonref.loads(json.dumps(schema))
validator = validator_for(schema)(deref_schema, format_checker=FormatChecker())
validator.validate(instance)

errors = []
for err in validator.iter_errors(instance=instance):
    errors.append(err)
pprint.pprint(errors)
I'm (slowly) trying to help minimize this and similar examples in issues. I'll get to the original example (hopefully in a few moments), but @pkarman, your example isn't a bug in jsonschema, your schema has a broken pointer. Specifically, you have (simplified):
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"definitions": {
"address": {
"$id": "/definitions/address",
"properties": {"state": { "$ref": "#/definitions/state" }}
},
"state": {"$id": "/definitions/state"}
},
"properties": {"address": { "$ref": "#/definitions/address" }}
}
where address has an $id (i.e. is a separate document), and within it you have #/definitions/state, but address has no definitions. That keyword is in the surrounding document, so you cannot use a pointer with # to refer to it.
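The URI arithmetic behind this can be sketched with the standard library alone. This is an illustration, not output from jsonschema: it assumes the root "$id" of "https://example.com/schema.json" from the full schema in the report, and the variable names are made up for the example. It also shows one possible repair, a reference written as a URI matching the "$id" of state; whether a given implementation then finds the target depends on it registering that nested "$id".

```python
from urllib.parse import urldefrag, urljoin

# "$id" of the full schema in the report (the simplified version omits it).
root_id = "https://example.com/schema.json"

# Nested "$id" values are resolved against the enclosing base URI.
address_id = urljoin(root_id, "/definitions/address")
state_id = urljoin(root_id, "/definitions/state")

# The failing reference, resolved against the base URI set by address's "$id".
broken = urljoin(address_id, "#/definitions/state")
broken_base, broken_pointer = urldefrag(broken)

# One possible repair (an assumption, not taken from the thread):
# reference state by a URI that matches its "$id" instead of by pointer.
repaired = urljoin(address_id, "/definitions/state")

print(broken)
print(broken_base == address_id, broken_pointer)
print(repaired == state_id)
```

The pointer "/definitions/state" is therefore looked up inside the address subschema, which has only "$id" and "properties", and that lookup is what fails.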
Side note: you can confirm what other implementations besides this one do with your schemas using a new tool I wrote called bowtie, with intro post here -- in this case, running:
bowtie validate -i python-jsonschema <(echo '{"$schema": "https://json-schema.org/draft/2020-12/schema", "definitions": {"address": {"$id": "/definitions/address", "properties": {"state": { "$ref": "#/definitions/state" }}}, "state": {"$id": "/definitions/state"}}, "properties": {"address": { "$ref": "#/definitions/address" }}}') <(echo '{"address": {"state": "AK"}}')
and substituting python-jsonschema for other implementations like js-hyperjump shows mostly the same behavior as this one, though interestingly a few don't match and do something I haven't yet investigated with the broken ref.
Hello there! Thanks a lot again for the kind words.
This, along with many many other $ref-related issues, is now finally being handled in #1049 with the introduction of a new referencing library which is fully compliant and has APIs which I hope are a lot easier to understand and customize.
The next release of jsonschema (v4.18.0) will contain a merged version of that PR, and should be released shortly in beta, and followed quickly by a regular release, assuming no critical issues are reported.
It looks from my testing like indeed this specific example works there! If you still care to, I'd love it if you tried out the beta once it is released, or certainly it'd be hugely helpful to immediately install the branch containing this work (https://github.com/python-jsonschema/jsonschema/tree/referencing) and confirm. You can in the interim find documentation for the change in a preview page here.
I'm going to close this given it indeed seems like it is addressed by #1049, but feel free to follow up with any comments. Sorry for the delay in getting to these, but hopefully this new release will bring lots of benefit!
Here's a quick pass at modifying what you're doing for that branch/the future release, in case it helps:
from referencing import Registry, Resource
from referencing.jsonschema import DRAFT7
import jsonschema

data = "<your big thing>"

resource = DRAFT7.create_resource(data)
registry = Registry().with_resources(
    [
        ("http://127.0.0.1:5000/about/json/schema", resource),
        ("http://127.0.0.1:5000/about/json/external-schema/aHR0cDovL2pzb24tc2NoZW1hLm9yZy9kcmFmdC0wNC9zY2hlbWE%3D", DRAFT7.create_resource({})),
        ("http://127.0.0.1:5000/about/json/external-schema/aHR0cDovL3N3YWdnZXIuaW8vdjIvc2NoZW1hLmpzb24%3D", DRAFT7.create_resource({})),
    ],
)
schema = registry.resolver().lookup(data['$schema']).contents
assert schema is not None
jsonschema.validate(schema=schema, instance=data, registry=registry)