Recursive self-referenced definition does not work locally
We have an all-in-one schema file in which the property `subOrganizationOf` references its parent organization via `"$ref": "#"`. But when validators.py hits this `$ref`, it follows the URL in the `id` and fetches the external URL https://project-open-data.cio.gov/v1.1/schema/organization.json. If that URL is out of date, it raises `RefResolutionError(exc)`.
We expect the validator to stay with the local definition and not rely on an external URL to validate this recursive self-referenced definition.
sample.json:
```json
{
  "organization": {
    "name": "Data.gov",
    "subOrganizationOf": {
      "name": "Technology Transformation Service"
    }
  }
}
```
sample.schema:
```json
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "definitions": {
    "organization": {
      "id": "https://project-open-data.cio.gov/v1.1/schema/organization.json#",
      "required": [
        "name"
      ],
      "properties": {
        "name": {
          "type": "string"
        },
        "subOrganizationOf": {
          "$ref": "#"
        }
      }
    }
  },
  "properties": {
    "organization": {"$ref": "#/definitions/organization"}
  }
}
```
Install jsonschema v3.2.0, run the following command, and this is what we see:
```
$ jsonschema -i sample.json sample.schema
```
- It validates, but jsonschema makes an (unnecessary?) external request to the URI 'https://project-open-data.cio.gov/v1.1/schema/organization.json#'.
- If we change the URI string slightly, it fails with `jsonschema.exceptions.RefResolutionError: HTTP Error 404: Not Found`.
- We can remove the `id` field from the schema, but then the `name` property requirement is no longer validated.

We expect jsonschema to treat the `id` field as a unique identifier and not follow its URI to fetch the definition remotely, since the definition is already in the local schema.
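To see why the validator reaches for the network, it may help to model how `"$ref": "#"` is resolved against the enclosing `id`. The sketch below uses only `urllib.parse`; the helper `resolve_target` and the `documents` dict are hypothetical illustrations, not jsonschema's API. It assumes, as JSON Schema draft 4 specifies, that a subschema's `id` changes the base URI for any `$ref` inside it.

```python
from urllib.parse import urldefrag, urljoin

ORG_ID = "https://project-open-data.cio.gov/v1.1/schema/organization.json#"

organization = {
    "id": ORG_ID,
    "required": ["name"],
    "properties": {
        "name": {"type": "string"},
        "subOrganizationOf": {"$ref": "#"},
    },
}
root = {
    "definitions": {"organization": organization},
    "properties": {"organization": {"$ref": "#/definitions/organization"}},
}


def resolve_target(scope, ref):
    """Hypothetical helper: join ref against the base URI, then split off the fragment."""
    return urldefrag(urljoin(scope, ref))


# Inside the organization subschema the base URI is its id, so "#" points at
# the id's URL rather than at the local file.
url, fragment = resolve_target(ORG_ID, "#")
print(url)             # https://project-open-data.cio.gov/v1.1/schema/organization.json
print(repr(fragment))  # ''

# A local-first lookup: documents known locally, keyed by defragmented URI.
# The root schema has no id, so it lives under the empty base URI "".
documents = {"": root, urldefrag(ORG_ID).url: organization}
print(documents[url] is organization)  # True, no network access needed

# Without the id, "#" resolves against the empty root scope, i.e. the root
# schema, which has no "required": ["name"] constraint.
url_without_id, _ = resolve_target("", "#")
print(documents[url_without_id] is root)  # True
```

This also explains the third observation above: dropping `id` silently retargets `"$ref": "#"` from the organization definition to the root schema.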
@FuhuXia PR #717 could solve your problem.
@willson-chen I installed v3.2.0 and cherry-picked the three commits from PR #717. It still has the same issue.
Thanks for your feedback. The test passes for me on the current master branch: I put sample.json and sample.schema in the jsonschema root directory and ran `jsonschema -i sample.json sample.schema`, and no errors were returned.
@FuhuXia Sorry, I forgot to invalidate the `id`. With the test case below, I am sure it validates correctly. The command-line mode, however, does still fail. Next I will look at the command-line code; thank you again for your feedback.
```python
from unittest import TestCase

from jsonschema.validators import validate


class Test710(TestCase):
    def test_710(self):
        data = {
            "organization": {
                "name": "Data.gov",
                "subOrganizationOf": {
                    "name": "Technology Transformation Service"
                }
            }
        }
        schema = {
            "$schema": "http://json-schema.org/draft-04/schema#",
            "definitions": {
                "organization": {
                    "id": "https://project-open-data.cio.gov/v1.1/schema/organization11111.json#",
                    "required": [
                        "name"
                    ],
                    "properties": {
                        "name": {
                            "type": "string"
                        },
                        "subOrganizationOf": {
                            "$ref": "#"
                        }
                    }
                }
            },
            "properties": {
                "organization": {"$ref": "#/definitions/organization"}
            }
        }
        validate(data, schema)
```
@willson-chen Thank you for looking into it.
I always use the latest tagged release, v3.2.0, in my code, because I found the current master branch has its own issues. For example, in my sample.json, if I change `"name": "Technology Transformation Service"` to `"name123": "Technology Transformation Service"`, v3.2.0 complains about the missing required `name` property, but the master branch overlooks it.
> v3.2.0 will complain missing required `name` property, but master branch will overlook it.

@FuhuXia Yes, I noticed this too. It may take some time to look into.
After installing v3.2.0, cherry-picking the three commits from PR #717, and running `python -m jsonschema -i sample.json sample.schema`, the recursive self-referenced definition worked well. This time I did invalidate the id: `"id": "https://project-open-data.cio.gov/v1.1/schema/organization111.json#"`.
For your reference.
@Julian I'm sure there is a bug in this line: https://github.com/Julian/jsonschema/blob/756de12c69de8416469041a943ce9861b8e88141/jsonschema/cli.py#L195
`arguments["schema"]` is a `str`, and when a `str` is passed to `validator_for`, it always returns `Draft7Validator`, so schemas that declare draft-03, draft-04, or draft-06 via `$schema` cannot be fully validated.
However, when I modify it as follows, the test_cli.py tests fail with: `Failure: builtins.tuple: (<class 'FileNotFoundError'>, FileNotFoundError(2, 'No such file or directory'), <traceback object at 0x0000000003941880>)`
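A simplified model of this behavior, assuming `validator_for` starts with a `"$schema" not in schema` membership test and falls back to a default validator (as the v3.2.0 source does). `pick_validator`, `KNOWN`, and the string class names are hypothetical stand-ins; the point is that `in` on a `str` is a substring test, so a file path like `sample.schema` does not contain `"$schema"` and the default wins.

```python
DRAFT4_URI = "http://json-schema.org/draft-04/schema#"

# Hypothetical lookup table standing in for jsonschema's meta-schema registry.
KNOWN = {DRAFT4_URI: "Draft4Validator"}


def pick_validator(schema, default="Draft7Validator"):
    """Hypothetical stand-in for validator_for's dispatch on "$schema"."""
    if schema is True or schema is False or "$schema" not in schema:
        return default
    return KNOWN.get(schema["$schema"], default)


# A path is a str, so `in` is a substring check and "$schema" is not found.
print(pick_validator("sample.schema"))  # Draft7Validator

# Parsing the file first yields a dict, and the declared draft is honored.
print(pick_validator({"$schema": DRAFT4_URI}))  # Draft4Validator
```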
```python
import json


def _json_file(path):
    # Load the schema file so validator_for sees a dict, not a path string.
    with open(path) as file:
        return json.load(file)


# Before:
# arguments["validator"] = validator_for(arguments["schema"])
# After:
arguments["validator"] = validator_for(_json_file(arguments["schema"]))
```
@Julian Could you suggest a modification that is compatible with the test_cli.py tests?
(Note to self):
This still seems to occur; a more minimal reproducer is:
```python
from jsonschema.validators import Draft4Validator, RefResolver

schema = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "definitions": {
        "organization": {
            "id": "https://project-open-data.cio.gov/v1.1/schema/organization.json#",
            "properties": {
                "subOrganizationOf": {"$ref": "#"}
            }
        }
    },
    "properties": {"organization": {"$ref": "#/definitions/organization"}}
}
instance = {"organization": {"subOrganizationOf": {}}}

resolver = RefResolver.from_schema(schema)
resolver.resolve_remote = lambda *args, **kwargs: breakpoint()
Draft4Validator(schema, resolver=resolver).validate(instance)
```
where we still hit `resolve_remote` even though the subschema with the `id` we need is present locally.
Fixing this probably depends on having an API for flagging `$defs` / `definitions` (here, in draft 4) and applicators as containing subschemas, and then discovering `id` / `$id` within them.
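One way to picture such an API is a per-draft table of which keywords hold subschemas, used to walk the schema and collect embedded `id`s into a local store. The keyword sets below are a partial, hand-written assumption for draft 4, and `SUBSCHEMA_*` / `collect_ids` are hypothetical names; this is not how jsonschema or its new referencing library actually implement it.

```python
from urllib.parse import urldefrag, urljoin

# Assumed, partial draft 4 table of keywords whose values contain subschemas.
SUBSCHEMA_VALUE = {"not", "additionalProperties", "additionalItems", "items"}
SUBSCHEMA_MAP = {"definitions", "properties", "patternProperties", "dependencies"}
SUBSCHEMA_LIST = {"allOf", "anyOf", "oneOf", "items"}


def collect_ids(schema, base="", store=None):
    """Walk only subschema positions, recording each embedded id by absolute URI."""
    if store is None:
        store = {}
    if not isinstance(schema, dict):
        return store  # e.g. additionalProperties: false, or a dependencies string list
    if isinstance(schema.get("id"), str):
        base = urljoin(base, schema["id"])
        store[urldefrag(base).url] = schema
    for keyword, value in schema.items():
        if keyword in SUBSCHEMA_VALUE and isinstance(value, dict):
            collect_ids(value, base, store)
        if keyword in SUBSCHEMA_MAP and isinstance(value, dict):
            for child in value.values():
                collect_ids(child, base, store)
        if keyword in SUBSCHEMA_LIST and isinstance(value, list):
            for child in value:
                collect_ids(child, base, store)
    return store


ORG_ID = "https://project-open-data.cio.gov/v1.1/schema/organization.json#"
schema = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "definitions": {
        "organization": {
            "id": ORG_ID,
            "properties": {"subOrganizationOf": {"$ref": "#"}},
            # Hypothetical: "default" holds instance data, so its "id" is not a schema id.
            "default": {"id": "urn:example:not-a-schema"},
        }
    },
    "properties": {"organization": {"$ref": "#/definitions/organization"}},
}

store = collect_ids(schema)
print(sorted(store))  # ['https://project-open-data.cio.gov/v1.1/schema/organization.json']

# "$ref": "#" inside the organization subschema now resolves locally.
target = urldefrag(urljoin(ORG_ID, "#")).url
print(store[target] is schema["definitions"]["organization"])  # True
```

Restricting the walk to known subschema keywords is what keeps data such as `default` or `enum` values from being mistaken for schema `id`s.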
Hello there!
This, along with many many other $ref-related issues, is now finally being handled in #1049 with the introduction of a new referencing library which is fully compliant and has APIs which I hope are a lot easier to understand and customize.
The next release of jsonschema (v4.18.0) will contain a merged version of that PR, and should be released shortly in beta, and followed quickly by a regular release, assuming no critical issues are reported.
From my testing, this specific example does indeed work there! If you still care to, I'd love it if you tried out the beta once it is released; it would also be hugely helpful if you installed the branch containing this work right away (https://github.com/python-jsonschema/jsonschema/tree/referencing) and confirmed. In the interim, you can find documentation for the change on a preview page here.
I'm going to close this given it indeed seems like it is addressed by #1049, but feel free to follow up with any comments. Sorry for the delay in getting to these, but hopefully this new release will bring lots of benefit!