jsonschema

Validate schema with relative paths and within-document references

Open tillahoffmann opened this issue 7 years ago • 15 comments

I would like to use a schema base.json that defines basic properties together with a schema derived.json which adds further validation to the base schema. The derived schema needs to reference the base schema using a relative path as discussed in #98. However, I cannot get both the relative reference and within-document references to work at the same time. Here's a minimal example to illustrate the problem.

base.json

{
    "definitions": {
        "string_alias": {
            "type": "string"
        }
    },
    "properties": {
        "my_string": {
            "$ref": "#/definitions/string_alias"
        }
    }
}

derived.json

{
    "allOf": [
        {
            "$ref": "base.json"
        },
        {
            "properties": {
                "my_number": {
                    "type": "number"
                }
            }
        }
    ]
}

data.json

{
    "my_string": "Hello world!",
    "my_number": 42
}

validate.py

import json
import os
import sys
import jsonschema


if len(sys.argv) > 1:
    resolver = jsonschema.RefResolver('file://%s/' % os.path.abspath(os.path.dirname(__file__)), None)
else:
    resolver = None


with open('base.json') as fp:
    base = json.load(fp)

with open('derived.json') as fp:
    derived = json.load(fp)

with open('data.json') as fp:
    data = json.load(fp)


try:
    jsonschema.validate(data, base, resolver=resolver)
    print("Passed base schema.")
except Exception as ex:
    print("Failed base schema: %s" % ex)

try:
    jsonschema.validate(data, derived, resolver=resolver)
    print("Passed derived schema.")
except Exception as ex:
    print("Failed derived schema: %s" % ex)

If I use the resolver, the relative path gets resolved correctly but the within-document reference fails. If I don't use the resolver, the within-document reference succeeds but the relative path does not get resolved.

# With resolver
python validate.py resolver
Failed base schema: Unresolvable JSON pointer: 'definitions/string_alias'
Passed derived schema.

# Without resolver
python validate.py         
Passed base schema.
Failed derived schema: unknown url type: 'base.json'

Any suggestions on how to get both to work?

Edit: Looks like this may be related to #306.
Edit: A temporary workaround is to have base.json reference itself explicitly, i.e. use base.json#/definitions/string_alias instead of #/definitions/string_alias.
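To see why the explicit self-reference helps, here is a stdlib-only sketch of RFC 3986 reference resolution with `urllib.parse.urljoin`. It assumes, as in validate.py, that the base URI is the schema directory; `file:///path/to/schemas/` is a placeholder path and `split_ref` is a hypothetical helper, not part of jsonschema.

```python
from urllib.parse import urldefrag, urljoin

# Placeholder for the directory-style base URI that validate.py hands to RefResolver.
BASE_URI = "file:///path/to/schemas/"


def split_ref(base_uri: str, ref: str) -> tuple:
    """Resolve ref against base_uri (RFC 3986) and split off the JSON Pointer fragment."""
    document, fragment = urldefrag(urljoin(base_uri, ref))
    return document, fragment


for ref in ("#/definitions/string_alias", "base.json#/definitions/string_alias"):
    document, pointer = split_ref(BASE_URI, ref)
    print(f"{ref} -> document={document} pointer={pointer}")
```

The bare fragment keeps the directory URI as its document part, which does not identify base.json at all, while the explicit form names base.json itself. That is consistent with the workaround succeeding.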

tillahoffmann avatar Jul 04 '17 11:07 tillahoffmann

I agree, this seems to be an issue when the base URI does not use the HTTP or HTTPS scheme. With any other scheme, the reference is lost.

The workaround is to make the self-reference explicit: instead of #/definitions/variable_name, use <current_id>#/definitions/variable_name.

smittysmee avatar Aug 01 '17 20:08 smittysmee

+1 on this

Lordnibbler avatar Aug 03 '17 22:08 Lordnibbler

+1

chimeno avatar Oct 04 '17 14:10 chimeno

+1

joecabezas avatar Feb 14 '18 23:02 joecabezas

+1

gaoxinyang avatar Jul 19 '18 15:07 gaoxinyang

Any news?

SamuelePilleri avatar Nov 16 '18 14:11 SamuelePilleri

@Julian looking through this and the related #98 it seems that the problem is uncertainty over how to set the initial base URI in the absence of a root $id (or id for older drafts).

The way I've handled this using other implementations is to give all schemas a root $id, which is the RECOMMENDED approach in the spec (more on that at the end). But in the Python implementation I worked on for a bit (before realizing you were still active and dropping it :-), I interpreted RFC 3986 §5.1.3's rules around establishing a base URI from the "retrieval" URI as meaning that any document loaded from a local file system should have a base URI that is the file:// URI of the local file.

  1. Does this seem like a reasonable interpretation to you?
  2. Should I clarify this in the spec?
  3. As an implementor, do you see any problems with implementing that interpretation?

This only affects resolving URIs, and imposes no requirements on whether the implementation can automatically load the files or not.

My assumption would be that your command-line script would construct the file:// URI and pass it to the library, although I don't care how it's actually implemented. However, I do not think that an implementation is responsible for determining a retrieval base URI when it is instantiated as a library and just handed a data structure. The library has no way of knowing what file:// URI to construct, or if there even is a filesystem involved. In that case, I believe that RFC 3986 §5.1.4 applies, which basically says "eh, whatever".
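A minimal sketch of the script side, assuming the schema lives on a local filesystem: `pathlib.Path.as_uri()` builds the file:// retrieval URI, which the script could then use as the document's base URI. The helper name `retrieval_uri` is hypothetical.

```python
from pathlib import Path
from urllib.parse import urljoin


def retrieval_uri(path: str) -> str:
    """Return the file:// URI of a local schema file (hypothetical helper)."""
    # as_uri() requires an absolute path, so resolve relative paths first.
    return Path(path).resolve().as_uri()


if __name__ == "__main__":
    base_uri = retrieval_uri("derived.json")
    print(base_uri)
    # Relative references now resolve next to the file itself...
    print(urljoin(base_uri, "base.json"))
    # ...and bare fragments point back into the same file.
    print(urljoin(base_uri, "#/definitions/string_alias"))
```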

More specifically, it reads in part:

A sender of a representation containing relative references is responsible for ensuring that a base URI for those references can be established.

The proper way to do this is to set $id (or id for draft-04). The most recent work I did was with a JavaScript implementation, and I just set $id with https:// URIs for every schema document, pre-loaded all of them as documented by the library, and references worked even though no HTTP requests were ever made.

@tillahoffmann, @SamuelePilleri, etc: JSON Schema documents are intended to have $schema and $id (or id) set in the root schema objects specifically to avoid this problem.

I think we should ensure that the spec makes clear whether or how to set a base URI when a file path is known, but the proper solution is to set your own base URI.
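A stdlib-only sketch of why a root `$id` removes the ambiguity: each `$ref` is resolved against the `$id` of the document that contains it, so the targets come out the same no matter where the files are stored. The https://example.com/schemas/ URIs are placeholders (nothing is fetched), and the sketch only handles a root-level `$id`.

```python
from urllib.parse import urljoin

# The issue's schemas with placeholder root $id values added.
base = {
    "$id": "https://example.com/schemas/base.json",
    "definitions": {"string_alias": {"type": "string"}},
    "properties": {"my_string": {"$ref": "#/definitions/string_alias"}},
}
derived = {
    "$id": "https://example.com/schemas/derived.json",
    "allOf": [{"$ref": "base.json"}, {"properties": {"my_number": {"type": "number"}}}],
}


def resolve_ref(schema: dict, ref: str) -> str:
    """Resolve a $ref against the root $id of the schema that contains it."""
    return urljoin(schema["$id"], ref)


print(resolve_ref(base, base["properties"]["my_string"]["$ref"]))
# https://example.com/schemas/base.json#/definitions/string_alias
print(resolve_ref(derived, derived["allOf"][0]["$ref"]))
# https://example.com/schemas/base.json
```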

handrews avatar Nov 26 '18 05:11 handrews

@handrews have been meaning to respond in some more detail but haven't had the moment to sit down and do it, so might as well throw a short response up in the meanwhile :)

Does this seem like a reasonable interpretation to you?

Yes!

Should I clarify this in the spec?

Maybe, though I'd not expect this to be different for file / the filesystem than any other URI, yeah? So if there's anything to clarify [I don't remember the current language] maybe it's just saying "if users forget to specify an $id and you loaded the schema from somewhere, assume one using the URI you used to fetch the document".
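That rule can be sketched as a small helper that picks a root schema's base URI. This is a hypothetical function, not part of jsonschema; it assumes draft-06+ `$id` (draft-04 uses `id`) and only looks at the root schema.

```python
from typing import Optional
from urllib.parse import urljoin


def effective_base_uri(schema, retrieval_uri: Optional[str]) -> Optional[str]:
    """Hypothetical helper: choose the base URI for a root schema.

    An explicit root "$id" wins (resolved against the retrieval URI, since it
    may itself be relative); otherwise fall back to the URI the document was
    fetched from, which may be None if the library was just handed a dict.
    """
    # Boolean schemas (True/False) cannot carry an $id.
    schema_id = schema.get("$id") if isinstance(schema, dict) else None
    if schema_id is None:
        return retrieval_uri
    if retrieval_uri is None:
        return schema_id
    return urljoin(retrieval_uri, schema_id)
```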

As an implementor, do you see any problems with implementing that interpretation?

Nope, should be reasonable!

Julian avatar Nov 29 '18 13:11 Julian

@Julian

"if users forget to specify an $id and you loaded the schema from somewhere, assume one using the URI you used to fetch the document".

Yeah that's basically what I'm going for. Somewhere there's a section on establishing a base URI, so I'll look at how that's worded. Perhaps including a note that a local filesystem is one such "somewhere" rather than calling out the file:// scheme.

handrews avatar Nov 30 '18 01:11 handrews

I had a hard time figuring this out myself, and I kept googling into this github issue. (The solution is pretty much what @clenk says above.)

Here's a SO Q&A which covers it: https://stackoverflow.com/questions/53968770/how-to-set-up-local-file-references-in-python-jsonschema-document

Hopefully this helps someone in the future.

topher515 avatar Dec 29 '18 10:12 topher515

I might add: external references in JSON Schema are the worst-documented feature of the specification and, as a consequence, the worst-implemented one across languages. Because of that, behavior varies between languages and even between implementations in the same language: some require extra work to make it work, others a complete redeclaration of the IDs. I strongly suggest using the same library across projects, or leaving validation to an external common service that only does that (maybe asynchronously).

Happy New year everyone!


joecabezas avatar Jan 01 '19 21:01 joecabezas

@joecabezas I did add https://github.com/json-schema-org/json-schema-spec/pull/686 for the next draft which should nudge people a bit more in the right direction. The problem is that there are endless ways to store and find "external" documents. The spec cannot cover every single filesystem, IoT device, network protocol, etc. etc. We have to rely on people understanding URIs (and how they don't necessarily directly reflect storage) or implementing a loading mechanism that works for their environment.

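In the last draft we did add a section on Loading Reference Schemas (https://tools.ietf.org/html/draft-handrews-json-schema-01#section-8.3.1) and Dereferencing (https://tools.ietf.org/html/draft-handrews-json-schema-01#section-8.3.2) in an effort to make this clearer, but that was after the initial "draft-07" version so I'm not sure how many implementations have made use of it.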
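To make "a loading mechanism that works for their environment" concrete, here is a minimal stdlib-only sketch under loud assumptions: every schema is pre-loaded into a dict keyed by its URI (nothing is fetched), and a `$ref` is resolved against the current base URI with RFC 3986 rules and then followed as a JSON Pointer (RFC 6901). `REGISTRY`, `resolve_pointer` and `dereference` are hypothetical names; nested `$id`, plain-name anchors and remote retrieval are deliberately ignored.

```python
from urllib.parse import unquote, urldefrag, urljoin

# Hypothetical pre-loaded store: document URI -> schema. Nothing is fetched.
REGISTRY = {
    "file:///schemas/base.json": {
        "definitions": {"string_alias": {"type": "string"}},
        "properties": {"my_string": {"$ref": "#/definitions/string_alias"}},
    },
}


def resolve_pointer(document, pointer: str):
    """Follow a JSON Pointer (RFC 6901) such as '/definitions/string_alias'."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    node = document
    for token in pointer[1:].split("/"):
        # Unescape per RFC 6901: '~1' -> '/' first, then '~0' -> '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def dereference(base_uri: str, ref: str):
    """Resolve ref against base_uri, look up the document, then apply the fragment."""
    document_uri, fragment = urldefrag(urljoin(base_uri, ref))
    if document_uri not in REGISTRY:
        raise LookupError(f"unknown document: {document_uri}")
    # URI fragments are percent-encoded; decode before parsing as a pointer.
    return resolve_pointer(REGISTRY[document_uri], unquote(fragment))
```

Using the schema directory `file:///schemas/` as the base URI, as validate.py does, reproduces the original failure: `#/definitions/string_alias` then points at a document that was never registered, and `dereference` raises `LookupError`.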

handrews avatar Jan 02 '19 06:01 handrews

Thanks for the effort in moving things in the right direction! Very much appreciated


joecabezas avatar Jan 03 '19 09:01 joecabezas

Great info, thanks! I just wanted to add the $ref section from the Swagger 3.0 docs:

# $ref Syntax
According to RFC 3986, the $ref string value (JSON Reference) should contain a URI,
which identifies the location of the JSON value you are referencing. If the string value
does not conform to URI syntax rules, it causes an error during resolution. Any members
other than $ref in a JSON Reference are ignored. Check this list for example values
of a JSON reference in specific cases:

## Local Reference
* $ref: '#/definitions/myElement' means: go to the root of the current document and find the elements definitions and myElement, one after the other.
## Remote Reference
* $ref: 'document.json' uses the whole document located on the same server and in the same location.
* $ref: 'document.json#/myElement' uses the element of the document located on the same server.
* $ref: '../document.json#/myElement' uses the element of the document located in the parent folder.
* $ref: '../another-folder/document.json#/myElement' uses the element of the document located in another folder.
## URL Reference
* $ref: 'http://path/to/your/resource' uses the whole document located on a different server.
* $ref: 'http://path/to/your/resource.json#myElement' uses the specific element of the document stored on a different server.
* $ref: '//anotherserver.com/files/example.json' uses the document on a different server, which uses the same protocol (for example, HTTP or HTTPS).
**Note:** When using local references such as #/components/schemas/User in
YAML, enclose the value in quotes: '#/components/schemas/User'. Otherwise it
will be treated as a comment.
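Setting OpenAPI's own rules aside, the relative forms in this list are ordinary RFC 3986 relative references, so their targets can be checked with `urllib.parse.urljoin`. The base URI `http://server.example/api/v1/openapi.json` is a placeholder for the document that contains the `$ref` values.

```python
from urllib.parse import urljoin

# Placeholder URI of the document that contains the $ref values.
BASE = "http://server.example/api/v1/openapi.json"

REFS = [
    "#/definitions/myElement",
    "document.json#/myElement",
    "../document.json#/myElement",
    "../another-folder/document.json#/myElement",
    "//anotherserver.com/files/example.json",
]

for ref in REFS:
    # Each target is resolved purely from BASE; no request is made.
    print(f"{ref:45} -> {urljoin(BASE, ref)}")
```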

mellertson avatar Sep 30 '19 02:09 mellertson

@mellertson OpenAPI (formerly known as Swagger, but seriously, they changed that in 2015; it's been four years already) uses a subset of $ref functionality, as it does not support $id. It is not advisable to apply its rules to JSON Schema, as they omit several important cases.

handrews avatar Oct 01 '19 00:10 handrews

still unfixed

doubler avatar Nov 02 '22 10:11 doubler

Hello there!

This, along with many many other $ref-related issues, is now finally being handled in #1049 with the introduction of a new referencing library which is fully compliant and has APIs which I hope are a lot easier to understand and customize.

The next release of jsonschema (v4.18.0) will contain a merged version of that PR, and should be released shortly in beta, and followed quickly by a regular release, assuming no critical issues are reported.

It looks from my testing like indeed this specific example works there! If you still care to, I'd love it if you tried out the beta once it is released, or certainly it'd be hugely helpful to immediately install the branch containing this work (https://github.com/python-jsonschema/jsonschema/tree/referencing) and confirm. You can in the interim find documentation for the change in a preview page here.

I'm going to close this given it indeed seems like it is addressed by #1049, but feel free to follow up with any comments. Sorry for the delay in getting to these, but hopefully this new release will bring lots of benefit!

Here's a modified example of your code which seems to work, in case it helps show how to use the new release:

import json
import jsonschema
from referencing import Registry
from referencing.jsonschema import DRAFT7


with open('base.json') as fp:
    base = DRAFT7.create_resource(json.load(fp))

with open('derived.json') as fp:
    derived = DRAFT7.create_resource(json.load(fp))

registry = Registry().with_resources(
    [("base.json", base), ("derived.json", derived)],
)

with open('data.json') as fp:
    data = json.load(fp)


jsonschema.validate(data, base, registry=registry)
jsonschema.validate(data, derived, registry=registry)

Julian avatar Feb 23 '23 09:02 Julian

I can't make this work. The code above returns TypeError: argument of type 'Resource' is not iterable.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[37], line 21
     17 with open('documents.json', 'r') as file:
     18     data = json.loads(file.read())
---> 21 jsonschema.validate(data, base, registry=registry)
     22 jsonschema.validate(data, derived, registry=registry)

File ~/.pyenv/versions/3.10.13/envs/papask/lib/python3.10/site-packages/jsonschema/validators.py:1302, in validate(instance, schema, cls, *args, **kwargs)
   1243 """
   1244 Validate an instance under the given schema.
   1245 
   (...)
   1299     `jsonschema.validators.validates`
   1300 """
   1301 if cls is None:
-> 1302     cls = validator_for(schema)
   1304 cls.check_schema(schema)
   1305 validator = cls(schema, *args, **kwargs)

File ~/.pyenv/versions/3.10.13/envs/papask/lib/python3.10/site-packages/jsonschema/validators.py:1371, in validator_for(schema, default)
   1312 """
   1313 Retrieve the validator class appropriate for validating the given schema.
   1314 
   (...)
   1367 
   1368 """
   1369 DefaultValidator = _LATEST_VERSION if default is _UNSET else default
-> 1371 if schema is True or schema is False or "$schema" not in schema:
   1372     return DefaultValidator
   1373 if schema["$schema"] not in _META_SCHEMAS and default is _UNSET:

TypeError: argument of type 'Resource' is not iterable

krystof-k avatar Dec 20 '23 16:12 krystof-k

Please open a discussion with your code that reproduces (it sounds like you're passing a Resource where you should be passing a schema).

Julian avatar Dec 20 '23 16:12 Julian

Thanks, got it. Here's your example from above, fixed:

import json
import jsonschema
from referencing import Registry
from referencing.jsonschema import DRAFT7


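with open('base.json') as fp:
    base = json.load(fp)  # plain dict: the schema that validate() expects
    base_resource = DRAFT7.create_resource(base)  # Resource wrapper for the registry

with open('derived.json') as fp:
    derived = json.load(fp)
    derived_resource = DRAFT7.create_resource(derived)

# Key each Resource by the URI the "$ref"s use; with no $id, "base.json" is looked up as-is.
registry = Registry().with_resources(
    [("base.json", base_resource), ("derived.json", derived_resource)],
)

with open('data.json') as fp:
    data = json.load(fp)


# Pass the raw schema dicts, not the Resource objects, to validate().
jsonschema.validate(data, base, registry=registry)
jsonschema.validate(data, derived, registry=registry)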

krystof-k avatar Dec 20 '23 16:12 krystof-k