jsonschema
jsonschema copied to clipboard
RefResolver (or its successor) should allow customizing deserialization, not only scheme / retrieval
I'm using currently handling schemas written in YAML (for human-readability) but validated by jsonschema. I parse the schema in YAML (and make sure I subset my YAML to stay JSON-compatible) and then run the parsed dictionary through jsonschema. It is working nicely with one issue: When I want to use file:// includes, the files are implicitly assumed to be json.
@Julian I would like to add support for these files being YAML and am wondering if this is something you would support. If so, what is your recommendation for the cleanest way to add it? If it's not something you support, what would you recommend as the cleanest way to add the support for my own code without hacking/forking jsonschema? I thought about adding custom resolver as is done in https://github.com/Julian/jsonschema/issues/225, but it sounds like RefResolver is not intended to be API-stable, so subclassing is liable to break in the future. Is there another way?
Thanks, Martin
Upon further code reading, I found the handlers argument to RefResolver. If I attach a yaml:// handler (or similar), I can make this work. That said, it feels hacky, in that yaml:// is not really a location (like file:// or http://) and it would be better to decouple the file format (YAML/JSON) from the location (including the logic for handling base URIs and $id).
@martingkelly I'm doing the same thing with YAML and have $ref's working. See here: https://github.com/robherring/yaml-bindings/blob/master/dtschema.py#L212
Thanks @robherring, I ended up doing the same thing. I think it's not ideal in that by implementing your own handler, you lose the more general caching logic, and you lose the ability to do JSON http... ideally the document format and transport would be better decoupled.
Btw, I really like your device-tree YAML project; I think it's an excellent idea :).
You can see one approach to this problem in JavaScript in the AutoExtensionDereferencer function of JSON Schema Tools. Although the "dereferencer" part is not necessarily relevant here, just the automatic file extension search approach, which also understands overriding the URI scheme to load from a file as if it were a file:// URI.
Yes, something like that would work. @Julian any thoughts?
Have to respond in a bit more length -- something a bit more general than that (that doesn't use file extensions, just fully allows someone to inject in what to deserialize to what) is what I'd lean to, though this also strikes me as the sort of thing I wish existed in an external spec -- i.e. within HTTP, if you were able to say yaml+<URL> and that'd mean "what lives there at that URL should be treated as YAML".
To some extent that's Accept / Content-Type, but it's not used that way by client libraries right, so something that says "I'm JSON" -- there's no HTTP client (for Python) that will then say "OK I'll automatically know to call json.loads on that body" that I know of (and to allow you to basically hook into any possible Content-Type value).
I agree with that ideally being in spec but don't know enough to say where that would fit in, if at all. @handrews any opinion on integrating this sort of thing into spec?
Not the JSON spec, an HTTP spec
On Thu, Jun 7, 2018, 13:14 Martin Kelly [email protected] wrote:
I agree with that ideally being in spec but don't know enough to say where that would fit in, if at all. @handrews https://github.com/handrews any opinion on integrating this sort of thing into spec?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Julian/jsonschema/issues/420#issuecomment-395497541, or mute the thread https://github.com/notifications/unsubscribe-auth/AAUIXlwoIq3qB8Sh-aK26rh6xWV9dxaqks5t6V-EgaJpZM4UZ5bB .
Or a more general URI one
On Thu, Jun 7, 2018, 13:19 Julian Berman [email protected] wrote:
Not the JSON spec, an HTTP spec
On Thu, Jun 7, 2018, 13:14 Martin Kelly [email protected] wrote:
I agree with that ideally being in spec but don't know enough to say where that would fit in, if at all. @handrews https://github.com/handrews any opinion on integrating this sort of thing into spec?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Julian/jsonschema/issues/420#issuecomment-395497541, or mute the thread https://github.com/notifications/unsubscribe-auth/AAUIXlwoIq3qB8Sh-aK26rh6xWV9dxaqks5t6V-EgaJpZM4UZ5bB .
I realize that would be more ideal, but I'm guessing it's out of scope and would take years to modify HTTP or URI specs. I'm wondering if there's any place for jsonschema to graft something in, even if not ideal.
@martingkelly it really does not fit in the JSON Schema spec. For one thing, it's much better to allow implementations flexibility in how they manage this. Some implementations will be able to support rather complex loading/parsing features, others will be resource- or performance-constrained and will only want to support JSON.
Additionally, this is a pretty easy problem to solve outside of implementations- write your schemas in whatever you want, and have a build step that converts them to JSON.
Additionally, this is a pretty easy problem to solve outside of implementations- write your schemas in whatever you want, and have a build step that converts them to JSON.
Although a build step that converts JSON to YAML is technically possible, I don't think it's a great solution, as all the file references have to somehow be faked out and turned into JSON files temporarily, or you have to copy the build directory somewhere else and convert the files into JSON (but keep the file extension of .yaml even though the file is JSON). If would much prefer the code itself natively supporting YAML. I can add this as a jsonschema extension using a handler, but it has the drawbacks I mentioned above.
Still haven't gotten a chance to elaborate here, but just on that last message -- just in case you weren't aware, all JSON is essentially valid YAML.
I do know that :). In this case, I'm writing schemas in YAML (json-compatible subset) and parsing them with jsonschema. I like this approach because I find YAML much more readable than JSON, but I find jsonschema to be a useful tool.
@martingkelly why would you try to do anything other than edit the files in YAML? The build step would happen before you try to load anything, so you shouldn't need to refer directly to the YAML paths.
My point is just that I would then have to (non-intuitively) write schema.json in my file instead of schema.yaml, even though I am referring to schema.yaml. This is a "gotcha" for schema usability that I'm hesitant to introduce. In addition, I'm not fond of requiring a build step just to get around jsonschema not parsing YAML... when possible, I like to be able to use libraries as they are without adding my own required wrappers. Requiring a build step is another "gotcha" for those dealing with schemas I write.
My point is just that I would then have to (non-intuitively) write schema.json in my file instead of schema.yaml, even though I am referring to schema.yaml
No, you're referring to the post-build-step schema.json, so it makes sense. Or you can just not use extensions, which is how it should be if you're doing hypermedia anyway because content negotiation.
Although obviously I agree with you on supporting loading from other formats as that's why I wrote that part of @cloudflare/json-schema-transform and packaged it in @cloudflare/json-schema-ref-loader (although those packages have weird restrictions around $ref for historical reasons, which I hope to fix one day).
No, you're referring to the post-build-step schema.json, so it makes sense. Or you can just not use extensions, which is how it should be if you're doing hypermedia anyway because content negotiation.
This is one of those "it makes sense if you think about it the right way" things :). Yes, I understand your rationale, but I think this would be confusing for newcomers to my project, and I prefer to keep it as simple as I can.
It sounds like we all agree that it would be nice to support more underlying formats than just json... the question is where to add that support, since it sounds like it's out of spec for the jsonschema spec itself, and trying to change something in the URI spec would take forever and likely fail. It seems to me we are stuck with unofficial extensions to this particular library.
The spec already makes a point of saying that it interprets instances according to a data model rather than as JSON text. This holds true when the schema is an instance (with respect to the meta-schema).
So the spec already does not care how you get something into the data model, although it only defines the mapping for JSON.
What the spec avoids doing is placing requirements on such support. There are many reasons that an implementation may not want to support other formats. JSON was designed to be simple and fast to parse (which, as I understand it, is why there are some weird corner cases like duplicate properties when parsing- leaving them undefined allows a faster implementation than requiring detecting the error and taking action on it).
For that matter, an Internet of Things implementation might not even load JSON, it might only load CBOR.
I understand JSON schema is not unique to JSON (despite the name), but since it's not specified how a parser should interpret a given $ref, each library will invent their own syntax for specifying content type. That seems non-ideal to me.
I stupidly didn't see contentMediaType before in the spec, but what if we check for that key and, if specified, assume that a given $ref is not JSON but is instead the specified media type?
contentMediaType is for media types encoded as JSON strings, so I don't think it helps here?
but since it's not specified how a parser should interpret a given $ref, each library will invent their own syntax for specifying content type. That seems non-ideal to me.
Per section 8.3.1, $ref values are identifiers, and do not say anything at all about how the identified schema is loaded.
If it is being loaded over a network, then the network protocol (probably HTTP or CoAP) should be used to understand and handle the available media type(s). There is no need for JSON Schema to say anything about that process- it is how the web works already. For something like file:// URIs, while it is less clear how to detect and manage media types, that problem is not at all specific to JSON Schema.
If it has been pre-loaded (which is a common approach, as it separates parsing concerns from referencing concerns), then presumably it was parsed into an in-memory representation already and exactly how that happens and what formats are supported is intentionally left to implementations. Different implementations serve different needs, and can balance flexibility vs performance/resource consumption.
The idea of treating $ref as an RFC 8288 web link and allowing target hints has come up at least once before (with OpenAPI), but in a somewhat different context. There is https://github.com/json-schema-org/json-schema-spec/issues/141, which gets into talking about the type of reference/pointer targets (initially just for JSON Pointer, but the discussion expanded along the way). That might be a place to look if you want to advocate for something like this in the spec. We shouldn't hijack @Julian's project for spec proposals :-)
Yes, in my case I'm dealing with only local file references, no network at all. I think having some way to specify a mime-type or similar would be useful... I frankly don't care if it's in the spec or not as long as it solves my problem :). It's perhaps hacky, but having an optional $mime-type key in parallel with $ref would solve the problem. You could then pair a parser to a mime type.
A very much useful step forward would be to pass the referrer document to the handler. Right now nothing is happening with it (PyCharm shows no usage beyond it being stored on the object).
But if you resolve relative file references, as one does in for instance OpenAPI specifications, then you need a source document to translate relativity. Right now, I either need a pre-process step that resolves all the references to absolute file URLs or patch jsonschema. Triggered by this bug.