Unable to reference local "urn:" URIs
At $workplace, all helm charts get their values.schema.json generated by a tool that heavily uses $ref properties to split up the file into multiple schemas.
I noticed that these schema files cannot be parsed by check-jsonschema, but they work great with helm. I have verified this using the latest master commit of check-jsonschema.
I do not know whether it is helm or check-jsonschema that does not follow the JSON Schema spec, or whether this behavior is not covered by the spec (read: each tool can behave differently).
My aim with this issue is to get an answer to the above, and possibly raise awareness of a bug in check-jsonschema.
Here follows a contrived example that shows the problem:
values.yaml:
asdf: false
values.schema.json:
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "$id": "urn:my-project:helm:schemas:v1:my-helm-chart",
  "type": "object",
  "properties": {
    "asdf": {"$ref": "urn:my-project:helm:schemas:v1:asdf-type/boolean"}
  },
  "$defs": {
    "random-string-456def": {
      "$schema": "http://json-schema.org/draft-07/schema",
      "$id": "urn:my-project:helm:schemas:v1:asdf-type/boolean",
      "type": "boolean"
    }
  }
}
Results:
> helm lint .
==> Linting .
[INFO] Chart.yaml: icon is recommended
1 chart(s) linted, 0 chart(s) failed
> check_jsonschema --schemafile values.schema.json values.yaml
Failure resolving $ref within schema
_WrappedReferencingError: Unresolvable: urn:my-project:helm:schemas:v1:asdf-type/boolean
in "/local/workspace/repos/_external/check-jsonschema/venv/lib/python3.12/site-packages/check_jsonschema/checker.py", line 85
>>> result = self._build_result()
caused by
Unresolvable: urn:my-project:helm:schemas:v1:asdf-type/boolean
in "/local/workspace/repos/_external/check-jsonschema/venv/lib/python3.12/site-packages/jsonschema/validators.py", line 463
>>> resolved = self._resolver.lookup(ref)
caused by
Unretrievable: 'urn:my-project:helm:schemas:v1:asdf-type/boolean'
in "/local/workspace/repos/_external/check-jsonschema/venv/lib/python3.12/site-packages/referencing/_core.py", line 682
>>> retrieved = self._registry.get_or_retrieve(uri)
caused by
FileNotFoundError: [Errno 2] No such file or directory: '/local/workspace/repos/_external/check-jsonschema/urn:my-project:helm:schemas:v1:asdf-type/boolean'
in "/local/workspace/repos/_external/check-jsonschema/venv/lib/python3.12/site-packages/referencing/_core.py", line 428
>>> resource = registry._retrieve(uri)
I don't consider this a bug in check-jsonschema -- at best it's a missing feature. I wouldn't call it a "bug" in the behavior of your generator tooling either. IMO the schema it has produced here is weird but valid.
I'm not decided yet on whether or not we should try to make this work.
The trouble with a schema of this sort is that it contains references to its component $defs by URN, but those URN $ids can only be added to the reference registry when we actually interact with those parts of the document. Because we don't want to necessarily crawl and process a whole document when only a small part of it is used, those URNs are unknown to check-jsonschema at the time that we try to resolve them.
If we reorder the way that the document is processed, we could make it work, but I think that could have some surprising impacts for other usages.
Specifically, here's roughly what happens when this schema is processed vis-a-vis the sample input YAML instance doc:
- we parse the schema as JSON
- we read $schema and $id from the root of the document and register that information
  a. Note the referenceable $id as "urn:my-project:helm:schemas:v1:my-helm-chart"
  b. Select which schema dialect to use based on $schema
  c. Also note the base URI for the schema, to potentially resolve external relative schema references
- we parse the instance doc as YAML
- we start evaluating the schema against that instance
- we check properties of that instance against the schema
  a. asdf is recognized as a property, we start to validate it
  b. asdf validation requires that we look up "urn:my-project:helm:schemas:v1:asdf-type/boolean"
  c. That ID is not known to us (note that we haven't looked at $defs at all yet!)
  d. We try to do a "local relative reference lookup" with "urn:my-project:helm:schemas:v1:asdf-type/boolean" 💥 !
I'd accept the idea that not recognizing the URN as such and trying to handle it as a relative path to a schema file is not graceful, but IMO that's a reasonable fallback behavior -- after all, you could create a local file whose name is URN-formatted.
cc @Julian, I'd be curious about your take on this one. I think referencing can support this if check-jsonschema gives the user some way to get that $id into the registry before trying to do validation. Am I right about that?
The path I can imagine here is for check-jsonschema to offer a flag which would eagerly crawl the schema doc looking for components with $id present, and add them to the referencing registry.
Just to imagine it for a moment, with a jsonpath UX: --eagerly-discover-ids "$" to discover everything, and --eagerly-discover-ids "$['$defs']" to just crawl $defs.
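To make the idea concrete, here is a minimal standard-library sketch of what such an eager crawl might do. It is not an existing check-jsonschema feature: discover_ids is a hypothetical helper, and the crawl is deliberately naive, treating any object with a string $id as a schema (even one that is really instance data inside enum or const).

```python
def discover_ids(node, found=None):
    """Recursively collect every JSON object that declares a string "$id".

    Hypothetical helper for illustration only; returns {id: subschema}.
    """
    if found is None:
        found = {}
    if isinstance(node, dict):
        schema_id = node.get("$id")
        if isinstance(schema_id, str):
            found[schema_id] = node
        for value in node.values():
            discover_ids(value, found)
    elif isinstance(node, list):
        for item in node:
            discover_ids(item, found)
    return found


# The schema from the original report, written as a Python dict.
schema = {
    "$schema": "http://json-schema.org/draft-07/schema",
    "$id": "urn:my-project:helm:schemas:v1:my-helm-chart",
    "type": "object",
    "properties": {
        "asdf": {"$ref": "urn:my-project:helm:schemas:v1:asdf-type/boolean"}
    },
    "$defs": {
        "random-string-456def": {
            "$schema": "http://json-schema.org/draft-07/schema",
            "$id": "urn:my-project:helm:schemas:v1:asdf-type/boolean",
            "type": "boolean",
        }
    },
}

# Roughly the "$" case: crawl the whole document.
ids = discover_ids(schema)

# Roughly the "$['$defs']" case: crawl only the $defs subtree.
defs_only = discover_ids(schema.get("$defs", {}))
```

The resulting mapping could then be fed into whatever registry the validator uses before validation starts; how that registration would look in check-jsonschema is left open here.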
But I don't think it's worth pursuing unless there are other users who want this kind of eager $id discovery.
Great explanation. Thank you.
Note that I personally am not in need of this feature. I just wanted to share my findings. I was just playing around. Our company's schemas are made for helm and there they work great.
I agree falling back to looking for a schema file is reasonable. But maybe the error message could be made better. Currently, when you receive the error, it looks like it only tried looking for the URN as a file, which confused me for a while.
> I think referencing can support this if check-jsonschema gives the user some way to get that $id into the registry before trying to do validation. Am I right about that?
I'm confused as to why this wouldn't work out of the box -- the point of the referencing toolkit is to support this kind of thing -- it's valid to declare an $id anywhere in the set of schemas a user provides (including definitely in $defs, but even anywhere inside some transitively referenced schema). The job of referencing.Registry is essentially to do exactly this lazy lookup so that if one asks for a reference that we haven't looked for yet, we go off and try to find it anywhere it's allowed to live.
So yeah in short I'm not sure without looking closer why this doesn't work, it certainly seems like it should!
Oh! -- again without actually running anything so I could be off -- but probably the reason it doesn't work is quite simple, you have your $schema set to draft 7, and the $defs keyword doesn't exist in draft 7. It's called definitions in that version.
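A toy model may help show why the keyword name matters. The sketch below is not the referencing implementation; LazyRegistry, crawl, and the SUBSCHEMA_KEYWORDS table are hypothetical names, and the keyword lists are deliberately partial. It only models the idea that a lazy crawl descends into the keywords a dialect defines as holding subschemas, so an $id under an unknown keyword is never found.

```python
# Assumption: a simplified, partial table of keywords whose values map
# names to subschemas, per dialect. Real dialects define many more.
SUBSCHEMA_KEYWORDS = {
    "draft-07": ("definitions", "properties"),
    "2019-09": ("$defs", "properties"),
}

REF = "urn:my-project:helm:schemas:v1:asdf-type/boolean"


def crawl(schema, keywords, found):
    """Record $id values, descending only into known subschema keywords."""
    if not isinstance(schema, dict):
        return found
    schema_id = schema.get("$id")
    if isinstance(schema_id, str):
        found[schema_id] = schema
    for keyword in keywords:
        children = schema.get(keyword)
        if isinstance(children, dict):
            for child in children.values():
                crawl(child, keywords, found)
    return found


class LazyRegistry:
    """Toy registry that crawls its root schema on the first lookup miss."""

    def __init__(self, root, keywords):
        self._root = root
        self._keywords = keywords
        self._known = {}
        self._crawled = False

    def lookup(self, uri):
        if uri not in self._known and not self._crawled:
            crawl(self._root, self._keywords, self._known)
            self._crawled = True
        if uri not in self._known:
            raise LookupError(f"Unresolvable: {uri}")
        return self._known[uri]


def make_schema(defs_keyword):
    """Build the schema from the report, parameterized on the defs keyword."""
    return {
        "$id": "urn:my-project:helm:schemas:v1:my-helm-chart",
        "type": "object",
        "properties": {"asdf": {"$ref": REF}},
        defs_keyword: {"random-string-456def": {"$id": REF, "type": "boolean"}},
    }


def resolves(defs_keyword, dialect):
    """Return True if the toy registry can find REF under the given dialect."""
    registry = LazyRegistry(make_schema(defs_keyword), SUBSCHEMA_KEYWORDS[dialect])
    try:
        registry.lookup(REF)
    except LookupError:
        return False
    return True
```

In this model, resolves("definitions", "draft-07") succeeds while resolves("$defs", "draft-07") does not, which mirrors the failure in the report.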
Very interesting!
Indeed, when I replace all occurrences of "$defs" with "definitions", check-jsonschema is able to validate our schemas.
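For anyone wanting to script the same rename, a minimal sketch follows. The helper name use_draft7_definitions and the "#/$defs/flag" pointer in the example are assumptions for illustration, not taken from our real schemas. Note the caveat that it renames every "$defs" key, including a property that happens to be literally named "$defs".

```python
def use_draft7_definitions(node):
    """Return a copy of a schema with "$defs" renamed to "definitions".

    Hypothetical helper: also rewrites JSON Pointer style $ref values
    that go through "/$defs/". The input is not modified.
    """
    if isinstance(node, dict):
        converted = {}
        for key, value in node.items():
            new_key = "definitions" if key == "$defs" else key
            if key == "$ref" and isinstance(value, str):
                value = value.replace("/$defs/", "/definitions/")
            converted[new_key] = use_draft7_definitions(value)
        return converted
    if isinstance(node, list):
        return [use_draft7_definitions(item) for item in node]
    return node


# Illustrative input: a draft-07 schema that mistakenly uses "$defs".
schema = {
    "$schema": "http://json-schema.org/draft-07/schema",
    "type": "object",
    "properties": {"asdf": {"$ref": "#/$defs/flag"}},
    "$defs": {"flag": {"type": "boolean"}},
}

converted = use_draft7_definitions(schema)
```

URN-style references such as the ones in the original report contain no "/$defs/" segment, so they pass through unchanged.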
The change from keyword "definitions" to "$defs" seems to have been made in JSON Schema draft 2019-09. This can be seen when comparing it to the previous draft definition:
- https://json-schema.org/draft-07/draft-handrews-json-schema-01#rfc.section.8.3.2
- https://json-schema.org/draft/2019-09/draft-handrews-json-schema-02#rfc.section.8.2.5
So I guess the conclusion is that helm is the strange one for understanding the "$defs" keyword in a draft-07 schema.
@Julian, thanks for the added info about referencing.Registry trying to do the lazy lookups for us! I hadn't realized that it would do so -- I had a much simpler mental model for how it worked, that it would only hold IDs which were explicitly written to it and resolve relative refs.
I think that means I shouldn't try to add any accommodating feature for this, at least not at present.
If we see more tools out there mixing different schema dialects over time, we can revisit. For example, we could allow users to override the dialect used by their schema. But for now I'm going to go ahead and close.