require that `$schema` cannot contain a fragment
What kind of change does this PR introduce?
Clarification
Issue & Discussion References
- Closes #1590
Summary
Adds a requirement to $schema that its value cannot contain a fragment. This prevents someone from using a non-canonical IRI in this keyword.
Does this PR introduce a breaking change?
Technically, yes, but the likelihood of people using fragments in $schema is low (probably non-zero, though).
I like option 2 -- it exactly covers what we want to do while accommodating the existence of older drafts.
We should still be able to disallow fragments (even empty ones) while allowing draft7 and earlier schemas to be used in a new implementation. That holds even with a restriction in the metaschema itself (e.g. with the `pattern` keyword), because a schema should not be validated against a metaschema other than the one specified in its $schema keyword.
We would just need to point out the exception for older schemas, so the implementation doesn't reject the use of an empty fragment before considering if an older draft's semantics should apply.
By allowing these, are we saying that all historical schemas are valid v1 schemas that just use different dialects, e.g. a draft 6 dialect? Otherwise it seems to me that those schemas would be valid under their own spec versions but not v1, and thus we don't need to state the exception.
This is the change I worked up:
The value of this keyword MUST be an
[absolute IRI](https://www.rfc-editor.org/info/rfc3987) (without a fragment). This
IRI MUST be normalized. Exceptions exist for previous versions of this
specification and meta-schemas based on those versions, which have fragments
defined in their `$schema` values, namely:
- `http://json-schema.org/draft-03/schema#`
- `http://json-schema.org/draft-04/schema#`
- `http://json-schema.org/draft-06/schema#`
- `http://json-schema.org/draft-07/schema#`
But it seems odd to explicitly reference older versions like this.
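To make the proposed wording concrete, here is a minimal Python sketch of the check it would imply. The function name `is_acceptable_schema_value` and the `example.com` URIs are hypothetical, and the sketch only tests for a scheme and the absence of `#`; it does not attempt the normalization check the text also requires.

```python
from urllib.parse import urlsplit

# Legacy dialect identifiers from the proposed text; each ends in an empty fragment.
LEGACY_DIALECTS = frozenset({
    "http://json-schema.org/draft-03/schema#",
    "http://json-schema.org/draft-04/schema#",
    "http://json-schema.org/draft-06/schema#",
    "http://json-schema.org/draft-07/schema#",
})


def is_acceptable_schema_value(value: str) -> bool:
    """Hypothetical check: a listed legacy identifier, or an absolute IRI with no fragment."""
    if value in LEGACY_DIALECTS:
        return True
    # An absolute IRI needs a scheme; any "#" means a fragment, even an empty one.
    has_scheme = bool(urlsplit(value).scheme)
    return has_scheme and "#" not in value
```

With this sketch, `http://json-schema.org/draft-07/schema#` passes only because it is on the list, while `https://example.com/my-dialect#` is rejected.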
> By allowing these, are we saying that all historical schemas are valid v1 schemas that just use different dialects, e.g. a draft 6 dialect?
That definitely doesn't sound right. v1 is a distinct dialect from draft-06.
> Otherwise it seems to me that those schemas would be valid under their own spec versions but not v1, and thus we don't need to state the exception.
Maybe, but that's confusing at best. $schema is used to determine which dialect to choose, so it's a chicken/egg problem for different dialects to define $schema differently. You have to first determine the dialect, then you can determine which version of the spec it refers to, and finally you can validate it, which at that point seems pointless since it's already done its job. Usually you validate something before you use it. There's not much point in validating it after. So, is this actually a constraint, or just a schema author convention?
Consider that it's not necessarily just those dialect URIs you listed that could have fragments. An implementation could allow you to define a dialect of any version. I can make a dialect that extends draft-07 and it would be allowed to have a fragment. So, I can't just have an allow list of dialects that are exceptions. The only way is to identify the dialect and then pointlessly validate using the rules of the version it uses.
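A rough sketch of that ordering problem, in Python. Everything here is an assumption for illustration: the `Dialect` type, the registry, the version labels, and the `example.com` URIs are hypothetical and not taken from any implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dialect:
    uri: str
    base_version: str  # hypothetical labels such as "draft-07" or "v1"


# Hypothetical registry keyed by the exact `$schema` string.
# The second entry is a user-defined dialect extending draft-07, fragment included.
REGISTRY = {
    d.uri: d
    for d in (
        Dialect("http://json-schema.org/draft-07/schema#", "draft-07"),
        Dialect("https://example.com/my-draft-07-extension#", "draft-07"),
        Dialect("https://example.com/my-v1-dialect", "v1"),
    )
}


def resolve_dialect(schema: dict) -> Dialect:
    # Step 1: the lookup has to happen before any version-specific rule is known.
    return REGISTRY[schema["$schema"]]


def schema_value_allowed(dialect: Dialect) -> bool:
    # Step 2: only now can the version's own fragment rule be applied,
    # after `$schema` has already done its job of selecting the dialect.
    if dialect.base_version == "v1":
        return "#" not in dialect.uri
    return True
```

The custom draft-07 extension is accepted even though it appears on no fixed allow list, which is why a list of exceptions in the spec text cannot be exhaustive.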
I think what we can say is that implementations MUST NOT allow users to define a dialect of v1 with a URI that isn't absolute and normalized. But, constraining what value $schema can have isn't practical. The definition of $schema might change from draft to draft, but implementations have to have one version of the keyword because they don't already know which version to use until they've determined the dialect.
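One way to picture that alternative is to move the constraint to the point where a v1 dialect is defined. The following is a sketch only: `register_v1_dialect` is a hypothetical function, and the normalization test is deliberately partial (lowercase scheme and authority), not the full RFC 3987 procedure.

```python
from urllib.parse import urlsplit


def register_v1_dialect(registry: dict, uri: str) -> None:
    """Hypothetical registration hook that rejects unsuitable v1 dialect URIs."""
    parts = urlsplit(uri)
    if not parts.scheme:
        raise ValueError("dialect URI must be absolute")
    if "#" in uri:
        raise ValueError("dialect URI must not contain a fragment")
    # Partial normalization check: scheme and authority must already be lowercase.
    raw_scheme = uri.split(":", 1)[0]
    if raw_scheme != raw_scheme.lower() or parts.netloc != parts.netloc.lower():
        raise ValueError("dialect URI is not normalized")
    registry[uri] = "v1"
```

Under this approach `$schema` itself stays a plain lookup key, and the check runs once when the dialect is defined rather than each time a schema is loaded.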
I'm not sure where that leaves us. If we want to move forward with this, I wouldn't call out the draft dialects as an exception, but rather include a footnote that says that implementations that support draft versions as well as v1 would need to be more flexible in what they accept to allow for URIs that were allowed in the versions they support.
However, I think there's also a "do nothing" option here. In v1, $schema isn't a meta-schema identifier, just a dialect identifier. The ambiguity of the fragment was how the fragment should be interpreted in a meta-schema. That's not a concern because it's not identifying a meta-schema anymore. It seems plenty good enough for it to just be a convention that the URI is absolute and normalized, but it doesn't really matter because it's just an identifier. It doesn't need to be enforced.