json-schema-spec
json-schema-spec copied to clipboard
contentSchema has implementation-defined referencing behavior when contentMediaType is not present
The text of 2020-12's contentSchema says:
The value of this property MUST be a valid JSON schema. It SHOULD be ignored if "contentMediaType" is not present.
In #1288 which I've found, there was some back and forth about what the first sentence there means (which I agree with -- i.e. that the current spec seems to treat this schema as any other subschema, by virtue of omission). But the second sentence still has undefined behavior, because "SHOULD" there isn't clear.
Specifically, this schema (or others like it) when validating the instance 12:
{
"$ref": "#foo",
"contentSchema": {
"$anchor": "foo",
"type": "string"
}
}
either blows up if one follows the SHOULD or else returns invalid if one ignores the SHOULD about contentMediaType being missing. (FWIW of the implementations Bowtie supports, there appear to be at least 1 implementation with both behaviors, though unclear whether they explicitly intend to do so).
It seems unnecessary to have undefined behavior in this case (one reminiscent of old $ref), so my personal opinion would be to remove that second sentence (and not intimate that it's ok to ignore it), or to just make it invalid to have contentSchema present without contentMediaType (and add a dependentRequired to the metaschema).
(n.b. #1352 already tweaked the language here coming out of #1288, but the second sentence is still present.)
blows up if one follows the SHOULD
I don't follow your logic here. Following the SHOULD means that contentSchema is ignored in your example. I see nothing wrong with that.
It doesn't mean that identifiers aren't still registered, just that the keyword isn't evaluated.
Here's the language from draft 7's $ref definition:
All other properties in a "$ref" object MUST be ignored.
where it was essentially universally understood that this means identifiers are not registered. As far as a quick skim, no other language is present there, that's how we chose to communicate that behavior.
So I think you're asserting something (that there's only one interpretation for the current language) which isn't consistent with historical wording.
But at worst even if you're right there I'd still think it's unclear rather than really undefined.
I think the intention was that that quote is in the context of using the contentSchema keyword as an assertion (which is optional, off-by-default behavior). It doesn't mean that the contentSchema sub-schema should be ignored, it means that it shouldn't apply contentSchema as an assertion on the parsed string data. I think it's a "SHOULD" in order to allow for but discourage heuristic-based parsing of the string value.
So, I think it would be wrong for your example to produce an error. The text is certainly ambiguous and should be clarified in the next release, but I weight intention pretty high when there is ambiguity in the text and I'm pretty confident that was the intention.
Fair enough maybe it's not needed for us to argue or agree on how to interpret the current draft (I suspect we probably all agree it's quite unlikely anyone has implemented anything related to this at all, let alone anything depending on the small nonsense point I'm raising) so perhaps consider this just a "it seems like this line needs tweaking for the next version" ticket.