
prohibit the use of fragments in $id and $schema

Open karenetheridge opened this issue 3 years ago • 24 comments

We permit empty fragments in $id, and say nothing at all about fragments in $schema. IMO we should prohibit all kinds of fragments in both keywords' URIs.

cross-reference: https://github.com/OAI/OpenAPI-Specification/issues/2433#issuecomment-756389903

karenetheridge avatar Jan 07 '21 21:01 karenetheridge

I'd rather go the other direction and make $id and $schema consistent with $ref. The absolute-URI portion determines the document fetched from the schema-store and the fragment portion determines which part of the document to start evaluation when processing the schema.

For $schema, this would mean that all of the following would identify the same dialect, but might resolve to different meta-schemas.

  • https://json-schema.org/draft/future
  • https://json-schema.org/draft/future#
  • https://json-schema.org/draft/future#strict
  • https://json-schema.org/draft/future#uber-strict
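A minimal sketch of that split, assuming Python's standard `urllib.parse.urldefrag`. The draft/future URIs are the hypothetical ones from the list above, and `split_schema_uri` is an illustrative helper, not part of any implementation.

```python
from urllib.parse import urldefrag


def split_schema_uri(uri):
    """Split a URI into (part before '#', fragment)."""
    base, fragment = urldefrag(uri)
    return base, fragment


candidates = [
    "https://json-schema.org/draft/future",
    "https://json-schema.org/draft/future#",
    "https://json-schema.org/draft/future#strict",
    "https://json-schema.org/draft/future#uber-strict",
]

# Under the proposal, the part before "#" would name the dialect ...
dialects = {split_schema_uri(uri)[0] for uri in candidates}
# ... and the fragment would choose where meta-schema evaluation starts.
entry_points = [split_schema_uri(uri)[1] for uri in candidates]
```

All four strings collapse to a single dialect URI, while the fragments stay available as separate entry points.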

For $id, this would mean that embedded schema evaluation would need to take into account the fragment to determine where to begin evaluation.

{
  "type": "object",
  "properties": {
    "foo": {
      "$id": "/schema/bundle#/$defs/foo",
      "$defs": {
        "string": { "type": "string" },
        "foo": { "$ref": "#/$defs/string" }
      }
    }
  }
}

In this example, /properties/foo resolves to { "type": "string" }. It might seem strange when we look at it embedded like this, but this is exactly what is happening when we resolve a $ref. We begin validation at the location identified by the fragment, but the whole document has to be in scope in order to resolve references.
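As a rough sketch of that behaviour on the `$ref` side, the following resolver enters a resource at the location named by a JSON Pointer fragment while keeping the whole resource in scope for nested references. The `store` dictionary, the helper names, and the handling of only object-valued pointers (no arrays, no percent-decoding) are simplifying assumptions; the inner reference is spelled `#/$defs/string` so that it resolves.

```python
from urllib.parse import urldefrag


def resolve_pointer(document, pointer):
    """Walk a JSON Pointer (RFC 6901) through nested dicts."""
    node = document
    if pointer == "":
        return node
    for token in pointer[1:].split("/"):
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[token]
    return node


def resolve(uri, resource, store):
    """Resolve a reference, following chained $refs within the entered resource."""
    base, fragment = urldefrag(uri)
    if base:
        # A non-empty base switches to another resource from the store.
        resource = store[base]
    node = resolve_pointer(resource, fragment)
    if isinstance(node, dict) and "$ref" in node:
        # "#/..." references are resolved against the whole resource,
        # not against the subschema where evaluation started.
        return resolve(node["$ref"], resource, store)
    return node


bundle = {
    "$defs": {
        "string": {"type": "string"},
        "foo": {"$ref": "#/$defs/string"},
    }
}
store = {"/schema/bundle": bundle}
result = resolve("/schema/bundle#/$defs/foo", None, store)
```

Here `result` ends up as the `{ "type": "string" }` schema, even though evaluation began at `/$defs/foo`.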

If we change that example slightly, I think it's a little easier to see how the $id with fragment is the same as $ref with fragment.

{
  "type": "object",
  "properties": {
    "foo": { "$ref": "/schema/bundle#/$defs/foo" }
  },
  "$defs": {
    "": {
      "$id": "/schema/bundle",
      "$defs": {
        "string": { "type": "string" },
        "foo": { "$ref": "#/$defs/string" }
      }
    }
  }
}

It's certainly more intuitive written like this, but I don't think aesthetics are good enough reason to forbid using fragments in $id.

jdesrosiers avatar Jan 11 '21 05:01 jdesrosiers

allowing pointer fragments in $id was an oversight to begin with. disallowing it resolved a confusing/undefined situation where a schema could identify itself as being at one pointer, while actually being at a different pointer in the resource or document.

@jdesrosiers I'm not sure why $id should allow a fragment, even in your proposal - which I understand is not the same as declaring the location of the schema itself as schemas used to sometimes do. what you suggest looks like a confusing alternative to $ref, and I think your example schema has an equivalent using $ref:

{
  "type": "object",
  "properties": {
    "foo": {
      "$id": "/schema/bundle",
      "$ref": "#/$defs/foo",
      "$defs": {
        "string": { "type": "string" },
        "foo": { "$ref": "#/$defs/string" }
      }
    }
  }
}

I see no upside to putting a fragment in $id, only confusion.

whether $id should allow an empty fragment, I think it should be disallowed at some point. it has remained for historical reasons, I think, but every new draft breaks compatibility of more major things than that anyway.

as for a $schema with a fragment, I don't know that it matters. if the $schema URI does not point to a metaschema the implementation is able to use, fragment or no, the implementation will presumably error attempting to resolve that.

it seems faintly plausible to me that a metaschema could be identified by $schema using a fragment (my implementation does support metaschemas which are not at the root of their document) - but it's hard to imagine the value of such a thing.

notEthan avatar Jan 11 '21 13:01 notEthan

it seems faintly plausible to me that a metaschema could be identified by $schema using a fragment (my implementation does support metaschemas which are not at the root of their document) - but it's hard to imagine the value of such a thing.

This was exactly the use-case we were discussing over at OAS. We already have a meta-schema for OAS 3.0 (and 1.2 and 2.0) but it contains embedded definitions of the OAS schemaObject (which in these versions is a superset-subset of proper JSON Schema). It would be useful for users who want to continue to use OAS 3.0-compatible schemaObjects to be able to reference the schemaObject definition within the existing metaschema using $schema or our jsonSchemaDialect property, without the oai/TSC having to extract that schemaObject definition to its own resource and maintain it separately.

MikeRalphson avatar Jan 11 '21 13:01 MikeRalphson

@MikeRalphson that is interesting and worth thinking about in this context, although I think that embedded metaschema will almost certainly have an $id, and so not need to use any fragment to refer to it with $schema. if the document it is embedded in has another id, it might be possible to address that metaschema using that id with a fragment, but it would probably be strongly recommended against (which is why I say above I don't imagine such a construct probably has much value). I'm making a number of suppositions here without knowledge of the openapi work, though.

notEthan avatar Jan 11 '21 14:01 notEthan

@notEthan That's useful - thanks. The use of a separate $id might work for us, though I think some JSON Schema-related tools do still make assumptions that URIs are network-reachable URLs.

MikeRalphson avatar Jan 11 '21 15:01 MikeRalphson

@notEthan

what you suggest looks like a confusing alternative to $ref

In 2019-09, $id was redefined as an embedded reference. They already are alternatives. This proposal doesn't introduce that property.

disallowing [fragments in $ids] resolved a confusing/undefined situation where a schema could identify itself as being at one pointer, while actually being at a different pointer in the resource or document.

I know the situation you're referring to. It is confusing, but it has never been undefined. That has always been an unambiguously invalid use of $id. Before 2019-09, it wasn't well defined what a URI with a fragment meant in an $id. In 2019-09, when $id was redefined to be an embedded reference, fragments were no longer undefined. It should be expected that they work the same as references. But, it was decided to disallow fragments anyway.

I think your example schema has an equivalent using $ref

Yep, that's another equivalent example. Allowing $id to have a fragment doesn't allow you to do anything that you can't already do. I only suggest it in order to simplify the conceptual model.

I see no upside to putting a fragment in $id, only confusion.

It makes for a simpler conceptual model. Fewer concepts with fewer exceptions means a simpler spec and implementations that are easier to write, have better performance, and are less bug prone. That's the main benefit. I agree that that benefit is marginal, but I'd rather not have exceptions to rules without a good reason, and don't see a good reason.

whether $id should allow an empty fragment, I think it should be disallowed at some point. it has remained for historical reasons

The same URI can be represented by multiple strings. It's the normalized URI that's important. For example, https://json-schema.org/draft/2020-12/schema and https://json-schema.org/draft/../draft/2020-12/schema are the same URI. I'll have to check the RFC to be 100% sure, but I'm pretty sure a schema with an empty hash normalizes to the hash being removed. If I'm right about that, I see no good reason to forbid an empty fragment. It doesn't hurt anything and it represents the same URI.

it's hard to imagine the value of such a thing

I agree, but I don't want to do extra work to forbid something that isn't causing any harm just in case someone comes up with a good use for it.

jdesrosiers avatar Jan 11 '21 20:01 jdesrosiers

I want to be clear that I opened this issue to discuss these constructs:

  • { "$id": "https://example.com#" } NOT { "$id": "https://example.com#/foo" } -- the latter is already disallowed
  • { "$schema": "https://example.com#" } or { "$schema": "https://example.com#/foo" } but note that the $schema keyword MUST already have an adjacent $id, so the uri in the $schema keyword is using a non-canonical URI when there is already a fragmentless URI available that would work just as well

$id with a json pointer or plain-name fragment makes no sense because this isn't a canonical URI. it's moving the goalposts to bring those possibilities back into play when there is no reason to allow them.
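To make the cases in this comment easier to tell apart, here is a small sketch that classifies the fragment of a URI string. The function name and the four labels are made up for illustration; the sketch only describes what a value contains and takes no position on which forms should be allowed.

```python
def classify_fragment(uri):
    """Return which kind of fragment, if any, a URI string carries."""
    if "#" not in uri:
        return "none"
    fragment = uri.split("#", 1)[1]
    if fragment == "":
        return "empty"
    if fragment.startswith("/"):
        return "json-pointer"
    return "plain-name"


examples = {
    "https://example.com": classify_fragment("https://example.com"),
    "https://example.com#": classify_fragment("https://example.com#"),
    "https://example.com#/foo": classify_fragment("https://example.com#/foo"),
    "https://example.com#foo": classify_fragment("https://example.com#foo"),
}
```

A linter built on something like this could flag any `$id` whose class is not "none", or only those that are "json-pointer" or "plain-name", depending on how the question is settled.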

karenetheridge avatar Jan 12 '21 20:01 karenetheridge

First of all, sorry for hijacking this issue a bit, but I thought it was necessary to give an explanation for why I'd prefer to go a different direction. I'm happy to drop this if there's not support, but ...

it's moving the goalposts to bring those possibilities back into play when there is no reason to allow them.

this statement makes me think I haven't presented the proposal clearly. I'm not proposing bringing anything back. Every problem we have previously fixed will stay fixed. My proposal introduces something that never existed before. Originally, fragments in $ids were allowed but they had no semantics. Then fragments were forbidden. Now I propose fragments be allowed, but this time with defined semantics and those semantics would align with $ref.

$id with a json pointer or plain-name fragment makes no sense because this isn't a canonical URI.

I'm not following your point here, but URIs with fragments can be canonical.

Example from the spec ...

Consider the following schema document that contains another schema resource embedded within it:

{
  "$id": "https://example.com/foo",
  "items": {
    "$id": "https://example.com/bar",
    "additionalProperties": { }
  }
}

The URI "https://example.com/foo#/items/additionalProperties" points to the schema of the "additionalProperties" keyword in the embedded resource. The canonical URI of that schema, however, is "https://example.com/bar#/additionalProperties".
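A sketch of how those canonical URIs could be computed for the spec example: walk the document, reset the base URI and pointer whenever an `$id` is met, and otherwise extend the pointer. The walker is deliberately naive (it treats every nested object as a subschema and ignores arrays), and `canonical_uris` is a hypothetical helper name.

```python
from urllib.parse import urljoin


def canonical_uris(node, base_uri="", pointer=""):
    """Map canonical URI -> subschema, treating every nested object as a schema."""
    uris = {}
    if not isinstance(node, dict):
        return uris
    if "$id" in node:
        # An embedded resource starts a new base URI and a fresh pointer.
        base_uri = urljoin(base_uri, node["$id"])
        pointer = ""
    uris[base_uri + ("#" + pointer if pointer else "")] = node
    for key, value in node.items():
        token = key.replace("~", "~0").replace("/", "~1")
        uris.update(canonical_uris(value, base_uri, pointer + "/" + token))
    return uris


document = {
    "$id": "https://example.com/foo",
    "items": {
        "$id": "https://example.com/bar",
        "additionalProperties": {},
    },
}
found = canonical_uris(document)
```

The `additionalProperties` subschema is keyed under the `bar` resource with a pointer fragment, never under `https://example.com/foo#/items/additionalProperties`.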

jdesrosiers avatar Jan 14 '21 02:01 jdesrosiers

I think modifications to the $schema and $id keywords would be best discussed in a new issue (and separate issues, as they are two very different things).

  • allowing $schema to handle fragments (either plain-name or json pointer) could possibly have a use (it is similar to $ref in that evaluation first resolves the URI, and then does something with the referenced document); ~~however this is currently prohibited by the spec (which mandates that $schema must appear at a resource root, i.e. an adjacent $id must be present), and I think there were good reasons for making that requirement~~ (edit: I mixed up the rules for where a $schema keyword can be located, and the content of its URI itself)
  • I did not follow your (Jason's) proposal for fragments in $id at all, either in how it would work or the motivation behind the proposed change, sorry

karenetheridge avatar Jan 14 '21 18:01 karenetheridge

I did not follow your (Jason's) proposal for fragments in $id at all

I'll drop it then. I might bring it up again in the future if I come up with a better way to explain it.

allowing $schema to handle fragments [...] is currently prohibited by the spec (which mandates that $schema must appear at a resource root, i.e. an adjacent $id must be present

What I had in mind doesn't alter the resource root, only the evaluation starting point (just like $ref). No matter where evaluation starts, the JSON Schema engine can always find the schema declaration at #/$schema.

{
  "$id": "https://example.com/my-schema",
  "$schema": "https://json-schema.org/draft/future/schema#strict"
}

{
  "$id": "https://json-schema.org/draft/future/schema",
  "$schema": "https://json-schema.org/draft/future/schema",

  ... standard schema ...

  "$defs": {
    "strict": {
      "$anchor": "strict",
      ... strict schema ...
    }
  }
}

Nothing has to change about where $schema is allowed. The fragment just says, start evaluating the schema at #strict rather than at #. It still finds the $id and $schema in the same place it always has.
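A sketch of what that lookup might look like, assuming an in-memory `store` keyed by meta-schema URI. The helper names are hypothetical, the `description` value stands in for the elided strict schema, and the anchor search does not stop at embedded `$id` boundaries as a real implementation would need to.

```python
from urllib.parse import urldefrag


def find_anchor(node, name):
    """Depth-first search for a subschema whose "$anchor" equals name."""
    if isinstance(node, dict):
        if node.get("$anchor") == name:
            return node
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_anchor(child, name)
        if found is not None:
            return found
    return None


def meta_schema_entry_point(schema, store):
    """Pick where meta-schema evaluation would start under the proposal."""
    base, fragment = urldefrag(schema["$schema"])
    resource = store[base]  # the resource root is always the fetched document
    if fragment == "":
        return resource
    return find_anchor(resource, fragment)


meta = {
    "$id": "https://json-schema.org/draft/future/schema",
    "$schema": "https://json-schema.org/draft/future/schema",
    "$defs": {
        "strict": {"$anchor": "strict", "description": "stand-in for the strict schema"}
    },
}
store = {meta["$id"]: meta}
schema = {
    "$id": "https://example.com/my-schema",
    "$schema": "https://json-schema.org/draft/future/schema#strict",
}
entry = meta_schema_entry_point(schema, store)
```

The meta-schema's own `$id` and `$schema` are still read from the root of the fetched resource; only the starting subschema changes.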

jdesrosiers avatar Jan 15 '21 19:01 jdesrosiers

I'll have to check the RFC to be 100% sure, but I'm pretty sure a schema with an empty hash normalizes to the hash being removed. If I'm right about that, I see no good reason to forbid an empty fragment. It doesn't hurt anything and it represents the same URI.

I looked this up and I was wrong. Normalization does not remove an empty fragment.

The fragment component is not subject to any scheme-based normalization; thus, two URIs that differ only by the suffix "#" are considered different regardless of the scheme.

  • https://tools.ietf.org/html/rfc3986#section-6.2.3
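The difference between the two cases can be sketched as follows. `remove_dot_segments` is a simplified take on RFC 3986 section 5.2.4 that ignores trailing-slash edge cases, and `normalize` is an illustrative helper that deliberately leaves the fragment delimiter alone.

```python
from urllib.parse import urlsplit, urlunsplit


def remove_dot_segments(path):
    """Simplified dot-segment removal for absolute paths."""
    output = []
    for segment in path.split("/"):
        if segment == "..":
            if len(output) > 1:
                output.pop()
        elif segment != ".":
            output.append(segment)
    return "/".join(output)


def normalize(uri):
    """Normalize the path but keep the '#' delimiter and fragment untouched."""
    head, sep, fragment = uri.partition("#")
    parts = urlsplit(head)
    path = remove_dot_segments(parts.path)
    normalized = urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))
    return normalized + sep + fragment


same = normalize("https://json-schema.org/draft/../draft/2020-12/schema")
with_hash = normalize("https://example.com#")
without_hash = normalize("https://example.com")
```

The dot-segment variant normalizes to the plain 2020-12 URI, while the two `example.com` strings remain different after normalization.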

jdesrosiers avatar Jan 15 '21 19:01 jdesrosiers

I revised my earlier comment - I mixed up the rules for where a $schema keyword can be, and the content of its URI itself, sorry! I think the spec is silent at the moment about whether a $schema URI can have a fragment, so perhaps it is already possible today to have a schema containing e.g. "$schema": "https://example.com/metaschema.json#/properties/$defs/my_schema"?

karenetheridge avatar Jan 15 '21 22:01 karenetheridge

I would be thrilled to see the empty fragment banned from both $id and $schema. Both should be non-fragment URIs. The empty fragment in $id was for historical compatibility, and in $schema to be able to match $ids again for historical compatibility.

@jdesrosiers the reason fragments do not have scheme-based normalization is that the fragment syntax and semantics are entirely defined by the media type. The URI scheme has nothing to do with fragments, nor does the RFC 3986 normalization process. For JSON Schema, an empty fragment and no fragment are the same, because we defined them to be that way. This is why the spec for $id says and MUST resolve to an absolute-URI because at some point we had a discussion about this and "resolve to" was the language people were most comfortable with. It is specifically separate from normalization which is in the previous sentence.

~~Re: non-empty fragments in $schema – people get confused enough with meta-schemas as it is. I really thought we had language for $schema to be an absolute URI (with scheme, without fragment) or equivalent to one (empty fragment, because of our media-type-specific fragment semantics). Please don't encourage people to do this!~~ EDIT: I still really don't like fragments in $schema but I see OAS has a use for it. idk why people don't like having multiple files as I split things up every chance I get, but there's always this lumper vs splitter thing that shows up in many contexts so 🤷

I would also really hate to see fragments returned to $id. I don't plan to do more here than drop in occasionally, so I know my comments don't carry much weight anymore and that's fine; I am thrilled to see work continue and don't want to disrupt that! But removing fragments from $id and making it sensible is something I consider one of my more significant achievements. EDIT: In terms of impact, I mean; y'all don't need to keep features solely because of my pride in them! It took years of incremental changes to get to the point where we could split off fragments into $anchor and people were comfortable with that. $id was one of the reasons that OpenAPI would not support full compatibility (I don't remember if it was specifically the fragment part, but there were a lot of things wrong with what was then id when OAS 3.0 went out). The changes we made were necessary to get JSON Schema out in the world more broadly.

handrews avatar Apr 28 '21 04:04 handrews

disallowing [fragments in $ids] resolved a confusing/undefined situation where a schema could identify itself as being at one pointer, while actually being at a different pointer in the resource or document.

I know the situation you're referring to. It is confusing, but it has never been undefined. That has always been an unambiguously invalid use of $id.

It was definitely possible to do, and it was not clearly defined. I know because I did a very, very detailed implementation of id (pre-$!) and $ref as defined in draft-05 (which does not exist! 😉 ) and could not figure out how to handle several cases including that one. Should you allow both? What happens if someone does a same-document reference under an id with a JSON Pointer fragment where the same-document reference is also a JSON Pointer fragment, but one that would be impossible to co-exist with the one in the id? But one that, if resolved against the id base URI, does point to a valid schema?

Once you started dealing with not only the behavior of the fragment in id but the base URI behavior and how that interacts with fragments in $ref it got really, really messy. It's the combination that really exposed the deeper problems.

handrews avatar Apr 28 '21 05:04 handrews

disallowing [fragments in $ids] resolved a confusing/undefined situation where a schema could identify itself as being at one pointer, while actually being at a different pointer in the resource or document.

I know the situation you're referring to. It is confusing, but it has never been undefined. That has always been an unambiguously invalid use of $id.

It was definitely possible to do, and it was not clearly defined.

Ever since draft-05, it's been defined what characters are allowed in id fragments and those characters do not include /. Therefore, JSON Pointers are not allowed in ids. That's why I say using JSON Pointers in $id fragments is an unambiguously invalid use of $id. I thought that was part of draft-04 as well, but I checked and I was wrong. It was added in draft-05.

jdesrosiers avatar Apr 29 '21 18:04 jdesrosiers

@jdesrosiers if you mean this section of draft-05, specifically:

To name subschemas in a JSON Schema document, subschemas can use "id" to give themselves a document-local identifier. This form of "id" keyword MUST begin with

the key phrases are To name subschemas and This form of "id". That paragraph does not limit the fragment syntax in general, it just specifies how to define a plain name fragment. It's somewhat ambiguous, but it's more clear in intervening drafts.

Regardless, if you want to argue for reverting the $anchor change, that should be its own issue, which I will contest vigorously. We just fixed that, and it has been well-received among substantial groups of JSON Schema users, most notably OpenAPI.

handrews avatar May 05 '21 04:05 handrews

@handrews You're right. "id": "#/definitions/foo" has been unambiguously invalid since draft-05 and ambiguously invalid forever (not specified as invalid, but clearly not intended use). "id": "/schema/foo#/definitions/foo" is more ambiguous. I've always interpreted "id": "/schema/foo#bar" as equivalent to "id": "/schema/foo", "id": "#bar". In that case, the fragment would be subject to the document-local identifier rules. However, I forgot for a minute that that wasn't an official interpretation and just me filling in a gap in the spec. Thank you for the correction, although I'll point out that it doesn't change anything about my proposal. The proposal was regarding resolving that "confusing/undefined" situation in a different way.

if you want to argue for reverting the $anchor change

I can't imagine where you got the idea that I want to revert the $anchor change. While I wasn't the first to propose it, I was instrumental in introducing $anchor in the first place. I have no desire to revert that change.

(Let's not drag this out. I have no desire to argue for this any time soon.)

jdesrosiers avatar May 05 '21 17:05 jdesrosiers

@jdesrosiers To me, the point of $anchor was taking fragments out of $id entirely, and preserving only the valid use case, so putting fragments back in $id seems like reverting that. Are you considering it to be different because you are talking only about JSON Pointer fragments? Meaning that you would exclude plain name fragments from $id in order to keep $anchor?

I agree on not dragging this out, but perhaps if we just clarify one small point at a time that will feel like progress rather than a battle.

handrews avatar May 05 '21 19:05 handrews

@MikeRalphson continuing to think on this, my intuition is that it is somewhat important for a meta-schema to be a schema resource, and not just a schema object. But I'm not 100% and before diving into that topic, I wanted to check: you can solve this right now by just putting a $id in your meta-schema, even if it is nested in a larger document. Is there any reason why you can't just do that?

handrews avatar May 10 '21 16:05 handrews

you can solve this right now by just putting a $id in your meta-schema, even if it is nested in a larger document. Is there any reason why you can't just do that?

@handrews there's no reason it's impossible for us to do that, the only reason not to prefer that approach is that it means one more date to remember to update within the larger document every time we rev the meta-schema...

MikeRalphson avatar May 10 '21 17:05 MikeRalphson

@MikeRalphson hmm... changing this to avoid a fairly straightforward (if annoying) mechanical change for a relatively rare occurrence feels a little meh, although I might eventually be convinced.

I did remember why I think $schema should be a full resource, and there may be a path forward here. We currently restrict $vocabulary to the schema resource's root schema object. Which would mean that you couldn't use $vocabulary in a sub-resource meta-schema. Which would be a substantial problem.

Discussions in #1098 might let us relax that, although I'm not sure. $vocabulary as an annotation means it wouldn't be a problem to have in a non-root schema object that is applied to the root of a schema as a meta-schema. But if you had different values of $vocabulary getting applied to different parts of a schema, that would be a problem, and the simplest way to prevent that is to restrict $vocabulary (like $schema) to the root object.

This is a good example of how apparently-superficial changes actually have a lot of knock-on effects deeper in the system.

handrews avatar May 10 '21 18:05 handrews

I was about to file a new $id fragment-related issue and saw this, so I'll add it here for now, although perhaps it would be a good idea to file new separate issues for $id and $schema? The OpenAPI case means that there are some different considerations for $schema, although I am still in favor of it being an embedded resource rather than being fragment-addressable. (@MikeRalphson if you changed your schema URIs to end with something like /schema/2022-02-27/oas instead of /schema/2022-02-27 then you could use "$id": "meta" in your embedded meta-schema to produce /schema/2022-02-27/meta and avoid duplicating the date).
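The relative `$id` suggestion can be checked with standard reference resolution, sketched here with `urllib.parse.urljoin`. The `https://example.com` host is a placeholder; only the path shapes come from the comment.

```python
from urllib.parse import urljoin

# Hypothetical host; only the path shapes come from the comment above.
with_suffix = "https://example.com/schema/2022-02-27/oas"
without_suffix = "https://example.com/schema/2022-02-27"

# A relative "$id" replaces the last path segment of its base URI.
embedded_with_suffix = urljoin(with_suffix, "meta")
embedded_without_suffix = urljoin(without_suffix, "meta")
```

With the `/oas` suffix the date segment survives and the embedded meta-schema lands at `/schema/2022-02-27/meta`. Without it, the date itself is the last segment and gets replaced, giving `/schema/meta`.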

The new $id bug is that the spec wording technically allows "$id": "#" even in 2019-09 and 2020-12. For some reason I seem to have thought this was OK, as I advocated for adding it to the test suite? I have no idea what I was thinking, because if you do "$id": "#" in an embedded resource then your embedded resource has the same URI as the context resource, which is bad, because now that URI is ambiguous. If you do it in the document root, it's a no-op (maybe that's what I was trying to test for? idk), which is what happens if you don't have an $id so there's no need for it.

If we drop fragments in $id altogether then this is solved, but if we keep them, it's a problem.
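A small sketch of the ambiguity, assuming a hypothetical `register` helper and a made-up context URI. It resolves the embedded `"$id": "#"` against the context resource's URI and then tries to record both resources in the same registry.

```python
from urllib.parse import urldefrag, urljoin


def register(registry, uri, schema):
    """Record a schema resource under its URI, refusing ambiguous reuse."""
    if uri in registry and registry[uri] is not schema:
        raise ValueError(f"ambiguous resource URI: {uri}")
    registry[uri] = schema


embedded = {"$id": "#", "type": "string"}
context = {"$id": "https://example.com/schema/outer", "$defs": {"inner": embedded}}

context_uri = context["$id"]
# Resolve the embedded "$id" against the context URI, then drop the
# (empty) fragment to get the URI the embedded resource would claim.
embedded_uri = urldefrag(urljoin(context_uri, embedded["$id"])).url

registry = {}
register(registry, context_uri, context)
try:
    register(registry, embedded_uri, embedded)
    collision = False
except ValueError:
    collision = True
```

Both resources claim the same URI, so the second registration is rejected in this sketch. An implementation that silently overwrote the entry would make the URI point at whichever resource happened to be registered last.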

@jdesrosiers I just want to acknowledge that I misunderstood your proposal when I commented on it before and thought you were undoing the $anchor change. I have a different argument against it now 🙃 but I won't bring it up unless you re-propose the idea.

handrews avatar Aug 12 '22 04:08 handrews

I commented previously mostly on @jdesrosiers' alternative but I will comment here in favor of disallowing empty fragments in $id, for the same reasons @handrews states. (I did say above it should be disallowed at some point - now seems good.)

I am opposed to disallowing at least an empty fragment from $schema (it probably does warrant its own issue), since that is used for selection across specification versions, including specs whose metaschemas do have an empty fragment in their id. Using a non-empty fragment in $schema seems mostly like a poorer idea than just giving a metaschema an $id, worth strongly recommending against, but I don't know about forbidding that either. Changing the rules of $schema in any one specification is a bit fraught (as other currently-active discussions illustrate).

notEthan avatar Aug 13 '22 21:08 notEthan

I've filed issue #1292 for the "$schema" part, so I think if PR #1291 is accepted we can close this.

handrews avatar Sep 18 '22 22:09 handrews