Drop the concept of an "initial base URI" from the spec

Open jdesrosiers opened this issue 3 years ago • 8 comments

The spec references RFC-3986 for how to determine the base URI for relative reference resolution. It also has a concept of an "initial Base URI". This concept unnecessarily complicates the model and should be removed. The concepts of retrieval URI and default base URI as defined in RFC-3986 are enough.

The initial base URI can be viewed as essentially the retrieval URI and the default base URI compressed into one step and determined by the user. Presenting these as one step is at best confusing, especially if we want to encourage implementers to provide more robust schema identification features such as those proposed in #1299. For example, it makes sense to allow a user to set a default base URI as a global configuration and a retrieval URI per schema. Some implementations may allow you to set a retrieval URI, but not a default base URI, or vice versa.

Additionally, an "initial base URI" can be ambiguous. If it's a retrieval URI, a user should expect the schema to be referenceable by that URI. If it's a default base URI, the user shouldn't expect it to be referenceable by that URI. If the implementation only takes an "initial base URI", it can't know the difference.
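To make the distinction concrete, here is a minimal sketch of an implementation that takes the two inputs separately. `SchemaRegistry`, `add_schema`, and `is_referenceable` are hypothetical names invented for illustration, not the API of any real JSON Schema library, and the example.com URIs are placeholders.

```python
from urllib.parse import urljoin


class SchemaRegistry:
    """Hypothetical registry; not the API of any real JSON Schema library."""

    def __init__(self, default_base_uri=None):
        # RFC 3986 section 5.1.4: application-wide fallback, configured once.
        self.default_base_uri = default_base_uri
        self._schemas = {}

    def add_schema(self, schema, retrieval_uri=None):
        # RFC 3986 section 5.1.3 (retrieval URI) takes precedence over 5.1.4.
        base = retrieval_uri or self.default_base_uri
        schema_id = schema.get("$id")
        if schema_id is not None:
            # A relative $id is resolved against the base found so far.
            base = urljoin(base, schema_id) if base else schema_id
            self._schemas[base] = schema
        if retrieval_uri is not None:
            # A retrieval URI identifies the schema, so it is registered.
            self._schemas[retrieval_uri] = schema
        # Note: the default base URI alone never registers the schema.
        return base

    def is_referenceable(self, uri):
        return uri in self._schemas


registry = SchemaRegistry(default_base_uri="https://example.com/schemas/")
with_retrieval = registry.add_schema(
    {"type": "string"}, retrieval_uri="https://example.com/schemas/a.json"
)
with_default_only = registry.add_schema({"type": "number"})
```

In this sketch the first schema can be referenced by its retrieval URI, while the second only borrows the default base URI for resolving relative references and cannot be referenced by it. An implementation that accepts a single "initial base URI" has no way to choose between those two behaviors.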

It could be argued that the "initial base URI" concept is still there in these more robust cases; it's just calculated within the implementation rather than supplied as a single input by the user. However, it's unclear in the spec that that's a choice implementers have, let alone something they should do. If we stick to RFC-3986 concepts, it's clearer to implementers what they are allowed to do and what they should be doing in their implementations.

Reference: https://json-schema.org/draft/2020-12/json-schema-core.html#section-9.1.1

jdesrosiers avatar Oct 06 '22 23:10 jdesrosiers

Are you really trying to get rid of a certain amount of redundant text in the spec, or is this really about the word "initial"?

I've stated before that I support getting rid of redundant text.

handrews avatar Oct 06 '22 23:10 handrews

BTW I am also fine with getting rid of the word "initial" (although I remain puzzled by the objection), I just see this as two separate things.

handrews avatar Oct 06 '22 23:10 handrews

> Are you really trying to get rid of a certain amount of redundant text in the spec, or is this really about the word "initial"?

The problem can be solved in two ways. One way is to remove the redundant text and just reference RFC-3986. The other way is to update the language to only use RFC-3986 terminology. I prefer the first solution, but I would accept the second.

The problem isn't the word "initial", it's using the concept of an "initial base URI" rather than using the RFC-3986 concepts. It doesn't matter what word you use.

jdesrosiers avatar Oct 07 '22 16:10 jdesrosiers

> The problem isn't the word "initial", it's using the concept of an "initial base URI" rather than using the RFC-3986 concepts. It doesn't matter what word you use.

I do not see any contradictions of RFC 3986 in our §9.1.1. However, it is mostly redundant. Everything except the first paragraph/sentence can be removed, and replaced with the requirement regarding JSON Schema implementations per issue #1299.

The point of §9.1.1 is to ensure that JSON Schema implementations accommodate RFC 3986 requirements. That's why I filed #1299, because the current section doesn't actually accomplish this in a clear enough way to result in implementations doing the right thing consistently.

handrews avatar Oct 07 '22 16:10 handrews

> Everything except the first paragraph/sentence can be removed, and replaced with the requirement regarding JSON Schema implementations per issue https://github.com/json-schema-org/json-schema-spec/issues/1299.

Agreed, but if a requirement regarding implementations is added, it needs to address retrieval URIs and default base URIs separately, not initial base URIs.

jdesrosiers avatar Oct 07 '22 17:10 jdesrosiers

> Agreed, but if a requirement regarding implementations is added, it needs to address retrieval URIs and default base URIs separately, not initial base URIs.

Why? Why should we re-state RFC 3986? You're also missing the encapsulated entity URI case. But why should JSON Schema implementations handle these separately and re-implement RFC 3986's precedence logic? Something external to the JSON Schema implementation ought to handle that.

Most importantly, there are many use cases for a non-RFC3986-compliant base URI. RFC 3986 tells us how to determine the base URI correctly, but sometimes what is technically incorrect is more appropriate. A set of schemas that can be hosted at different locations (because it is part of a system that is expected to run behind firewalls and therefore not have a globally accessible URL) will have relative $ids. During development and testing, it would make more sense to load them from a filesystem, which means the retrieval URIs would be file:// URIs. But instead we want https:// URIs because that's what will be used in production. So we override RFC 3986 and supply the base URI for the test environment we need.
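As a rough illustration of that test scenario, the sketch below resolves the same relative $id against the filesystem retrieval URI and against a caller-supplied override. The paths, the host name `schemas.example.internal`, and the helper `resolve_id` are all made up for the example.

```python
from urllib.parse import urljoin


def resolve_id(schema, base_uri):
    """Resolve a schema's (possibly relative) $id against the given base URI."""
    return urljoin(base_uri, schema.get("$id", ""))


schema = {"$id": "address.json", "type": "object"}

# What RFC 3986 section 5.1.3 gives us when the file is loaded from disk.
retrieval_uri = "file:///home/dev/project/schemas/address.json"
from_retrieval = resolve_id(schema, retrieval_uri)

# What the test environment actually wants: the production-style base.
override_base = "https://schemas.example.internal/v1/"
from_override = resolve_id(schema, override_base)

# A sibling reference resolves relative to whichever base was chosen.
sibling_ref = urljoin(from_override, "customer.json")
```

With the retrieval URI the schema ends up identified by a file:// URI, while the override yields the https:// identifier that references elsewhere in the schema set would expect.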

Where would that go? It's not from within content (5.1.1). It's not from an encapsulating entity (5.1.2). It's not the retrieval URI (5.1.3). I'd argue it's not an application-specific default URI (5.1.4), but even if you want to treat a test suite as an application to make that work, we'd still be overriding 5.1.3 to use it.
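For reference, the precedence among those four sources can be sketched as below. This is only an approximation under stated assumptions: `establish_base_uri` and its parameter names are hypothetical, and resolving a relative embedded value against the next source in line mirrors how a relative $id is treated rather than quoting RFC 3986 text.

```python
from urllib.parse import urljoin, urlsplit


def establish_base_uri(embedded=None, encapsulating=None,
                       retrieval=None, default=None):
    """Pick a base URI using the RFC 3986 section 5.1 order of precedence."""
    # 5.1.2, then 5.1.3, then 5.1.4: the first one supplied wins.
    outer = encapsulating or retrieval or default
    if embedded is None:
        return outer
    if urlsplit(embedded).scheme:
        # 5.1.1: an absolute embedded base URI needs nothing else.
        return embedded
    if outer is None:
        raise ValueError("relative embedded base URI with nothing to resolve against")
    return urljoin(outer, embedded)


# The production-style URI passed as the 5.1.4 default loses to the
# file:// retrieval URI, so the test scenario is not expressible this way.
strict = establish_base_uri(
    embedded="address.json",
    retrieval="file:///srv/schemas/address.json",
    default="https://example.com/schemas/",
)
```

Under strict precedence the result is still a file:// URI, which is why the caller in that scenario has to replace the retrieval URI rather than add a default.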

That is a critically important use case, and it requires us to be agnostic about where the base URI came from, and assume that the caller either followed RFC 3986 or had a good reason to disregard it. We don't care. That's not our problem. Our problem is enabling the use cases that JSON Schema needs. And that requires allowing overriding RFC 3986 if there is a reason to do so.

handrews avatar Oct 07 '22 17:10 handrews

I'm marking my previous comment as off-topic as the essentials are already covered in #1299, and I need to respect my own partition of this topic that I demanded in that issue. I apologize for re-muddling it.

@jdesrosiers if it sounds good to you, I am content to consider this issue settled with:

  • An agreement to remove everything but the first paragraph/sentence if there needs to be a section about this to address implementation requirements per #1299.
  • Support on my part for removing the entire section if we do not add implementation requirements (as long as something somewhere indicates that relative references in a document root $id, or in a document without an $id in the document root, are resolved according to RFC 3986 §5.1). This is to make clear that there are no other mechanisms in JSON Schema that are involved in such cases. It might already be stated elsewhere, I'm not sure, but if we're down to one sentence and you want to find another home for it than its own section, I'm fine with that.

handrews avatar Oct 08 '22 19:10 handrews

@jdesrosiers Sorry for piling on an old thread but I'm a bit confused. From my reading of this issue and #1299 one of the goals is to encourage library developers to provide a way for applications to supply a base URI that might have been the Retrieval URI. That seems like a different thing than Default Base URI as defined by RFC3986 because the Retrieval URI overrides Default Base URI. It seems to me that RFC3986 does not have any provision for allowing the application context to provide the "Retrieval URI".

Are you suggesting that implementations should provide a way for the application to set both the Default Base URI and the Retrieval URI and then have the implementation do the resolution between those? My understanding of Henry's suggestion was that the implementation could accept just one URI called 'Initial Base URI' and assume that the application did the resolution between Retrieval URI and Default Base URI.

I guess the other alternative is to assume that JSON Schema doesn't really care what the Retrieval URI is and accept the fact that Default Base URI provided by the application context might actually be a Retrieval URI.

darrelmiller avatar Jan 15 '23 17:01 darrelmiller