
Backward compatibility

Open awwright opened this issue 2 years ago • 13 comments

At https://github.com/ietf-wg-httpapi/mediatypes/pull/43, the JSON Schema section currently specifies that the specification document may change based on certain factors. This is done for backward compatibility. However, it contrasts with the behavior of JSON Schema implementations (where old drafts have been obsoleted) and of Internet media types in general (where a single specification must specify all of the essential behavior).

Historically, backwards compatibility has been the responsibility of each implementation. (Since a schema often is only processed by a single validator, that validator has the best idea on how to do this.) Newer publications replace older ones in their entirety; and deprecated behavior was simply removed, leaving it up to individual implementations to choose what would work best, since the behavior is now undefined.

This is not workable when backwards-compatibility is expected cross-platform, as in an Internet media type. And implementing every publication of JSON Schema is not sustainable, and was never intentional (a meta-schema is not a strong version identifier, and we offer no guidance as to which versions you would have to implement, or how to handle unknown future versions).

We can re-define some of the older behavior that has been removed in a new "deprecated functionality" section. Behavior has been removed for a variety of reasons; sometimes implementations disagreed on what the behavior would be and removing it was easiest, so this section cannot guarantee compatibility with all older schemas. However, text-compatibility with older publications should be possible.

A list of things to support

Let's make a list of cases where standardization of some behavior was dropped ("deprecated" or "un-defined"), and where it was changed ("broken"). Let's make sure this behavior is written into the spec, with the goal of eliminating any need to refer to older versions of JSON Schema Core.

Deprecated core behavior

Behavior that has no reason to change based on $schema:

  • $id: "#foo" as an alias for $anchor: "foo"
  • "id" as an alias for "$id"
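
To make the two aliases above concrete, here is a minimal sketch of how an implementation might rewrite them on a single schema object. The function name `normalize_identifiers` is hypothetical, the rewrite does not recurse into subschemas, and it only handles a fragment-only `$id` (not an absolute URI that carries a fragment); all of that is an assumption for illustration, not proposed spec behavior.

```python
def normalize_identifiers(schema: dict) -> dict:
    """Return a shallow copy of one schema object with legacy identifiers rewritten.

    Hypothetical helper: only the top-level object is inspected.
    """
    out = dict(schema)

    # Legacy "id" becomes "$id", unless "$id" is already present.
    if isinstance(out.get("id"), str) and "$id" not in out:
        out["$id"] = out.pop("id")

    # A plain-name fragment such as "#foo" becomes "$anchor": "foo".
    # JSON Pointer fragments ("#/definitions/x") are left alone.
    ident = out.get("$id")
    if (
        isinstance(ident, str)
        and ident.startswith("#")
        and len(ident) > 1
        and not ident.startswith("#/")
        and "$anchor" not in out
    ):
        out["$anchor"] = ident[1:]
        del out["$id"]

    return out
```

For example, `normalize_identifiers({"id": "#foo", "type": "string"})` yields a schema with `"$anchor": "foo"` and no `id` or `$id`.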

Deprecated validation keyword behavior

Behavior that could hypothetically change based on the value of $schema:

  • boolean forms of exclusiveMinimum, exclusiveMaximum
  • schema values in "type"
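
As an illustration of the first item only, the boolean form can be mechanically rewritten into the numeric form. This is a rough sketch: the name `upgrade_exclusive_bounds` is hypothetical, it works on one schema object without recursing, and it assumes a boolean flag with no matching `minimum`/`maximum` has no effect and can be dropped.

```python
def upgrade_exclusive_bounds(schema: dict) -> dict:
    """Rewrite boolean exclusiveMinimum/exclusiveMaximum into the numeric form.

    Hypothetical helper operating on a single schema object.
    """
    out = dict(schema)
    pairs = (("minimum", "exclusiveMinimum"), ("maximum", "exclusiveMaximum"))
    for limit, exclusive in pairs:
        flag = out.get(exclusive)
        if isinstance(flag, bool):
            # The boolean spelling never survives the rewrite.
            del out[exclusive]
            if flag and limit in out:
                # {"minimum": 5, "exclusiveMinimum": true} -> {"exclusiveMinimum": 5}
                out[exclusive] = out.pop(limit)
    return out
```

A numeric `exclusiveMinimum` is left untouched, so running the rewrite on an already-upgraded schema is a no-op.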

Breakage:

Behavior where the same schema would be required to produce a different result depending on the release of JSON Schema being read:

  • Keywords next to $ref changing from ignored to processed (this was deliberate and does not need to be supported: nobody was using validation keywords next to $ref expecting them to be ignored)
  • 1.0 being an integer: type: "integer" would previously validate false, now true (I'm unaware of any problems caused by this change in the wild: only a minority of implementations ever followed the older behavior, and of those, they rarely encoded 1.0 where the datatype wasn't specifically known to be a floating point)
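
The second item is easy to see in a language that distinguishes integer and floating-point values after parsing. The sketch below contrasts the two interpretations; the function names are hypothetical, and the "strict" variant is only an approximation of what the minority of older implementations did.

```python
import json


def is_integer_strict(value) -> bool:
    """Older interpretation: only values parsed as integers count."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def is_integer_by_value(value) -> bool:
    """Newer interpretation: any number with a zero fractional part counts."""
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    return isinstance(value, float) and value.is_integer()


instance = json.loads("1.0")  # Python's json module parses this as a float
print(is_integer_strict(instance), is_integer_by_value(instance))
```

The same instance document therefore flips from invalid to valid against `type: "integer"` depending on which rule the validator applies.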

awwright avatar Jun 08 '22 01:06 awwright

I think we should get consensus on taking this approach before putting too much effort into this.

jdesrosiers avatar Jun 09 '22 16:06 jdesrosiers

The first step of this issue is gathering feedback, and to do it away from https://github.com/ietf-wg-httpapi/mediatypes/pull/43 because, as you noted, it was getting a bit off topic.

Specifically, the first thing is to list backwards-compatibility-related issues that impact implementations. Once we're satisfied we have a comprehensive list, then we can examine how to solve them. I don't mean to imply a specific solution yet. I know you're busy, we can take our time on this.

awwright avatar Jun 09 '22 23:06 awwright

I object to the premise of requiring implementations to simultaneously support the behavior of multiple existing drafts, even where such a thing is possible (e.g. handling "$id": "#foo" as an alias for "$anchor": "foo", which defeats the entire purpose of that change, which was to simplify and clarify things).

We have been publishing drafts. I know some folks like to say that we aren't really publishing drafts because JSON Schema has such broad production use, but publishing IETF I-Ds means something, and we have chosen to continue doing so even as recently as a couple of months ago.

I don't know about the rest of you, but I have been writing specification text and considering changes with this meaning in mind.

Once we decide to publish something that is not officially designated as a draft, then I am in favor of determining a compatibility strategy from that point forward.

Implementations now support multiple drafts, and switch among them based on internal ($schema) or external configuration. I do not recall a single request during all the years I've been involved here for implementations to support some sort of conflation across all possible interpretations of all drafts. This is not a problem that needs to be solved.
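
For readers unfamiliar with that switching, it looks roughly like the sketch below. The function name `select_dialect` and the short dialect labels are hypothetical, and letting `$schema` take precedence over external configuration is an assumption; the meta-schema URIs are the published ones.

```python
KNOWN_DIALECTS = {
    "http://json-schema.org/draft-04/schema#": "draft-04",
    "http://json-schema.org/draft-06/schema#": "draft-06",
    "http://json-schema.org/draft-07/schema#": "draft-07",
    "https://json-schema.org/draft/2019-09/schema": "2019-09",
    "https://json-schema.org/draft/2020-12/schema": "2020-12",
}


def select_dialect(schema, default=None) -> str:
    """Pick exactly one dialect for a schema; never blend several."""
    declared = schema.get("$schema") if isinstance(schema, dict) else None
    if declared is None:
        # No internal declaration: fall back to external configuration.
        if default is None:
            raise ValueError("no $schema and no configured default dialect")
        return default
    if not isinstance(declared, str):
        raise ValueError("$schema must be a string")
    if declared in KNOWN_DIALECTS:
        return KNOWN_DIALECTS[declared]
    # Tolerate a missing or extra empty fragment ("#").
    alternate = declared[:-1] if declared.endswith("#") else declared + "#"
    if alternate in KNOWN_DIALECTS:
        return KNOWN_DIALECTS[alternate]
    raise ValueError(f"unsupported dialect: {declared}")
```

The point of the sketch is that the result is always a single dialect: an unknown `$schema` is an error rather than a cue to guess across drafts.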

Recommending such a thing would be a hugely burdensome requirement for implementations, and a source of massive confusion for schema authors, encouraging chimeric schemas that would be difficult to read.

Preserving the behavior of even all of 2020-12 would require preserving some deeply problematic behaviors (minContains un-failing contains comes to mind) that the JSON Schema ecosystem would benefit from emphatically dropping and outright forbidding.

We should not do this. We need to investigate the behaviors that we have, remove the problematic ones in our next publication, and establish a new compatibility strategy from that point. For what we have today, the JSON Schema community has settled on an approach which has worked just fine for many years.

@Relequestual I would like to see an actual decision on this documented in an ADR as soon as possible, because this idea has complicated multiple related discussions and is making it hard to move forward.

Again, I do not object to talking about this sort of approach for the future, but I strongly object to it for the present and the past.

handrews avatar Aug 14 '22 00:08 handrews

we offer no guidance as to which versions you would have to implement

This is being addressed in discussions 192 and 209.

handrews avatar Aug 14 '22 00:08 handrews

Agreed entirely, not a problem that should be addressed for current or past drafts, and not in the future until the spec has found solid stability. Behavior that has become undefined should stay undefined - $id: "#foo" is a MUST NOT and shouldn't be anything else. No optional deprecated implementation blurring the lines between drafts which are not, and were never intended to be, backwards compatible.

notEthan avatar Aug 14 '22 04:08 notEthan

No optional deprecated implementation blurring the lines between drafts

There is some stuff where we tried to discourage people from re-defining keywords that we removed or renamed entirely, as is being discussed in #1265. But we did not make a vocabulary of those keywords so we were not trying to get people to keep supporting them, just trying to make sure people didn't immediately produce contradictory definitions. Which is very different from supporting old behaviors of keywords that are still present with revised or reduced functionality, as is being proposed here.

handrews avatar Aug 14 '22 04:08 handrews

I would like to see an actual decision on this documented in an ADR as soon as possible

I agree and I propose we make that decision at the next OCWM. I think there are actually three issues so I want to be clear what we are deciding on.

  1. Implementations can support features from a release (past or future) other than the one indicated by $schema or by another dialect declaration mechanism supported by the implementation.
  2. We should re-add old features (as deprecated) to a future release to make it as compatible as possible with past releases.
  3. In some future release we should commit to no backward incompatible changes.

I think (1) is what the ADR should cover. Although there is one dissenting opinion, there's pretty strong agreement that the answer to this should be "no". I think we're pretty close to rejecting (2) as well, but I think that can be a separate ADR when we get there. I think there's pretty strong support for (3), but we can address that when we have a clear plan on how to make that happen.

I'd be willing to write the ADR.

jdesrosiers avatar Aug 17 '22 01:08 jdesrosiers

@jdesrosiers Sounds good to me. I'd offer one bit of nuance on the "no" on (1): Implementations are free to implement compatibility vocabularies, which can be enabled with $vocabulary. So you could have a 2020-12 core vocabulary (which is the one that matters for what release we're talking about), but if you really want to you could replace the 2020-12 applicator vocabulary with a draft-07 one. Or one that does some weird quantum superposition magic multi-draft compatible thing.

But you can't have mixed support on by default, or on when it is not specifically requested via the $vocabulary system, and you can't interpret a lack of $schema as a mixture of all past possible keywords.
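
A rough sketch of that opt-in rule follows. The compatibility vocabulary URI is hypothetical (no such vocabulary is published), the default set is trimmed to two of the 2020-12 vocabularies for brevity, and treating every listed vocabulary as requested regardless of its boolean value is a simplifying assumption.

```python
CORE_2020_12 = "https://json-schema.org/draft/2020-12/vocab/core"
APPLICATOR_2020_12 = "https://json-schema.org/draft/2020-12/vocab/applicator"
# Hypothetical URI for a draft-07-style applicator compatibility vocabulary.
COMPAT_APPLICATOR_DRAFT_07 = "https://example.com/vocab/draft-07-applicator"

# Trimmed default; a real 2020-12 dialect lists more vocabularies.
DEFAULT_VOCABULARIES = frozenset({CORE_2020_12, APPLICATOR_2020_12})


def active_vocabularies(meta_schema=None) -> frozenset:
    """Return the vocabularies to process, based only on explicit requests."""
    if not isinstance(meta_schema, dict) or "$vocabulary" not in meta_schema:
        # Nothing requested: use the plain default, never a mixture of drafts.
        return DEFAULT_VOCABULARIES
    return frozenset(meta_schema["$vocabulary"])


# A meta-schema that swaps the applicator vocabulary for the compatibility one.
compat_meta_schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$vocabulary": {CORE_2020_12: True, COMPAT_APPLICATOR_DRAFT_07: True},
}
```

With this shape, compatibility behavior is only ever on when a meta-schema names it, and a schema with no `$schema` gets the default set.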

(hopefully someone can make that all more concise)

There's probably language needed to avoid problems with non-vocabulary extension keywords so that we don't have a repeat of json-schema-org/JSON-Schema-Test-Suite#574, although we should do that more generally so I'll file a separate issue or discussion on it soon.

handrews avatar Aug 17 '22 05:08 handrews

@handrews I agree with all of that. I was trying to keep the description to a bullet point so not all of the nuance is there.

jdesrosiers avatar Aug 17 '22 18:08 jdesrosiers

Everyone, I'm just looking for a list of backwards-compatibility-related issues.

I understand there are many ways we could accommodate backwards compatibility, or bad ways, or reasons not to at all. But to even begin talking about that, we have to make a list of things so we can have some sort of rubric to judge proposals by.

awwright avatar Aug 19 '22 22:08 awwright

@awwright it's hard to contemplate proposals when we don't agree on what the problem is, or even if there is a problem. To me, there is no problem with any lack of compatibility between existing drafts, or how most implementations handle it (some combination of needing to be told the draft and/or reading $schema/$vocabulary).

handrews avatar Aug 20 '22 00:08 handrews

Since my repeated requests for concrete, real-world concerns around this topic have not yielded any such examples, I went out and found one.

Webpack uses a schema that does not list $schema and (as best I can tell) probably works under draft-04+, although for a few implementations that are strict about $defs vs definitions, it would need to run those in compatibility mode.

Notably, $defs/definitions is the one place where we at least attempted to address compatibility and migration, which has resulted in some implementations offering compatibility for those.

So, can we think about what Webpack needs here, and how we would want that to work? Some useful questions might be:

  • Is Webpack attempting to avoid specifying a draft in order to maximize compatibility? (I have not looked into this at all)
  • How has the $defs/definitions change affected them, if at all?
  • What impact would there be if another keyword that they rely on had a breaking change?
  • How would they prefer to see such an impact mitigated?
  • What options might exist to support mitigation?

Note that there are options beyond "always use $schema and follow it strictly" (which would require multiple schemas to support multiple drafts) or "avoid $schema to allow broader compatibility". Specifically, we could introduce something that indicated a draft/processing rules range with which a schema is compliant. That's not a thought-out proposal, btw. I'm just trying to connect this conversation to something real, and consider that what actual JSON Schema users need may not be any of the things that we have discussed so far.

handrews avatar Aug 21 '22 18:08 handrews

I looked into this a bit. These schemas (even third-party plugins) are only used internally and validated using a single known implementation. That implementation uses AJV, which defaults to draft-07. They haven't configured anything otherwise, so they are getting draft-07 validation.

Because the validator implementation is constant and known, there is no JSON Schema version ambiguity. In this set of circumstances, they can leave out the $schema declaration without any ambiguity. It would be nice if they documented that it's draft-07, but I couldn't find anywhere that they do. What they do do is provide a package called schema-utils and tell plugins to test their schemas against that library because that's what webpack will use to evaluate those schemas.

Therefore, this isn't a case of someone leaving out $schema and expecting cross-draft support of some kind.

jdesrosiers avatar Aug 22 '22 18:08 jdesrosiers