json-schema-spec
Proposed Addition to JSON Schema: "Else-If"
When defining mutually exclusive if-statements (if, else-if, else-if, etc.), JSON Schema currently requires nesting with the else statements. For shallow cases, this works fairly well. For example:
{
  "if": { ... },
  "then": { ... },
  "else": {
    "if": { ... },
    "then": { ... },
    "else": {
      "if": { ... },
      "then": { ... },
      "else": { ... }
    }
  }
}
However, for highly complex use cases with hundreds of mutually exclusive conditions, this results in if-then-else statements nested more than 100 levels deep. This complicates cases where clients use JSON Schema for purposes other than validation (e.g. mapping rules from our JSON Schemas to their own data models), and it causes issues with commonly used open-source validator implementations (~100 is a commonly used "safety" limit in some open-source libraries to prevent infinite recursion).
Since conditional logic (if-then-else) is defined by the standard JSON Schema vocabularies, it would be ideal to have a way to flatten else-if behavior within those vocabularies (rather than using custom vocabularies that most open-source implementations would not handle by default). Something like this:
{
  "if": { ... },
  "then": { ... },
  "elseIf": [
    {
      "if": { ... },
      "then": { ... }
    },
    {
      "if": { ... },
      "then": { ... }
    }
  ],
  "else": { ... }
}
Thanks for your consideration.
Thanks for the suggestion. There's definitely a gap here, but I think we'd probably need a different solution than the one suggested.
In case it isn't known, the workaround you can use today to flatten your conditionals is to put them in an allOf.
{
  "allOf": [
    {
      "if": { ... },
      "then": { ... }
    },
    {
      "if": { ... },
      "then": { ... }
    },
    ...
  ]
}
Each schema will pass if the if fails, or if the if passes and the then passes, so you can easily add as many conditionals as you want without nesting. Of course, this isn't exactly the same as nesting with else. Every if will run regardless of what happens with the other ifs. This can sometimes mean more complex ifs, and it definitely means the evaluation can't short-circuit when the first match is found.
So, while this workaround allows you to express what you need to express without excessive nesting, it would be nice to have a more ideal solution that doesn't require nesting.
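To make the trade-off concrete, here is a minimal Python sketch of the allOf workaround's behavior. It models each if and then as a plain predicate function rather than a real schema, and the names `check_all_of` and `rules` are made up for illustration, not taken from any validator library.

```python
from typing import Callable, List, Tuple

Predicate = Callable[[object], bool]

def check_all_of(rules: List[Tuple[Predicate, Predicate]], instance: object) -> Tuple[bool, int]:
    """Return (valid, number of `if` predicates evaluated)."""
    evaluated = 0
    valid = True
    for cond, then in rules:
        evaluated += 1  # every `if` runs; an earlier match does not stop the loop
        if cond(instance) and not then(instance):
            valid = False  # `if` passed but `then` failed
    return valid, evaluated

# Hypothetical rules: non-empty strings, non-negative floats.
rules = [
    (lambda v: isinstance(v, str), lambda v: len(v) > 0),
    (lambda v: isinstance(v, float), lambda v: v >= 0),
]

print(check_all_of(rules, "abc"))
print(check_all_of(rules, -1.5))
```

In both calls the second element of the result equals the number of rules, which is the "can't short-circuit on first match" cost described above.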
Back to the suggested elseIf keyword...
The if/then/else keywords are just keywords that are part of a generic schema. The elseIf keyword seems to require and expect the if and then to be present and wouldn't make sense if there were other keywords in those schemas. So, items in the elseIf would have to not be schemas, but rather a special construct specific to the elseIf keyword. This would be awkward because there is nothing like that in JSON Schema and because it looks like a schema, but isn't.
I'm not really sure how to reconcile these problems. I thought about it for a bit and here's the best I could come up with so far. I'll call this keyword "conditional". It's an array whose items are schemas. The elements are logically pairs of schemas. The first pair represents if and then. Any following pairs represent elseIf and then. If there's an odd number of schemas, the remaining schema represents else.
{
  "conditional": [
    { ... }, // if
    { ... }, // then
    { ... }, // elsif
    { ... }, // then
    { ... }, // elsif
    { ... }, // then
    { ... }  // else
  ]
}
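As a rough illustration of how the pairing could be read, the following Python sketch splits a flat list into (if, then) pairs plus an optional trailing else, then evaluates the first matching pair. Schemas are stood in for by predicate functions, and `split_conditional` and `evaluate_conditional` are hypothetical helper names; the behavior when nothing matches and there is no else (treated as valid here) is an assumption.

```python
from typing import Callable, List, Optional, Tuple

Predicate = Callable[[object], bool]

def split_conditional(items: List[Predicate]) -> Tuple[List[Tuple[Predicate, Predicate]], Optional[Predicate]]:
    """Group a flat list into (if, then) pairs; an odd trailing item is the else."""
    pairs = [(items[i], items[i + 1]) for i in range(0, len(items) - 1, 2)]
    else_branch = items[-1] if len(items) % 2 == 1 else None
    return pairs, else_branch

def evaluate_conditional(items: List[Predicate], instance: object) -> bool:
    pairs, else_branch = split_conditional(items)
    for cond, then in pairs:
        if cond(instance):
            return then(instance)  # first match decides the result
    # Assumption: with no match and no else, the keyword imposes no constraint.
    return else_branch(instance) if else_branch is not None else True

items = [
    lambda v: isinstance(v, str),    # if
    lambda v: len(v) > 0,            # then
    lambda v: isinstance(v, float),  # elseIf
    lambda v: v >= 0,                # then
    lambda v: False,                 # else
]
```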
The biggest problem with this solution is that it's not very readable/maintainable. Two or three elements is fine, but once you start getting into the elseIfs, it can get hard to keep track. $comments can help with that, but I'd be reluctant to introduce a feature that's confusing enough that it practically requires the use of $comments to be maintainable.
Does anyOf do short-circuiting? If not, could we propose a firstOf that evaluates the conditional expressions in order until one evaluates true?
anyOf can short-circuit if annotations aren't being collected; however, there's a problem with using it: because none of the subschemas have an else, they'll all pass if the if doesn't match.
{
  "anyOf": [
    { "if": false, "then": { "type": "object" } },
    { "if": false, "then": { "type": "string" } }
  ]
}
The instance 42 passes this schema because none of the ifs pass, so none of the thens are invoked.
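This can be checked with a tiny hand-rolled evaluator. The sketch below is not a real validator: `is_valid` is a hypothetical helper that only understands boolean schemas, type (for "object" and "string"), if/then/else, and anyOf, which is just enough to run the schema above.

```python
def is_valid(schema, instance) -> bool:
    """Toy evaluator: boolean schemas, `type`, `if`/`then`/`else`, `anyOf` only."""
    if isinstance(schema, bool):
        return schema
    type_checks = {
        "object": lambda v: isinstance(v, dict),
        "string": lambda v: isinstance(v, str),
    }
    if "type" in schema and not type_checks[schema["type"]](instance):
        return False
    if "if" in schema:
        branch = "then" if is_valid(schema["if"], instance) else "else"
        # A missing `then` or `else` behaves like the schema `true`.
        if not is_valid(schema.get(branch, True), instance):
            return False
    if "anyOf" in schema:
        if not any(is_valid(sub, instance) for sub in schema["anyOf"]):
            return False
    return True

schema = {
    "anyOf": [
        {"if": False, "then": {"type": "object"}},
        {"if": False, "then": {"type": "string"}},
    ]
}

print(is_valid(schema, 42))  # True: no `if` matched, so nothing constrains 42
```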
If you put an else: false on all of them, then I imagine you could probably do an anyOf, but that seems tedious.
{
  "anyOf": [
    { "if": false, "then": { "type": "object" }, "else": false },
    { "if": false, "then": { "type": "string" }, "else": false }
  ]
}
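Here, the intent is that only objects and strings are allowed (with a real condition in place of each placeholder false under if; taken literally, "if": false always selects the else branch).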
If you put an else: false on all of them, then I imagine you could probably do an anyOf, but that seems tedious.
Unfortunately, that doesn't work. The anyOf schemas would fall through if the if fails (which is what we want), but it would also fall through if the if passes and the then fails (which is not what we want). We want it to only try the next schema if the if fails. If a then fails in any of the schemas, we need evaluation to stop and report failure, not try the next one. anyOf can't do that. Not even one that can guarantee short circuiting.
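The difference shows up with a small Python sketch, again using predicate functions in place of schemas. `any_of_else_false` and `nested_else_chain` are illustrative names, and the clauses are invented for the example.

```python
from typing import Callable, List, Tuple

Predicate = Callable[[object], bool]
Clause = Tuple[Predicate, Predicate]  # (if, then)

def any_of_else_false(clauses: List[Clause], instance: object) -> bool:
    # With `else: false`, a subschema is valid only when `if` and `then` both pass.
    # A failed `then` just makes anyOf move on to the next subschema.
    return any(cond(instance) and then(instance) for cond, then in clauses)

def nested_else_chain(clauses: List[Clause], instance: object) -> bool:
    # Nested if/then/else: the first `if` that passes decides, and its `then` is final.
    for cond, then in clauses:
        if cond(instance):
            return then(instance)
    return True  # nothing matched and there is no final else

clauses = [
    (lambda v: isinstance(v, int), lambda v: v >= 10),  # integers must be >= 10
    (lambda v: True, lambda v: True),                   # catch-all clause
]

print(any_of_else_false(clauses, 5))   # True: falls through to the catch-all
print(nested_else_chain(clauses, 5))   # False: first clause matched, its then failed
```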
I am trying to think if there is some combination of nested anyOf, allOf, and literal booleans that
- Evaluates one condition per clause
- Does not continue to evaluate once a condition is true (e.g. value of clause is not used for fallthrough)
It might not end up extremely readable, but if it's already possible then maybe we can put some syntax sugar on top
Unfortunately I'll have to come back to this tomorrow
The goal here is to have either 0 or 1 (first applicable) condition in a list of conditions apply, making each condition mutually exclusive, but not requiring any of them. Flattening to an anyOf (with else-false) does simplify them, but does not make them mutually exclusive. So far, the only means I have found to make a list of conditions mutually exclusive is to nest each next condition in the else statement of the previous one. :-(
Seconding the usefulness of this feature. I agree that a shortcircuited version of allOf (which would mostly be used for conditionals or code generators) would be the most idiomatic representation of this.
After more thought, the only way I can imagine to make anyOf work is to repeat the previous conditions in later conditionals, so I'm not considering that anymore.
My proposal would be something like this
{
  "firstOf": [
    {"if": "PREDICATE_1", "then": "EXCLUSIVE_RESULT_1"},
    {"if": "PREDICATE_2", "then": "EXCLUSIVE_RESULT_2"}
  ]
}
In this example, if PREDICATE_1 is true, then the result of firstOf will be EXCLUSIVE_RESULT_1. If PREDICATE_1 is false and PREDICATE_2 is true, then the result of firstOf will be EXCLUSIVE_RESULT_2. If both PREDICATE_1 and PREDICATE_2 are false, then the result of firstOf will be false. In the firstOf grammar, none of the nested conditionals may contain an else because that is implicit (not sure if this is possible).
We should also define the base case:
{
  "firstOf": []
}
Although I personally have no preference if this returns false, true, or doesn't parse.
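One way to pin down the intended meaning is to desugar the proposed keyword into the nested if/then/else form that works today. The Python sketch below does that on plain dicts; `first_of_to_nested` is a hypothetical helper, and treating the empty list as false is only one of the options left open above.

```python
from typing import Any, Dict, List, Union

Schema = Union[bool, Dict[str, Any]]

def first_of_to_nested(clauses: List[Dict[str, Any]]) -> Schema:
    """Rewrite a firstOf-style clause list as nested if/then/else."""
    nested: Schema = False  # no predicate matched: the proposal says the result is false
    for clause in reversed(clauses):
        # Each earlier clause wraps everything after it in its else branch.
        nested = {"if": clause["if"], "then": clause["then"], "else": nested}
    return nested

clauses = [
    {"if": "PREDICATE_1", "then": "EXCLUSIVE_RESULT_1"},
    {"if": "PREDICATE_2", "then": "EXCLUSIVE_RESULT_2"},
]

print(first_of_to_nested(clauses))
```

The output has one level of else nesting per clause, which is exactly the depth problem the flat syntax is meant to hide from schema authors.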
The firstOf idea looks promising syntactically. How would it behave though? With oneOf, for example, one entire nested schema must be valid. If that worked the same way here, both the if and then statements of the first schema element would need to evaluate true in order to prevent moving to the second item in the list.
{
  "firstOf": [
    {"if": "PREDICATE_1", "then": "EXCLUSIVE_RESULT_1"},
    {"if": "PREDICATE_2", "then": "EXCLUSIVE_RESULT_2"}
  ]
}
With this example, if PREDICATE_1 evaluated true but EXCLUSIVE_RESULT_1 evaluated false, validators would move on to the second element in the list (PREDICATE_2) rather than producing a validation message that EXCLUSIVE_RESULT_1 did not evaluate true. In order for this to work, firstOf would need to have a more limited vocabulary (meaning it could only contain if/then/else keywords on the first-level elements it contains) and would need different evaluation rules defined for validators. The downside there being that it behaves materially differently from anyOf, allOf, and oneOf in how the nested schemas are evaluated.
I think I would suggest switch or select since those are the common words used in programming languages for what we're considering.
That said, this is a very niche application of an anyOf-like thing, and I'd prefer to have something more generic.
- What would happen if someone used a firstOf with subschemas that aren't if/then constructs?
- How does this keyword affect annotation collection?
- [Probably more questions]
the only way I can imagine to make anyOf work is to repeat the previous conditions in later conditionals
Agreed.
I agree that a shortcircuited version of allOf [...] would be the most idiomatic representation of this.
Short circuiting alone (whether anyOf or allOf) is not a solution to this problem. See https://github.com/json-schema-org/json-schema-spec/issues/1410#issuecomment-1577627941
My proposal would be something like this
This appears to be the same as the original proposal for elseIf except elseIf is now spelled firstOf. The problems with that proposal don't seem to be solved here.
@chapmanjw
The downside there being that it behaves materially differently from anyOf, allOf, and oneOf in how the nested schemas are evaluated.
Now I'm worried that I misunderstood the feature request. Isn't this statement also true about else-if? If the issue is naming, I think I prefer condition, switch, or select versus firstOf after seeing feedback
@gregsdennis
I would suggest switch or select
Agreed, I chose firstOf because I thought it was similar to anyOf, but it's not, so these are good suggestions
What would happen if someone used a firstOf with subschemas that aren't if/then constructs?
Maybe it would evaluate them in order until one was true and return it directly, or maybe it's not a valid schema? I'm not sure
How does this keyword affect annotation collection?
Sorry, I don't know and can't answer this one
@jdesrosiers
This appears to be the same as the original proposal for elseIf except elseIf is now spelled firstOf. The problems with that proposal don't seem to be solved here.
One substantial difference is that it does not require if at the same level. Looking at your comment, I just see one core problem (apologies if I combined them erroneously):
This would be awkward because there is nothing like that in JSON Schema and because it looks like a schema, but isn't.
I think you really highlighted the core of the problem. We're trying to fit an imperative tool into a functional box. Lisp has a cond which is very similar to your proposed conditional, except it does support 2-tuples of if-then constructs:
(cond (test-expression1 then-expression1)
(test-expression2 then-expression2)
(t else-expression2))
(source)
On that note, would using an array of arrays of schemas be any better?
{
  "conditional": [
    [{ ... }, { ... }], // if, then
    [{ ... }, { ... }], // elsif, then
    [{ ... }, { ... }], // elsif, then
    [{ ... }]           // else
  ]
}
If the firstOf idea dies brutally but still gets us closer to a solution, that's fine with me. Thank you for entertaining the thought and poking holes in it
Maybe it would evaluate them in order until one was true and return it directly...
We're trying to fit an imperative tool into a functional box.
This entire issue/discussion is one of the hesitations I had about introducing if/then/else in the first place. JSON Schema is not a programming language; it doesn't have a "program flow." At its core, JSON Schema is nothing more than a collection of constraints. However, these keywords tend to make it feel like it has flow, and I think that's what led to this proposal.
- It's perfectly valid to have any of these keywords present on their own.
- if doesn't actually do anything (besides any annotation behavior).
- then is evaluated only if if is present and validates successfully (valid == true).
- else is evaluated only if if is present and validates unsuccessfully (valid == false).
Yes, an implementation has to process if to know which of then or else to process, but it's not done like you'd typically think of your code running an if-statement. if is considered more of a dependency of the then and else constraints rather than the three being a single logical statement (as they are in programming languages).
else-if and the other suggestions here necessarily imply sequential processing, that "single logical statement," and I think that's what doesn't fit with the rest of JSON Schema.
For allOf, anyOf, and all of the other multiple-schema applicator keywords, their children can be evaluated in any order, even in parallel.
We have no basis for a keyword that applies subschemas sequentially, especially one that bails out part-way through.
You may ask, "What about prefixItems? Doesn't that apply the subschemas in order?" Not necessarily. It applies subschemas to the same index in the instance, but they don't need to be evaluated in that order. As with the others, you can evaluate them reversed, at random, or in parallel, and you'll still get the same evaluation result.
Looking around to see what others have done in other schema formats. Here is an example of XSD-based modeling to express these concepts:
- https://ddialliance.org/Specification/DDI-Lifecycle/3.2/XMLSchema/FieldLevelDocumentation/schemas/datacollection_xsd/complexTypes/IfThenElseType.html
- https://ddialliance.org/Specification/DDI-Lifecycle/3.2/XMLSchema/FieldLevelDocumentation/schemas/datacollection_xsd/complexTypes/ElseIfType.html
Following a similar pattern could bring us back to the example in the first post:
{
  "if": { ... },
  "then": { ... },
  "elseIf": [
    {
      "if": { ... },
      "then": { ... }
    },
    {
      "if": { ... },
      "then": { ... }
    }
  ],
  "else": { ... }
}
The schema for elseIf would need to explicitly be an array of objects containing only if and then properties. The applicator meta-schema could look like this:
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://json-schema.org/draft/2020-12/meta/applicator",
  "$vocabulary": {
    "https://json-schema.org/draft/2020-12/vocab/applicator": true
  },
  "$dynamicAnchor": "meta",
  "title": "Applicator vocabulary meta-schema",
  "type": ["object", "boolean"],
  "properties": {
    "prefixItems": { "$ref": "#/$defs/schemaArray" },
    "items": { "$dynamicRef": "#meta" },
    "contains": { "$dynamicRef": "#meta" },
    "additionalProperties": { "$dynamicRef": "#meta" },
    "properties": {
      "type": "object",
      "additionalProperties": { "$dynamicRef": "#meta" },
      "default": {}
    },
    "patternProperties": {
      "type": "object",
      "additionalProperties": { "$dynamicRef": "#meta" },
      "propertyNames": { "format": "regex" },
      "default": {}
    },
    "dependentSchemas": {
      "type": "object",
      "additionalProperties": { "$dynamicRef": "#meta" },
      "default": {}
    },
    "propertyNames": { "$dynamicRef": "#meta" },
    "if": { "$dynamicRef": "#meta" },
    "then": { "$dynamicRef": "#meta" },
    "elseIf": { "$ref": "#/$defs/elseIfArray" },
    "else": { "$dynamicRef": "#meta" },
    "allOf": { "$ref": "#/$defs/schemaArray" },
    "anyOf": { "$ref": "#/$defs/schemaArray" },
    "oneOf": { "$ref": "#/$defs/schemaArray" },
    "not": { "$dynamicRef": "#meta" }
  },
  "$defs": {
    "schemaArray": {
      "type": "array",
      "minItems": 1,
      "items": { "$dynamicRef": "#meta" }
    },
    "elseIfArray": {
      "type": "array",
      "minItems": 1,
      "items": { "$ref": "#/$defs/elseIfCondition" }
    },
    "elseIfCondition": {
      "type": "object",
      "properties": {
        "if": { "$dynamicRef": "#meta" },
        "then": { "$dynamicRef": "#meta" }
      },
      "required": [ "if", "then" ]
    }
  }
}
This is still subverting some core assumptions about JSON Schema:
1. The construct elseIfCondition looks like a schema but it isn't (there isn't anything else in JSON Schema like it)
2. The if and then construct properties that we are asking for are functionally very different from the existing if and then keywords
3. The elseIfArray needs to be evaluated in order (which isn't done anywhere else in JSON Schema)
Even conditional fixes 1 and 2 but not 3
We're trying to fit an imperative tool into a functional box.
if/then/else is a bit awkward in JSON Schema, but its design is based on the same boolean logic underpinnings that JSON Schema is based on. if/then is equivalent to boolean implication (A -> B), which is equivalent to !A || B, which can be expressed in JSON Schema as anyOf: [{ not: A }, B]. So, if/then in JSON Schema isn't an imperative tool, but in conversations like this we often conflate it with the imperative tool. That's why suggestions like adding an elseIf seem straightforward, but actually aren't.
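The equivalence is easy to check exhaustively. In this small Python sketch, A and B stand for the validation results of the two subschemas, and the function names are illustrative only.

```python
from itertools import product

def if_then(a: bool, b: bool) -> bool:
    # `then` only constrains the result when `if` passed.
    return b if a else True

def any_of_not_a_or_b(a: bool, b: bool) -> bool:
    # anyOf: [{ not: A }, B]
    return any([not a, b])

# One row per combination of A and B: (A, B, if/then result, anyOf result).
rows = [
    (a, b, if_then(a, b), any_of_not_a_or_b(a, b))
    for a, b in product([False, True], repeat=2)
]

for row in rows:
    print(row)
```

The last two columns agree on every row, which covers validation results only; it says nothing about annotation collection.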
Lisp has a cond which is very similar to your proposed conditional, except it does support 2-tuples of if-then constructs
I thought about pairing them as tuples as well, but decided against it in the moment. I think there are times when it's best expressed as tuples and times when it's not. In a simple if/then(/else), the pairing is unnecessary and the extra syntax would be annoying, but when you start getting into the elseIfs, it can help with readability to have them grouped. Perhaps it would be best to allow both forms?
We have no basis for a keyword that applies subschemas sequentially, especially one that bails out part-way through.
While it's true that there aren't currently any keywords that work that way, I don't see any reason why a keyword like that would be a problem. It doesn't introduce the need for any new capabilities to the JSON Schema architecture.
It might be relevant to point out the propertyDependencies keyword that's expected to be included in a future release of JSON Schema.
{
  "propertyDependencies": {
    "foo": {
      "a": { ... },
      "b": { ... },
      "c": { ... },
      ...
    }
  }
}
This keyword defines a schema to apply if a property has a given value. In this case, if the value of property "foo" is "a" it applies one schema, if it's "b" it applies another schema, etc. This would allow O(1) selection of the mutually exclusive option, which is even better than the O(n) you would have with an else-if chain. Although efficient and concise, this solution is limited to being able to match on constant string values of a single property. If you need something more expressive than that, this wouldn't solve your problem and you'd be back to nested if/then/else.
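The O(1) selection amounts to a dictionary lookup keyed on the property's value. The Python sketch below assumes the semantics as described in this comment (string values only, one lookup per listed property); `select_schemas` is a hypothetical helper, and the "required" subschemas are placeholders invented for the example.

```python
from typing import Any, Dict, List

def select_schemas(property_dependencies: Dict[str, Dict[str, Any]], instance: Any) -> List[Any]:
    """Return the subschemas selected by the instance's property values."""
    selected: List[Any] = []
    if not isinstance(instance, dict):
        return selected  # the keyword only looks at object properties
    for prop, by_value in property_dependencies.items():
        value = instance.get(prop)
        # Only constant string values can be matched; lookup is a single dict access.
        if isinstance(value, str) and value in by_value:
            selected.append(by_value[value])
    return selected

deps = {
    "foo": {
        "a": {"required": ["x"]},
        "b": {"required": ["y"]},
    }
}

print(select_schemas(deps, {"foo": "b"}))
```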
This
if A then W
else if B then X
else if C then Y
else Z
is equivalent to this
if A then W
if !A and B then X
if !A and !B and C then Y
if !A and !B and !C then Z
which is pure boolean logic and fully compatible with anyOf. We just don't want to write the second block by hand (it's strictly worse than just using nesting). To achieve it still requires either:
- a sequential preprocessing step which appends the complement of any earlier predicates, or
- the ability for then to "look back" at earlier predicates and determine their complement
p.s. if it simplifies implementation, this is also equivalent:
if !(false) and A then W
if !(false or A) and B then X
if !(false or A or B) and C then Y
if !(false or A or B or C) then Z
this way, some "union" exists and can be dropped into !() to make the complement
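A preprocessing step along those lines could look like the Python sketch below. It turns an ordered else-if chain into independent if/then conditionals under allOf, guarding each one with the complement of the union of earlier predicates. `flatten_else_if` is a hypothetical name, schemas are plain dicts, and the first clause skips the guard so that no empty anyOf is emitted.

```python
from typing import Any, Dict, List, Optional, Tuple

Schema = Any

def flatten_else_if(chain: List[Tuple[Schema, Schema]],
                    else_schema: Optional[Schema] = None) -> Dict[str, Any]:
    """Rewrite [(predicate, then), ...] plus optional else as an order-independent allOf."""
    if not chain:
        raise ValueError("chain must contain at least one (predicate, then) pair")
    conditionals = []
    earlier: List[Schema] = []  # the running "union" of predicates seen so far
    for predicate, then in chain:
        if earlier:
            # !(earlier_1 or earlier_2 or ...) and predicate
            guard = {"allOf": [{"not": {"anyOf": list(earlier)}}, predicate]}
        else:
            guard = predicate
        conditionals.append({"if": guard, "then": then})
        earlier.append(predicate)
    if else_schema is not None:
        conditionals.append({"if": {"not": {"anyOf": list(earlier)}}, "then": else_schema})
    return {"allOf": conditionals}

flat = flatten_else_if(
    [({"$ref": "#/$defs/A"}, {"$ref": "#/$defs/W"}),
     ({"$ref": "#/$defs/B"}, {"$ref": "#/$defs/X"})],
    else_schema={"$ref": "#/$defs/Z"},
)
```

The `$defs` names here are placeholders. The output grows quadratically with the chain length, so this trades nesting depth for schema size.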
propertyDependencies seems to neatly solve the switch version already, since C implies !A and !B. Maybe that's good enough already for a lot of cases
The schema for elseIf would need to explicitly be an array of objects containing only if and then properties.
We've avoided structured keywords like this in the past. That's the main reason if/then/else are three separate keywords rather than one keyword with three components named if, then, and else. Personally, I don't think that avoidance is necessary and I don't have a problem with it, but that has historically been a minority opinion so I don't see it getting broad support.
Although a structured keyword would be a unique addition to JSON Schema, that's not my concern with this proposal. My concern is that because if/then keywords already exist as schema keywords with slightly different behavior, using the same names within a structured keyword would be too confusing (it looks like a schema, but it's not). If different (but still good) names could be found, I'd feel better about the proposal.
Interestingly, since names are the problem, if we convert the structured if/then into a tuple (no names needed), we pretty much end up with @Exekyel's variation of the conditional keyword I suggested :smile:.
if A then W
if !A and B then X
if !A and !B and C then Y
if !A and !B and !C then Z
I think using references might help this a little so that you're not copying constraints everywhere.
{
  "$defs": {
    "A": { ... },
    "B": { ... },
    "C": { ... }
  },
  "allOf": [
    {
      "if": { "$ref": "#/$defs/A" },
      "then": { ... } // W
    },
    {
      "if": {
        "allOf": [
          { "not": { "$ref": "#/$defs/A" } },
          { "$ref": "#/$defs/B" }
        ]
      },
      "then": { ... } // X
    },
    {
      "if": {
        "allOf": [
          { "not": { "$ref": "#/$defs/A" } },
          { "not": { "$ref": "#/$defs/B" } },
          { "$ref": "#/$defs/C" }
        ]
      },
      "then": { ... } // Y
    },
    {
      "if": {
        "allOf": [
          { "not": { "$ref": "#/$defs/A" } },
          { "not": { "$ref": "#/$defs/B" } },
          { "not": { "$ref": "#/$defs/C" } }
        ]
      },
      "then": { ... } // Z
    }
  ]
}
While this makes it a bit easier to read, unless your implementation is doing a really good job of caching results, you still have multiple evaluations for each definition as it still has to evaluate all of the options.
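To put rough numbers on that, the Python sketch below counts how often each definition is evaluated across the four if guards, with and without a per-instance cache. It models A, B, and C as predicates and assumes no short circuiting inside each guard; `counting`, `memoized`, and `evaluate_guards` are illustrative helpers, and real validators may cache very differently or not at all.

```python
from typing import Callable, Dict, List

Predicate = Callable[[object], bool]

def counting(name: str, predicate: Predicate, counts: Dict[str, int]) -> Predicate:
    """Wrap a predicate so each call is tallied under `name`."""
    def wrapped(instance: object) -> bool:
        counts[name] = counts.get(name, 0) + 1
        return predicate(instance)
    return wrapped

def memoized(predicate: Predicate) -> Predicate:
    """Cache results per instance (keyed by repr, which is enough for this demo)."""
    cache: Dict[str, bool] = {}
    def wrapped(instance: object) -> bool:
        key = repr(instance)
        if key not in cache:
            cache[key] = predicate(instance)
        return cache[key]
    return wrapped

def evaluate_guards(a: Predicate, b: Predicate, c: Predicate, instance: object) -> List[bool]:
    # Building the lists first forces every member to be evaluated.
    return [
        a(instance),
        all([not a(instance), b(instance)]),
        all([not a(instance), not b(instance), c(instance)]),
        all([not a(instance), not b(instance), not c(instance)]),
    ]

def run(use_cache: bool) -> Dict[str, int]:
    counts: Dict[str, int] = {}
    preds = [counting(n, p, counts) for n, p in
             [("A", lambda v: v > 10), ("B", lambda v: v > 3), ("C", lambda v: v > 0)]]
    if use_cache:
        preds = [memoized(p) for p in preds]
    evaluate_guards(*preds, 5)
    return counts

print(run(use_cache=False))
print(run(use_cache=True))
```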