json-schema-spec
Add requireAllExcept keyword
Resolves https://github.com/json-schema-org/json-schema-spec/issues/1112
Adds the requireAllExcept keyword.
I don't think we do that anywhere else, so it would be a little out of place.
Fair enough!
I find this keyword somewhat difficult to comprehend. Its behavior is defined in terms of another keyword, which I think we should avoid. It becomes non-intuitive when mixing "allOf", because this keyword does not "see into" sub-schemas.
What would be some cases where this keyword is superior to "requiredProperties"?
I find this keyword somewhat difficult to comprehend. Its behavior is defined in terms of another keyword, which I think we should avoid. It becomes non-intuitive when mixing "allOf", because this keyword does not "see into" sub-schemas.
I don't feel this is a problem. Many keywords have dependencies. That's one of the reasons for the annotations system.
What would be some cases where this keyword is superior to "requiredProperties"?
@awwright Earlier you said...
The problem would be if keywords do multiple things and there's no way around it.
By my understanding of your proposal #846, schema authors could use both properties and requiredProperties to define properties. requiredProperties would be doing multiple things... defining subschemas to apply, and specifying they are required.
Generally we've moved away from keywords doing multiple things, splitting them up, mainly because it was confusing. requiredProperties as you proposed in #846 feels like a step backwards.
@Relequestual By "... and there's no way around it", I mean: It's fine if a keyword performs multiple things as an author convenience. But we have to be able to break it apart.
e.g. {"type": "integer"} is short for { "type": "number", "multipleOf": 1 }.
Likewise, {"requiredProperties": {"foo": bar}} could be short for { "required": ["foo"], "properties": { "foo": bar } }.
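The decomposition described here can be sketched in Python. This is only an illustration under stated assumptions: "requiredProperties" is the keyword proposed in #846 (not part of any published draft), the helper name `desugar_required_properties` is hypothetical, and wrapping a name that appears in both keywords in "allOf" is one possible choice rather than something the proposal specifies.

```python
import copy


def desugar_required_properties(schema: dict) -> dict:
    """Rewrite the proposed "requiredProperties" into "required" + "properties".

    Hypothetical helper: assumes "requiredProperties" maps property names to
    subschemas, as described in this thread.
    """
    result = copy.deepcopy(schema)
    required_props = result.pop("requiredProperties", None)
    if required_props is None:
        return result
    properties = result.setdefault("properties", {})
    required = result.setdefault("required", [])
    for name, subschema in required_props.items():
        if name in properties:
            # Assumption: when both keywords name the same property,
            # both subschemas must hold, so combine them with allOf.
            properties[name] = {"allOf": [properties[name], subschema]}
        else:
            properties[name] = subschema
        if name not in required:
            required.append(name)
    return result


sugared = {"requiredProperties": {"foo": {"type": "string"}}}
desugared = desugar_required_properties(sugared)
```

The point of the sketch is that the rewrite needs nothing but the keyword's own value, which is what makes it a pure authoring convenience.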
This keyword, by contrast, has interactions with other keywords. What happens if I use:
{
  "requireAllExcept": ["foo"],
  "allOf": [ { "foo": false } ],
  "patternProperties": { "f": true },
  "additionalProperties": false
}
What's the behavior here? There probably is one—but I have to think about it. And I can't really take an educated guess. I can't even come up with a good example, maybe if I replaced false with true here, or vice-versa, the example would illustrate my point better.
It's true that we have other keywords like this (additionalProperties), but they tend to consume a disproportionate amount of descriptive effort we have to do.
Generally we've moved away from keywords doing multiple things, splitting them up, mainly because it was confusing.
This feels like exactly the kind of keyword that does multiple things you're talking about.
@awwright what about additionalProperties, unevaluated*, minCount, maxCount? These are all keywords that consider other keywords. While we have keyword independence, it's not a hard requirement for keywords.
@gregsdennis Those keywords are specifically listed as exceptions... I mean, yes, we never really went into detail as to why they're exceptions... but as an exercise, write an alternative where the only information available to the keyword is its own value.
The best I can do is something like
{
  "contains": {
    "needle": "foo",
    "min": 1,
    "max": 4
  }
}
While this nicely emphasizes how "minContains" doesn't do anything by itself—that it's technically an argument to "contains"—this is obviously clunky to write, which is why I think an "argument keyword" is acceptable as an authoring convenience.
In contrast, the single-keyword alternative to "requireAllExcept" combines the functionality of "required" and "properties" into a single keyword, entirely replacing the need for "required"; meanwhile, "properties" keeps its usual function of listing optional properties, which is the job the "requireAllExcept" keyword is doing.
This idea that some keywords can be decomposed into more complicated but equal schemas should be familiar, e.g. type: "integer" and "dependentRequired".
※ A keyword "doesn't do anything by itself" if the single-keyword schema {keyword: foo} is indistinguishable from {}, for all values of foo. (As for "additionalProperties", this does do something by itself, but perhaps it should have had the semantics that "unevaluatedProperties" has had from the beginning... or maybe something else entirely, I'm still researching this.)
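The "dependentRequired" decomposition mentioned above can be sketched as a rewrite into "if"/"then". This is a minimal illustration, not spec text; the helper name `desugar_dependent_required` is hypothetical.

```python
def desugar_dependent_required(schema: dict) -> dict:
    """Rewrite "dependentRequired" into equivalent "if"/"then" clauses.

    Each entry {"x": ["y"]} becomes: if "x" is present, then "y" is required.
    """
    result = {k: v for k, v in schema.items() if k != "dependentRequired"}
    clauses = [
        {"if": {"required": [trigger]}, "then": {"required": list(needed)}}
        for trigger, needed in schema.get("dependentRequired", {}).items()
    ]
    if clauses:
        # Append to any existing allOf so other constraints are preserved.
        result["allOf"] = list(result.get("allOf", [])) + clauses
    return result


expanded = desugar_dependent_required({"dependentRequired": {"x": ["y"]}})
```

The expanded form is longer but uses only keywords that assert on their own value, which is the sense of "decomposed into more complicated but equal schemas".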
the single-keyword alternative to "requireAllExcept" combines the functionality of "required" and "properties" into a single keyword, entirely replacing the need for "required"
I don't follow this.
The new keyword is an array of property names that are optional. The presence of the keyword implies that all properties in properties are required except the ones in the array.
It's not combining properties and required; it's the inversion of required.
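As a rough sketch of the semantics just described (assuming the keyword reads the names declared in the sibling "properties" keyword and nothing else), the check could look like this in Python. The function name `missing_required` and the sample schema are illustrative only.

```python
def missing_required(schema: dict, instance: dict) -> list:
    """Return declared property names that are required but absent.

    Assumed semantics, taken from the description in this thread: every name
    in "properties" is required unless it is listed in "requireAllExcept".
    Subschema validation is out of scope for this sketch.
    """
    if "requireAllExcept" not in schema:
        return []
    optional = set(schema["requireAllExcept"])
    declared = schema.get("properties", {})
    return [
        name for name in declared
        if name not in optional and name not in instance
    ]


schema = {
    "properties": {"name": {"type": "string"}, "comment": {"type": "string"}},
    "requireAllExcept": ["comment"],
}
ok = missing_required(schema, {"name": "x"})
bad = missing_required(schema, {"comment": "x"})
```

An empty list means the assertion passes; note that the function has to read "properties", which is the keyword-independence concern raised elsewhere in this discussion.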
the single-keyword alternative to "requireAllExcept" combines the functionality of "required" and "properties" into a single keyword, entirely replacing the need for "required"
I don't follow this.
I'm not sure what you don't follow, so let me elaborate a little bit.
"requireAllExcept" is a "keyword" but it's not a validation keyword per se, it's an argument keyword: It's modifying the behavior of another (validation) keyword. We try to avoid argument keywords unless there's no better way to do it. So, is there a better way to accomplish the goal here? Yes. We can accomplish the same behavior by reworking the keyword that's doing the heavy lifting.
Take for example:
{
  "properties": {
    "name": {"type": "string"},
    "comment": {"type": "string"}
  },
  "requireAllExcept": [
    "comment"
  ]
}
How do we adapt "requireAllExcept" so that it operates by itself? It can pull in the schema for each of the properties:
{
  "properties": {
    "name": {"type": "string"}
  },
  "requireAllExcept": {
    "comment": {"type": "string"}
  }
}
There, now "requireAllExcept" can be used by itself. And we're not redundantly listing the "comment" key name!
But wait, isn't "requireAllExcept" just listing optional properties here—identical to how "properties" works right now? Let's swap the names around.
{
  "requireAll": {
    "name": {"type": "string"}
  },
  "properties": {
    "comment": {"type": "string"}
  }
}
Is there possibly a better name for "requireAll" that indicates it accepts a key=>schema map?
{
  "requiredProperties": {
    "name": {"type": "string"}
  },
  "properties": {
    "comment": {"type": "string"}
  }
}
...so "requiredProperties" is the same thing as "requireAllExcept", except not broken.
With this approach you also need to update additionalProperties and unevaluatedProperties to look into the new requiredProperties.
@gregsdennis Yes— this is sort of implied when I say "requiredProperties" would 'behave the same way as using "required" and "properties"'—but for consistency you are correct, it would be a good idea to update the definitions of those keywords.
Also, to bring this back to my original point, "requireAllExcept" also needs to update some other paragraphs, specifically the section(s) that discuss keyword independence, so it can be added as one of the exceptions.
Reading the more recent comments, I think we need to re-evaluate our approach here.
I know we're not all happy with the annotations system as defined, however I don't think that means we intend to throw it out. I feel that we can still use it, better defined, and any "keyword interactions" should rely on the premise of annotation collection.
I'm VERY strongly opposed to adding another way to list properties which affects additionalProperties and unevaluatedProperties. I feel it will make things all the more confusing. I don't feel the trade-off is worth it.
Adding new constraint-based keywords without affecting existing keywords feels preferable to me.
I feel that we can still use it, better defined, and any "keyword interactions" should rely on the premise of annotation collection.
Can you detail what this would look like?
I'm VERY strongly opposed to adding another way to list properties which affects additionalProperties and unevaluatedProperties. I feel it will make things all the more confusing. I don't feel the trade-off is worth it.
We're dealing with an authoring convenience, i.e. how to combine common patterns of keywords into a single keyword. There's going to be inherent complexity in that, regardless of the solution.
There's also some amount of subjectivity. We're going to have to balance what new people would expect, with what keeps the cognitive requirements low on very large schemas. (Also, these things might be the same.) I think all authoring conveniences will introduce some "surprise", but we can minimize that; the real purpose of an authoring convenience is it lets you build more complicated schemas on the same amount of brain power (it is probably more difficult to work with an if/then statement than to think "x property means y becomes required" — and so we have "dependentRequired").
We may want to put out a survey that tries to measure these two properties (expected/surprising behavior, and scalable/non-scalable for authors).
... without affecting existing keywords feels preferable to me.
I don't think there's a way to do this.
You can think of "requireAllExcept" as a keyword that first reads the value of "properties", then returns a validation result; or you can think of it as an argument to "properties" (which currently reads no arguments); these two interpretations are logically indistinguishable. However I'm inclined to say we should think of it only as an argument to "properties", because the single-keyword schema { "requireAllExcept": anything } has no behavior.
(An alternative name for these "argument keywords" could be "interacts with"... because even though "minContains" is an argument keyword to "contains", it still makes sense to talk about "minContains" as the source of validation errors. It just won't do so when it's the only keyword in the schema.)
Can you detail what this would look like?
Yes, but I'll have to get back to you. I'm overflowing with half done work right now 😭
@awwright we would need to list this in said "Keyword independence" section.
Agreed. This was an oversight. I updated the PR.
@awwright
"minContains" doesn't do anything by itself [...] it's technically an argument to "contains"
This is not the way minContains and maxContains are defined. They assert independently. For example, if you have schema { "contains": { "type": "number" }, "maxContains": 1 } and instance [1, 2], then contains will pass and maxContains will fail. Personally, I'd prefer if minContains and maxContains were just arguments to contains, but that's not the way it was defined.
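The example in this comment can be traced with a small Python sketch that reports a separate result per keyword. It is deliberately simplified: only `"type": "number"` is understood inside "contains", "minContains" is left out, and the function names are hypothetical.

```python
def matches(subschema: dict, value) -> bool:
    # Only "type": "number" is implemented; any other subschema matches everything.
    if subschema.get("type") == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return True


def evaluate_contains_keywords(schema: dict, instance: list) -> dict:
    """Return one pass/fail result per keyword instead of a combined result."""
    if "contains" not in schema:
        # Without "contains", "maxContains" has nothing to count.
        return {}
    count = sum(1 for item in instance if matches(schema["contains"], item))
    results = {"contains": count >= 1}
    if "maxContains" in schema:
        results["maxContains"] = count <= schema["maxContains"]
    return results


results = evaluate_contains_keywords(
    {"contains": {"type": "number"}, "maxContains": 1}, [1, 2]
)
```

Here `results` records "contains" as passing and "maxContains" as failing, matching the behavior described in the comment.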
@awwright
"requireAllExcept" is a "keyword" but it's not a validation keyword per se, it's an argument keyword: It's modifying the behavior of another (validation) keyword.
This isn't correct. requireAllExcept does its own assertion. It does not affect the validation result of properties. You could say that properties is an argument of requireAllExcept, but not the other way around.
requireAllExcept does its own assertion
@jdesrosiers See my full explanation at https://github.com/json-schema-org/json-schema-spec/pull/1144#issuecomment-1111430170; by "per se," I mean "by itself." Since JSON Schema is declarative, there are multiple equivalent ways validation could actually be performed: (1) the way I'm thinking about it, where you perform the validation at the moment you're iterating through the names in "properties", making sure that each property is either in the instance or listed in "requireAllExcept"; or (2) your implementation, where you validate it at the time the "requireAllExcept" keyword is encountered.
((3) You can also write a validator that performs all the validations at the same time, and you can do so deterministically, proving that "where" the validation occurs truly doesn't matter.)
What makes a keyword an "argument keyword" is that a schema consisting only of e.g. "requireAllExcept" will never be invalid.
Aside: By "at the same time", I mean you can compile a validator for a good number of schemas down to a finite state machine; since common states are factored out, it doesn't make sense to consider a failure to be "from" one keyword or another (the best you can do is say this failure occurs because of the existence of a keyword in the schema, and would have been valid if not for its existence).
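Strategies (1) and (2) above can be written side by side to show that they yield the same result. This is a sketch under the semantics assumed in this thread, checking only property presence; both function names are made up for illustration.

```python
def validate_while_walking_properties(schema: dict, instance: dict) -> bool:
    # Strategy (1): the check happens while iterating over "properties".
    if "requireAllExcept" not in schema:
        return True
    optional = set(schema["requireAllExcept"])
    for name in schema.get("properties", {}):
        if name not in instance and name not in optional:
            return False
    return True


def validate_at_keyword(schema: dict, instance: dict) -> bool:
    # Strategy (2): the check happens when "requireAllExcept" is encountered.
    for keyword, value in schema.items():
        if keyword == "requireAllExcept":
            required = set(schema.get("properties", {})) - set(value)
            if not required <= set(instance):
                return False
    return True


schema = {
    "properties": {"name": {"type": "string"}, "comment": {"type": "string"}},
    "requireAllExcept": ["comment"],
}
instances = [{}, {"name": "a"}, {"comment": "b"}, {"name": "a", "comment": "b"}]
agreement = [
    validate_while_walking_properties(schema, i) == validate_at_keyword(schema, i)
    for i in instances
]
```

The two functions differ only in where the check is attached, which is the sense in which the location of the validation does not change the outcome.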
I've seen implementations that evaluate all object-based keywords together, all array- together, etc. I'm not a fan of this approach, and I chose to implement each keyword separately in its own function, because it made it easier to selectively enable/disable individual keywords depending on which version of the specification was active -- but what matters is the outcome. As long as the validation result and emitted errors/annotations are correct, do whatever makes sense for your mental model and your choice of language/architecture.
This is, FWIW, why I think we have such divisive arguments about evaluation behaviour sometimes -- the implementation choices and mental models vary quite widely, and this informs our beliefs about how new features ought to work.
@awwright
What makes a keyword an "argument keyword" is that a schema consisting only of e.g. "requireAllExcept" will never be invalid.
I see what you're trying to say, but I don't think requireAllExcept has that behavior because it's an argument keyword. I think it's an accidental convergence of behavior. By your definition, additionalProperties would not be an argument keyword and requireAllExcept would, even though the keywords work exactly the same way.
I think it would be fair to call something an argument keyword if it has no meaning if not in the presence of another keyword (Examples: minContains, then). This is not the case for requireAllExcept. Using it in a schema by itself still has meaning, just not a very useful meaning. It's equivalent to { "required": [] }. The schema would always validate, but not because it's ignored. There just happens to not be a value that will make the assertion false.
Let me take a step back and acknowledge that I share your concerns about adding a new keyword that breaks keyword independence. In fact, I hate it. I'd rather be finding ways to remove or redefine all such keywords, but weighing the pros and cons, I think requireAllExcept is the least bad option right now. You can look at the same set of tradeoffs and come to a different conclusion just by weighing things differently than I do. That's fine. Here's a list of the things I've taken into account before championing this approach (I'm sure I'm forgetting some, but these should be the most important points).
- The problem of maintaining schemas with a long list of required properties is one of the most common criticisms we get about JSON Schema.
- requireAllExcept doesn't solve the problem of duplicating property names declared in properties in all cases, but it does reduce the burden.
- requiredProperties doesn't require duplicating property names.
- requiredProperties would create two ways to define properties.
- requiredProperties does more than one thing and is both an applicator and an assertion.
- requiredProperties would require changing the definitions of keywords that depend on properties such as additionalProperties while requireAllExcept can be added without modifying the behavior of any existing keywords.
- requireAllExcept breaks keyword independence, but does so in a very familiar way (works just like additionalProperties).
The draft-next branch has been merged and is now closed. The merge target for this PR has been changed to main. Here are the recommended steps to get your branch rebased properly.
- Make sure your remote for the json-schema-org/json-schema-spec repo is up-to-date. (Example: git fetch upstream).
- Rebase your commits onto main. (Example: git rebase --onto upstream/main abcd123~1 (replace abcd123 with the commit hash of the first commit in your PR)).
- Force push the rebased branch to your fork. (Example: git push --force origin my-branch).
The problem I have with how requireAllExcept is being described comes when trying to fit this into @handrews' behaviors model, which aims to prevent keywords from "looking into" other keywords. properties generates an annotation of the names of the properties it evaluates (which is the intersection of the keys in its value and the keys in the instance).
(from @awwright's examples above)
{
  "properties": {
    "name": {"type": "string"},
    "comment": {"type": "string"}
  },
  "requireAllExcept": [
    "comment"
  ]
}
That means that for { "name": "foo", "bar": "baz" }, properties in this schema would generate [ "name" ]. This means that requireAllExcept must look into properties in order to see which properties are defined that aren't in the annotation.
A possible solution to this would be to have properties emit a different form of annotation that includes all of the defined property names plus information on which ones were found. Something like { "name": true, "comment": false } would work, but other forms could do as well. Then requireAllExcept would just need to look at the annotation of properties, find all of the false values, and verify that all of those are in its own list. (additionalProperties, etc, would only need to look at the keys of this new output from properties.) This, then, fits into the model where annotations are the sole communication lines between keywords.
Also with this approach additionalProperties, etc, would need to be updated for the new properties annotation shape, but I think this is a minimal change.
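A minimal Python sketch of this annotation-based arrangement follows. The annotation shape is the one suggested above and is not defined by any draft; the function names are hypothetical, and "patternProperties" is ignored. The assertion is read here as: it passes when every declared-but-missing property is one the keyword lists as optional.

```python
def properties_annotation(schema: dict, instance: dict) -> dict:
    """Hypothetical annotation: each declared name mapped to 'found in instance?'."""
    return {name: name in instance for name in schema.get("properties", {})}


def require_all_except(exceptions: list, annotation: dict) -> bool:
    """Assert using only the keyword's own value plus the annotation."""
    missing = [name for name, found in annotation.items() if not found]
    return all(name in exceptions for name in missing)


def additional_property_names(instance: dict, annotation: dict) -> list:
    # additionalProperties would only need the annotation's keys
    # (patternProperties is left out of this sketch).
    return [name for name in instance if name not in annotation]


schema = {
    "properties": {"name": {"type": "string"}, "comment": {"type": "string"}},
    "requireAllExcept": ["comment"],
}
instance = {"name": "foo", "bar": "baz"}
annotation = properties_annotation(schema, instance)
valid = require_all_except(schema["requireAllExcept"], annotation)
extras = additional_property_names(instance, annotation)
```

Neither consumer reads the "properties" keyword directly; both work from the annotation alone, which is the property this approach is meant to preserve.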
@gregsdennis You're right, this should probably be defined based on annotations (although I still think that approach needs to be revisited) and the current annotation behavior of properties is insufficient for requireAllExcept to make its assertion.
From a classification point of view, this keyword should work exactly like additionalProperties except that it's not an applicator.
This PR needs to be rewritten as a proposal document. See https://github.com/json-schema-org/json-schema-spec/pull/1450 for an example.
Closing this.