json-schema-validator
V2020-12 not validating items: { "type" } correctly
I have tests built around this validation library and noticed a type-validation problem after migrating to V2020-12. Given a schema with the following property:
"numberArray": {
"type": "array",
"items": {
"type": "number"
}
}
The following object should pass {"numberArray": [1, 2, 3]} and this one should not: {"numberArray": [1, "wrong type", 3]}. Since the upgrade to the new draft version, both objects are validated as error-free.
The original error only appears if the schema is modified as follows:
"numberArray": {
"type": "array",
"prefixItems": [{}],
"items": {
"type": "number"
}
}
However, the spec features this specific example without any prefixItems:

@open-abbott I ping you, first to say thank you for the implementation and second because maybe you quickly know what to do 😃
And by the way, as items is superseded by prefixItems in this implementation of V2020-12, I also get this warning:
com.networknt.schema.JsonMetaSchema - Unknown keyword items - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
because it is not added to the JsonMetaSchema of the Version202012 class.
Enum is also not validated for arrays
Hey @rustermi ! Currently swamped with other projects, but I'm guessing that the problem lies with the fact that items was semantically redefined in the new spec, so there probably needs to be a new ItemsValidator written that behaves correctly, then apply that to the 202012 version and onward in the ValidatorTypeCode. Cheers!
Had the same issue with items not being validated in an array with V2020-12. The items keyword is configured in ValidatorTypeCode to only be supported up to V2019-09, so that part of the schema doesn't even generate a validator.
Our workaround was to revert to V2019-09 for now, but it would be great to be able to go back to V2020-12 at some point.
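To picture why the items keyword is skipped silently rather than failing, here is a minimal sketch of version-gated keyword registration. It is a simplified, hypothetical model: Draft and KeywordGate are made-up names for illustration and are not the library's ValidatorTypeCode or VersionCode classes.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified model of version-gated keywords (not the library's real classes).
enum Draft { V4, V6, V7, V2019_09, V2020_12 }

final class KeywordGate {
    private final String keyword;
    private final Set<Draft> supported;

    KeywordGate(String keyword, Set<Draft> supported) {
        this.keyword = keyword;
        this.supported = supported;
    }

    String keyword() {
        return keyword;
    }

    // A validator is only created when the keyword is registered for the active draft.
    // When this returns false, the keyword is ignored and nothing is validated.
    boolean createsValidator(Draft active) {
        return supported.contains(active);
    }

    // Models the situation described above: "items" registered only up to 2019-09.
    static KeywordGate itemsUpTo201909() {
        return new KeywordGate("items", EnumSet.range(Draft.V4, Draft.V2019_09));
    }
}
```

Under this model, KeywordGate.itemsUpTo201909().createsValidator(Draft.V2020_12) is false, which matches the symptom: both the valid and the invalid array pass because no items validator exists for that draft.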
In 2020-12 release, the items and additionalItems keywords have been replaced with prefixItems and items. They are trying to reuse the keywords with different meanings. We are open to all suggestions on the design to resolve the issue.
https://json-schema.org/draft/2020-12/release-notes.html
Hi 👋 -- I just came across this exact problem. I think the fundamental problem lies with the v2020-12 spec.
We are open to all suggestions on the design to resolve the issue.
Cool, I'm game. Have some thoughts.
In the 2020-12 release notes, the following is said:
The keywords used for defining arrays and tuples have been redesigned to help lower the learning curve for JSON Schema
That's fine but the solution should be with the type itself. Instead of overloading the array type to serve as both a generic array and as a tuple, they should be distinct types. So tuple would ideally be a top-level type just like array. With tuple the concept of prefixItems makes far more sense. You can then do the following:
"foo": {
"type": "array",
"items": { "type": "string" }
}
// ["apple", "banana", "pear"] --> valid
// ["apple", 100, "pear"] --> invalid
"bar": {
"type": "tuple",
"prefixItems": [
{ "type": "string" },
{ "type": "number" },
{ "type": "string" }
],
"items": false
}
// ["apple", 100, "pear"] --> valid
// ["apple", 100, "pear", "grape"] --> invalid
// ["apple", "banana", "pear"] --> invalid
prefixItems is only used with the tuple type. Also, with a distinct tuple type, you could then go back to using additionalItems since contextually it would make more sense than "items". You then have this...
"bar": {
"type": "tuple",
"prefixItems": [
{ "type": "string" },
{ "type": "number" },
{ "type": "string" }
],
"additionalItems": false
}
// ["apple", 100, "pear", "grape"] --> invalid
"bar": {
"type": "tuple",
"prefixItems": [
{ "type": "string" },
{ "type": "number" },
{ "type": "string" }
],
"additionalItems": {
"type": "string"
}
}
// ["apple", 100, "pear", "grape"] --> valid
no update on this issue ?
@mlcohen Your idea makes sense, and I am wondering if you could issue a PR to get it implemented.
@stevehu Since my suggestion focuses on changing the type and would be considered a breaking schema change, seems like it'd be worth raising as a discussion here https://github.com/json-schema-org/community/discussions.
@stevehu what would be the problem with reusing keywords with different meanings? couldn't this be fixed in the lib? wouldn't it be easier than changing the JSON Schema specification?
When we started this library, we assumed that all keywords would be forward and backward compatible between versions, so we would only need to add new validator implementations for new keywords when a new version of JSON Schema was released. This worked well until the 2020-12 release. Some of the keywords in this release have changed meaning or been swapped with other keywords. This leaves us with two options:
- make the current library support versions before 2020-12 and create a new project for 2020-12 and beyond. This will give users a lot of trouble in upgrading and getting support.
- change the validator for some of the keywords with if/else or switch to have different implementations for different versions. This will significantly increase the maintenance effort and make it harder for developers to understand.
At this moment, we don't know which way to go, as the road map of the schema specification team is not clear.
At this moment, we don't know which way to go, as the road map of the schema specification team is not clear.
@stevehu Thanks for the additional details. Good to know.
WRT introducing a new type like tuple and shifting prefixItems and additionalItems to be exclusive to that new type, it would certainly break backwards compatibility. Without something like tuple, it just means continuing to deal with a subpar schema to express an array. I mean, if the official JSON Schema docs still show the items field without any use of prefixItems, that may suggest that it is the better way to express the items of an array. In any case, this isn't a library implementation issue so much as a specification issue.
Going back over the JSON Schema 2020-12 release notes, the following is stated:
Although the meaning of items has changed, the syntax for defining arrays remains the same. Only the syntax for defining tuples has changed. The idea is that an array has items (items) and optionally has some positionally defined items that come before the normal items (prefixItems).
Key phrases being "the syntax for defining arrays remains the same" and "Only the syntax for defining tuples has changed". Since defining arrays remains the same, the following should be valid:
{
"type": "array",
"items": {
"type": "number"
}
}
Just like in the JSON Schema learning documentation. Only when prefixItems is included does the meaning of the array type change: it becomes a tuple, at which point the items field changes its meaning to cover the additional items. However, for backwards compatibility, a tuple would still need to be definable with items (and optionally an additionalItems field), where in this case items is an array of schemas. So the following ways to define a tuple should both be valid:
// Before 2020-12 Schema changes
{
"type": "array",
"items": [
{ "type": "number" },
{ "type": "string" },
{ "enum": ["Street", "Avenue", "Boulevard"] },
{ "enum": ["NW", "NE", "SW", "SE"] }
],
"additionalItems": false
}
// 2020-12 Schema
{
"type": "array",
"prefixItems": [
{ "type": "number" },
{ "type": "string" },
{ "enum": ["Street", "Avenue", "Boulevard"] },
{ "enum": ["NW", "NE", "SW", "SE"] }
],
"items": false
}
For the library, if using the 2020-12 schema, it seems to be a matter of checking whether items is defined as an object, array or a boolean along with the existence of prefixItems field.
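A rough sketch of that check, to make the cases concrete. This is only an illustration under stated assumptions: ItemsShape and its Kind values are hypothetical names, and JSON values are modeled with plain Java types (Map for an object, List for an array, Boolean for a boolean) so the example has no dependencies.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the library.
final class ItemsShape {
    enum Kind { ABSENT, UNIFORM_ARRAY, TUPLE_REST, INVALID_FOR_2020_12 }

    // Classifies how "items" should be interpreted for a 2020-12 schema object.
    static Kind classify(Map<String, ?> schema) {
        if (!schema.containsKey("items")) {
            return Kind.ABSENT;
        }
        Object items = schema.get("items");
        boolean hasPrefix = schema.containsKey("prefixItems");
        if (items instanceof Map || items instanceof Boolean) {
            // With prefixItems, "items" covers only the elements after the prefix;
            // without it, "items" applies to every element.
            return hasPrefix ? Kind.TUPLE_REST : Kind.UNIFORM_ARRAY;
        }
        // The array-of-schemas form (and anything else) is not allowed in 2020-12.
        return Kind.INVALID_FOR_2020_12;
    }
}
```

For example, a schema with only "items": { "type": "number" } would classify as UNIFORM_ARRAY, while the same schema with a prefixItems list added would classify as TUPLE_REST.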
@mlcohen Thanks a lot for digging into the document for all the details. If the above rules cover all the scenarios, we can update the ItemsValidator and PrefixItemsValidator to support them. Are you interested in working on it?
@stevehu Yep. I've started to dig into the code you referenced. Getting myself oriented with how it works.
The README should be updated to state that the 2020-12 specification is currently not supported, given that items is not validated properly.
Updated the README.md to indicate it is only partially supported. Thanks.
I'm really glad that README is corrected now, but I would prefer to see a fix :)
We definitely need 2020-12 in our project. I don't want to spend time on workarounds if a proper solution is expected anytime soon. From the above discussion I got the impression that @mlcohen has his hands on it. Any progress? Need help?
@bmaizel Apologies for my slowness.
I was digging into the code. First part was just orienting myself to how the PrefixItemsValidator works within the broader context of the JsonSchema and JsonMetaSchema hierarchy*. Beyond that, I wasn't sure if the prior way of representing a tuple in 2019-09 should still be supported in 2020-12. Turns out, no, 2020-12 is not intended to be backwards compatible with how tuples were represented in 2019-09. That's good since it simplifies the validation logic. (For details see https://github.com/orgs/json-schema-org/discussions/335 ... also got insight into why JSON Schema doesn't have an explicit tuple type).
(* Should note that there is also the ItemsValidator which currently has logic to check for the additionalItems field.)
Need help?
@bmaizel Sure thing. I'm a newb to the guts of json-schema-validator library, so if you have guidance, I'm all ears.
After familiarizing myself with the code, a couple of observations...
To get items field to work with the 2020-12 spec, you can update ValidatorTypeCode.ITEMS version code to be VersionCode.AllVersions. (The type code is currently assigned VersionCode.MaxV201909). That will cause the schema validator to correctly validate the following:
// schema
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"properties": {
"numList": {
"type": "array",
"items": {
"type": "number"
}
}
}
}
// Passes validation
{
"numList": [10, 20, 30]
}
// Fails validation
{
"numList": [10, "foo", 30]
}
However, by enabling the items validator for 2020-12, it causes a conflict with the prefix items validator (PrefixItemsValidator). In the prefix items validator, it will check for additional items by looking for the items field. If detected, it will then create a child JSON schema object that will make use of the items validator. The following fails:
// schema
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"properties": {
"myTuple": {
"type": "array",
"prefixItems": [
{
"type": "number"
}
],
"items": {
"type": "string"
}
}
}
}
// Fails validation --> $.myTuple[0]: integer found, string expected
{
"myTuple": [10, "foo", "bar"]
}
Above, when validation is run, the items validator gets applied before the prefix items validator.
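For comparison, here is a minimal sketch of the behavior the 2020-12 spec intends: prefixItems is applied by position and items only to the elements after the prefix. It is an assumption-laden illustration, not the library's code: TupleCheck is a hypothetical name, and each subschema is reduced to a Predicate so the example has no dependencies.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of 2020-12 array semantics.
final class TupleCheck {
    static final Predicate<Object> IS_NUMBER = v -> v instanceof Number;
    static final Predicate<Object> IS_STRING = v -> v instanceof String;

    // Returns the index of the first failing element, or -1 if the array is valid.
    static int firstInvalidIndex(List<?> array,
                                 List<Predicate<Object>> prefixItems,
                                 Predicate<Object> items) {
        for (int i = 0; i < array.size(); i++) {
            // Positions covered by prefixItems use the positional schema;
            // "items" only applies to the elements after the prefix.
            Predicate<Object> schema = i < prefixItems.size() ? prefixItems.get(i) : items;
            if (!schema.test(array.get(i))) {
                return i;
            }
        }
        return -1;
    }
}
```

With prefixItems of one number schema and items of a string schema, [10, "foo", "bar"] has no invalid index under this sketch, whereas applying items to index 0 as well produces the failure shown above.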
Something else: in the 2020-12 spec, items can be an object or a boolean (in association with prefixItems) but not an array of schemas, which is allowed in the 2019-09 spec. Even so, with the items validator enabled for 2020-12, this schema is still accepted:
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"properties": {
"tuple": {
"type": "array",
"items": [
{
"type": "number"
},
{
"type": "number"
},
{
"type": "string"
}
]
}
}
}
// Passes validation
{
"tuple": [10, 20, "foo"]
}
// Fails validation
{
"tuple": [10, "foo", 30]
}
The ItemsValidator has logic to accommodate items being an array, an object or a boolean. There isn't anything within the validator that restricts which of that logic runs based on the schema being validated against, more specifically the JSON meta schema. Therefore, applying ItemsValidator to 2020-12 means you can do something funky like this:
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"properties": {
"arr1": {
"type": "array",
"prefixItems": [
{
"type": "number"
}
],
"items": [
{
"type": "number"
},
{
"type": "number"
},
{
"type": "string"
}
]
}
}
}
Definitely not a valid 2020-12 schema.
Since there is a clear break between how items is intended to be used in schema version 2019-09 and 2020-12, I look at the items validator in one of three ways:
- There is still one ItemsValidator, but its logic is updated to take the JSON meta schema version into account.
- There are two items validators: ItemsValidatorMaxV201909 and ItemsValidatorMinV202012.
- There is still one ItemsValidator, but instead of taking the JSON meta schema version into account, it checks for a prefixItems field. If one is detected, it modifies its behavior to only check the elements in the list that come after the items covered by the prefix items list.
For the first two options, I don't see any established pattern in the library where either a validator takes the JSON meta schema version into account or there are two validators for the same field, so I'm reluctant to introduce a new approach. The third option is more in line with checking surrounding fields; however, it still leaves room to misuse the items field in ways that comply with neither 2019-09 nor 2020-12.
In any case, without the ability to establish some kind of clean break between the two schema versions, the items validator would remain loose in how it validates which, at least to me, probably shouldn't be the case.
Would love feedback on this :)
@mlcohen Thanks a lot for the detailed analysis. I am leaning toward your option 3, and this was my original thought. What we need to do is to make the ItemsValidator version aware. It is like an IF or SWITCH in the validator logic to behave differently based on before 201909 and after 201909. The only problem I have is when users have a 202012 schema that refers to a 201909 schema. We cannot support it in this case, as the version context will only be available for the parent schema. Let me know if you and others want to set up a zoom meeting to brainstorm together.
What we need to do is to make the ItemsValidator version aware
Yep. Looked at the validators and see that they can access the current meta schema version using validationContext.getMetaSchema().getUri().
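As a rough idea of what a version check based on that URI could look like, here is a minimal sketch. The class and method names are hypothetical; it assumes the URI arrives as a plain string matching one of the standard meta-schema identifiers (optionally with a trailing "#"), which may not be exactly what the library returns.

```java
import java.util.Set;

// Hypothetical helper: decides which "items" semantics apply from a meta-schema URI.
final class ItemsSemantics {
    // Standard meta-schema identifiers for drafts where "items" may be an array of schemas.
    private static final Set<String> LEGACY_DRAFTS = Set.of(
            "http://json-schema.org/draft-04/schema",
            "http://json-schema.org/draft-06/schema",
            "http://json-schema.org/draft-07/schema",
            "https://json-schema.org/draft/2019-09/schema");

    // True for 2019-09 and earlier; any other URI is treated as 2020-12 or later.
    static boolean itemsMayBeArray(String metaSchemaUri) {
        return LEGACY_DRAFTS.contains(stripTrailingHash(metaSchemaUri));
    }

    // Older drafts are commonly written with a trailing "#"; ignore it when comparing.
    private static String stripTrailingHash(String uri) {
        return uri.endsWith("#") ? uri.substring(0, uri.length() - 1) : uri;
    }
}
```

A validator could branch on itemsMayBeArray(...) to pick the 2019-09 or the 2020-12 behavior. Treating unknown URIs as the newer semantics is a design choice of this sketch, not something settled in the discussion.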
The only problem I have is that when users have a 202012 schema, that refers to a 201909 schema
Interesting. So you mean something like the following:
// 2019-09 schema
{
"$schema": "https://json-schema.org/draft/2019-09/schema",
"$id": "https://example.com/schemas/user",
"type": "object",
"properties": {
"first_name": { "type": "string" },
"last_name": { "type": "string" }
}
}
// 2020-12 schema
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"properties": {
"foo": {
"type": "array",
"items": {
"type": "string"
}
},
"user": { "$ref": "https://example.com/schemas/user" }
}
}
Let me know if you and others want to set up a zoom meeting to brainstorm together.
Sure thing. Happy to brainstorm.
Yes. That is exactly what I am referring to. When subschemas are used, we need to switch the version context when navigating to the subschema. We can ignore this use case for now; however, we need to support it if more and more validators behave differently based on the version context. My time is very flexible and my timezone is Eastern Standard Time. Anyone who is interested, please let me know when the best time is so that I can send out a zoom link.
I'm Pacific Time. I can be available either at 10:30 AM or 2 PM on Monday.
Does 1:30 pm to 2:00 pm EST work for you? Anybody interested in this topic is welcome to join. Thanks.
https://us04web.zoom.us/j/73904341562
Sorry. This meeting room. https://us04web.zoom.us/j/73523145537?pwd=AlaL9YUuau3RPtMHCbGMxZgNQCd1pc.1
@stevehu Apologies. I mixed up the TZ on my cal. I can be available 11:30 AM PT (2:30 PM ET) or 1 PM PT (4 PM ET).
Sorry. I had meetings since 2 pm and missed your last message. My email is [email protected]. Please send me an email and let me know several time slots. I will try to accommodate. Thanks.
I'm hitting this same problem while trying to validate the OASIS STIX schema. This is the part of the schema that describes the objects in a "Bundle":
...
"objects": {
"type": "array",
"description": "Specifies a set of one or more STIX Objects.",
"items": {
"anyOf": [
{
"oneOf": [
{
"$ref": "../sdos/attack-pattern.json"
},
{
"$ref": "../sdos/campaign.json"
},
{
"$ref": "../sdos/course-of-action.json"
},
...
With this the validation passes. If I switch to 2019-09, it fails (correctly), but with a huge amount of messages.
I'm not an expert with JSON Schemas yet, but am on the path. What other way could the above be formatted in 2020-12 using the current version of this lib?
Adding an idea here on this problem:
The only problem I have is when users have a 202012 schema that refers to a 201909 schema. We cannot support it in this case, as the version context will only be available for the parent schema. Let me know if you and others want to set up a zoom meeting to brainstorm together.
I see in the code that a ValidatorState instance is recorded in a ThreadLocal that is accessible to all validators. When a referenced schema is loaded, could it be the case that the version is recorded in a Stack in the ValidatorState? When the ItemsValidator needs to check the schema version it could peek at the latest entry in this stack to see the one that applies to it. Assuming this is possible, one would of course need to also pop the latest schema version once the referenced schema is fully processed.
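To make the idea concrete, here is a minimal sketch of such a stack. It is standalone and hypothetical: SchemaVersionStack is a made-up class, not the library's ValidatorState, and versions are plain strings for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical per-thread stack of schema versions.
final class SchemaVersionStack {
    private static final ThreadLocal<Deque<String>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);

    // Runs the body with the given version on top of the stack,
    // and pops it again even if the body throws.
    static <T> T within(String version, Supplier<T> body) {
        Deque<String> stack = STACK.get();
        stack.push(version);
        try {
            return body.get();
        } finally {
            stack.pop();
        }
    }

    // The version that applies to the schema currently being processed,
    // or null when no schema is being processed on this thread.
    static String current() {
        return STACK.get().peek();
    }
}
```

Processing a referenced schema would be wrapped in within(...) with that schema's version, so a validator calling current() sees the referenced schema's version while inside it and the parent's version again afterwards.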