json-schema-spec icon indicating copy to clipboard operation
json-schema-spec copied to clipboard

Proposal: Add propertyDiscriminator keyword (aka discriminator)

Open xiaoxiangmoe opened this issue 2 years ago • 10 comments

The pattern of using oneOf to describe a choice between two types has become ubiquitous.

{
  "oneOf": [
    { "$ref": "#/$defs/aaa" },
    { "$ref": "#/$defs/bbb" }
  ]
}

However, this pattern is also notorious for its many shortcomings. There is a better way to describe this kind of constraint that doesn't have all the problems of the oneOf pattern, but it's verbose, error prone, and not particularly intuitive.

{
  "allOf": [
    {
      "if": {
        "properties": {
          "foo": { "const": "aaa" }
        },
        "required": ["foo"]
      },
      "then": { "$ref": "#/$defs/foo-aaa" }
    },
    {
      "if": {
        "properties": {
          "foo": { "const": "bbb" }
        },
        "required": ["foo"]
      },
      "then": { "$ref": "#/$defs/foo-bbb" }
    }
  ]
}

But this is too complicated. We may simplify this.

{
  "propertyDiscriminator": {
    "foo": [
      [
        "aaa",
        {
          "$ref": "#/$defs/foo-aaa"
        }
      ],
      [
        "bbb",
        {
          "$ref": "#/$defs/foo-bbb"
        }
      ]
    ]
  }
}

releated issue : https://github.com/json-schema-org/json-schema-spec/issues/1082#issuecomment-1523335580

The difference between #1082 and this proposal is —— that issue only support string as discriminator. This proposal is a little more complicated. But it would accept any json value here.

For example:

{
  "propertyDiscriminator": {
    "foo": [
      [
        2,
        {
          "$ref": "#/$defs/foo-aaa"
        }
      ],
      [
        3,
        {
          "$ref": "#/$defs/foo-bbb"
        }
      ]
    ]
  }
}

cc @jdesrosiers @gregsdennis

xiaoxiangmoe avatar Jun 08 '23 09:06 xiaoxiangmoe

The only hesitation I have for this is the use of a tuple construct as part of the definition. In order to understand the keyword, you have to "just know" that the first item in the tuple is a key to match on and the second is the schema (rather than the components having "names" that indicate what they are).

Secondly what are the performance implications if someone used a (very) complex object instead of a primitive for the key?

{
  "propertyDiscriminator": {
    "foo": [
      [
        { "prop1": "value", "prop2": 42, "array": [ 1, 2, "string" ] },
        {
          "$ref": "#/$defs/foo-aaa"
        }
      ],
      [
        3,
        {
          "$ref": "#/$defs/foo-bbb"
        }
      ]
    ]
  }
}

gregsdennis avatar Jun 08 '23 10:06 gregsdennis

I don't know if the name change was intentional, but this should be considered an alternative to propertyDependencies. We would choose one or the other, definitely not both, so it doesn't need a unique name. That said, I'm open to considering a different name.

Here's my relevant comments about this from the other issue where it was originally proposed.

One of my main design goals for this keyword has always been to keep it as simple as possible or people won't use it. This is a little more complex, but I don't think it's too much.

However, this isn't actually an O(1) process. The lookup from the Map is O(1), but the cost of building the Map is O(n). Implementations that have a compile step could produce the Map in the compile step and then evaluate instances using that Map in O(1), but most implementations don't have a compile step.

Overall, I think you've presented a reasonable alternative for this keyword. It makes it more versatile without adding too much complexity. It's technically a performance regression, but it can be optimized to O(1) in a lot of cases. In any case, we're talking about very small values for "n", so O(n) isn't much worse that O(1).

The biggest question to me is whether the added complexity is worth the extra capability. I believe that if it gets too verbose and complex, people won't use it. I just don't know where the line is. Maybe it will be fine and maybe not. I think it will be important to get a lot of opinions on this.

jdesrosiers avatar Jun 08 '23 17:06 jdesrosiers

In any case, we're talking about very small values for "n", so O(n) isn't much worse that O(1).

I recently had a user of my schema generation lib ask me to support the { allOf: [ { if/then }... ] } pattern based on an enum property in their C# code. The enum they were using must have had ~50 different values, which is why they wanted generator support.

I think you're probably right for many (most?) cases, but cases like this still need to be considered.

gregsdennis avatar Jun 08 '23 20:06 gregsdennis

Yes, there will always be outliers. In https://github.com/json-schema-org/json-schema-spec/issues/1410 someone was talking about having over 100 conditions they want to select on. I'm not worried about the performance difference mostly because it's possible to optimize to O(1). It will be necessary to build to lookup, but you only have to do that once and can validate as many instances as you like after than at O(1). Also, these comparisons should be fairly fast, so I don't expect that even 100 comparisons will present a significant performance concern.

jdesrosiers avatar Jun 08 '23 23:06 jdesrosiers

I don't know if the name change was intentional

Both propertyDependencies and propertyDiscriminator is okay for me. I see PR #1143 is merged, so I choose a different name.

xiaoxiangmoe avatar Jun 09 '23 04:06 xiaoxiangmoe

@gregsdennis

In order to understand the keyword, you have to "just know" that the first item in the tuple is a key to match on and the second is the schema (rather than the components having "names" that indicate what they are).

I think the pattern below is also okay for me.

{
  "propertyDiscriminator": {
    "foo": [
      {
        "propertyValue": 114,
        "subSchema": {
          "$ref": "#/$defs/foo-aaa"
        }
      },
      {
        "propertyValue": 514,
        "subSchema": {
          "$ref": "#/$defs/foo-bbb"
        }
      }
    ]
  }
}

But this will using a little more words than using 2-tuple [propertyValue,subSchema].

Secondly what are the performance implications if someone used a (very) complex object instead of a primitive for the key?

Maybe we should only support int, string, null and boolean in first version of this standard. And waiting for more use case feedback later.


@jdesrosiers My real concern is that our new proposal needs to support both string discriminator and int discriminator.

There are a lot of practices on the Internet that use int enum as a discriminator. Especially in some systems written in C language, enum is an integer, and what is passed after serialization is also an integer.

If int enum discriminator cannot be supported, then the scope of application of propertyDependencies may be narrowed, causing everyone to fall back to using the more complicated if then else.

  • The coding style guide of some teams may educate everyone that string enum property can use propertyDependencies, and int enum property cannot use propertyDependencies.
  • The coding style guide of some teams may prohibit propertyDependencies for uniformity.

So I think the added complexity is worth for wider applicability.

xiaoxiangmoe avatar Jun 09 '23 05:06 xiaoxiangmoe

But this will using a little more words than using 2-tuple [propertyValue,subSchema].

That's my concern. IMO using an object rather than an array crosses the line of being too verbose.

Maybe we should only support int, string, null and boolean in first version of this standard.

You forgot "number", but otherwise I agree that limiting to scalar values would be reasonable. However, I'm not sure it's necessary. I don't see any performance implications to allowing it other than it's more expensive to compare structured values than it is compare scalar ones, but that's inherent in the choice to compare on structured values. I don't see it as a concern for this keyword specifically.

jdesrosiers avatar Jun 09 '23 17:06 jdesrosiers

has there been any significant progress? I really need this, if-then-else is so bad. @xiaoxiangmoe @jdesrosiers

ErosZy avatar Jun 13 '23 11:06 ErosZy

@ErosZy this is a specification change proposal. Spec work takes time. There are a lot of other spec lifecycle management things that we (the JSON Schema team) need to work out before we can get back to making these kinds of changes.

My suggestion is to find out if the implementation you're using supports custom keywords, either via vocabularies or some other method. If so, then I suggest writing the functionality you need in order to meet your immediate needs. (We also welcome your feedback on anything that you come up with.)

As far as updating the spec goes, please have patience. We are considering this for the next version. Either this or propertyDependencies or possibly even something else (maybe what you come up with) is expected to be included.

gregsdennis avatar Jun 13 '23 20:06 gregsdennis

Note: propertyDependencies is being extracted to a separate proposal document. Once that goes through, PRs against the keyword should be made against that document, not Core.

gregsdennis avatar Oct 04 '23 21:10 gregsdennis

Closing the issue as the proposal document has been written. Further requested changes should open new issues.

gregsdennis avatar Jul 16 '24 00:07 gregsdennis