                        Deprecate discriminator?
I just created an example to illustrate what I think is idiomatic use of discriminator, so I could help answer #2141. I find it helpful to use JSON Schema Lint so I can validate in real-time. To make sure the discriminating logic worked correctly in a standard JSON Schema validator (not aware of OAS discriminator), I used standard JSON Schema keywords to duplicate that logic.
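The pattern looks roughly like this (a minimal sketch of the idea rather than the actual schema from #2141; the Pet, Cat and Dog schema names and the petType property are hypothetical):
{
    "Pet": {
        "oneOf": [
            { "$ref": "#/components/schemas/Cat" },
            { "$ref": "#/components/schemas/Dog" }
        ],
        "discriminator": { "propertyName": "petType" }
    },
    "Cat": {
        "type": "object",
        "required": ["petType"],
        "properties": {
            "petType": { "enum": ["cat"] }
        }
    },
    "Dog": {
        "type": "object",
        "required": ["petType"],
        "properties": {
            "petType": { "enum": ["dog"] }
        }
    }
}
The single-valued enum on petType lets a validator that knows nothing about discriminator select exactly one oneOf branch, which is the same decision the discriminator is meant to express.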
This raises the question: now that 3.0 supports a large enough subset of JSON Schema to describe discriminated subtypes, and 3.1 is planned to support full JSON Schema, do we still need discriminator?
@handrews mentions the same idea here in #2031, so I think the idea deserves its own issue for future version planning.
I can see that discriminator might have some value for code generators. It might be easier for a code generator to be told explicitly to discriminate based on a given property, rather than relying on a JSON Schema validator to identify the matched subtype, or on recognizing a pattern involving oneOf, enum (or const), etc.
But discriminator, as it's defined, is kind of problematic. @handrews pointed out that it ignores some available JSON Schema semantics. And I've observed, generally, that the documentation is difficult (for me) to follow, and seems to leave some questions unanswered. Without trying to create a comprehensive list here:
- I have doubts about the combination of mapping and default name matching shown in the last example of this section. (A sketch of that combination follows this list.)
- It seems that mapping is supposed to be supplemental to name matching, rather than replacing it. In that case, is there a way for the discriminator to ensure that the discriminator property value is one of a known list of subtypes? Do we always need a separate oneOf to validate this?
- Does name matching (without mapping) assume it's going to find the named subtype schema in #/components/schemas, or is there some other expectation?
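To make the first question above concrete, the combination in question looks something like this (schema names are illustrative, not the spec's exact example):
{
    "discriminator": {
        "propertyName": "petType",
        "mapping": {
            "dog": "#/components/schemas/Dog"
        }
    }
}
A petType of "dog" resolves through the mapping, while an unmapped value like "cat" apparently falls back to name matching against #/components/schemas/Cat. What should happen for a value with no mapping entry and no matching schema name is one of the unanswered questions.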
Maybe this is more weight than we need to carry, and we'd be better off leaving this problem to JSON Schema and some new vocabulary for code generation, O-O type system mapping, or something like that.
FYI-- the JSON Schema Lint link is incorrect or isn't working https://github.com/OAI/OpenAPI-Specification/issues/jsonschemalint.com
Thanks, @mewalig. Fixed.
Are there any simple examples of how "discriminator" in OpenAPI would compare to the analogous schema as described using JSON Schema?
Also, do any validators exist today that will properly enforce either the OpenAPI discriminator or the JSON Schema discriminator equivalent? I understand that this repo is not intended for tools, but in evaluating whether one of two standards should be ditched, it would be nice to be able to play around a bit more with each in order to form a better opinion.
Are there any simple examples of how "discriminator" in OpenAPI would compare to the analogous schema as described using JSON Schema?
If you take my example schema from #2141 and remove the discriminator, that's how you'd do it in JSON Schema.
Also, do any validators exist today that will properly enforce either the OpenAPI discriminator or the JSON Schema discriminator equivalent?
Any compliant JSON Schema validator should be able to enforce a JSON Schema like the one I posted.
I don't know about tools that enforce OpenAPI discriminator, but it sounds like you have found some already. You might want to try them again with the schema I provided (modified back to your original properties property name; otherwise you'd need to adjust your test examples).
Yes, absolutely, please get rid of discriminator. My reasons explained in more detail at https://github.com/OAI/OpenAPI-Specification/issues/2141
@tedepstein thanks for filing this!
@mewalig a possible way to solve this problem based on adding some extension keywords to the most recent draft of JSON Schema might look like the following.
First, let's just look at how to make inheritance work better at all:
{
    "oneOf": [
        {   
            "$ref": "#/$defs/child1"
        },  
        {   
            "$ref": "#/$defs/child2"
        },  
        {   
            "$ref": "#/$defs/child3"
        }
    ],  
    "unevaluatedProperties": false,
    "$defs": {
        "base": {
            "className": "TheBaseClass",
            "type": "object",
            ... 
        },  
        "child1": {
            "$ref": "#/$defs/base",
            "classRelation": "is-a",
            "className": "Foo",
            ... 
        },  
        "child2": {
            "$ref": "#/$defs/base",
            "classRelation": "is-a",
            "className": "Bar",
            ...
        },  
        "child3": {
            "$ref": "#/$defs/base",
            "classRelation": "is-a",
            "className": "SomethingElse",
            ... 
        }   
    }   
}
Here we've introduced two new keywords, className and classRelation.  These are annotations rather than assertions, meaning they have no impact on validation.
className tells code generators that each of the definitions is intended to represent an object-oriented class, and gives that class an explicit name (rather than relying on the defs names, e.g. "child1", "child2", etc., which are implementation details of the schema, not interfaces).
classRelation is a semantic clarifier of the adjacent $ref, telling code generators that the $ref target is to be treated as the base class for the current class. $ref doesn't always mean inheritance, so when we want it to mean that, we should be explicit about it.
Code Generation
When a code generator looks at this, it is going to statically analyze the schema without an instance.  It will ignore the oneOf, but notice that there are four classes here, and that three of them designate the fourth to be their base class.  That is enough information to generate the correct classes.  The oneOf is superfluous for code generation, which is good because you might just have a schema document consisting of the $defs part, and only use them with a oneOf somewhere else.  I think this part is pretty straightforward, let me know if it is not clear.
Validation
When you validate an instance against this schema, the oneOf becomes important.  Only one of those oneOf branches (at most) will validate.  So if your validator supports annotations in validation output, and it matches the 2nd child entry (Bar), you'll get the following information, more or less:
- a className annotation of "Bar" from /oneOf/1/$ref/className
- a classRelation annotation of "is-a" from /oneOf/1/$ref/classRelation
- a className annotation of "TheBaseClass" from /oneOf/1/$ref/$ref/className
Note that the "from" pointers are showing the dynamic scope, the path traversed at runtime including references.  Note also that the names "base", "child1", "child2", etc. from under $defs do not appear, because we reach those schemas through a $ref dynamically, not by static traversal of the schema.
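To make that concrete, in the draft 2019-09 "basic" output format those three annotations might be reported roughly like this (hand-sketched; the https://example.com/schema identifier is hypothetical, since the example schema above declares no $id):
{
    "valid": true,
    "annotations": [
        {
            "valid": true,
            "keywordLocation": "/oneOf/1/$ref/className",
            "absoluteKeywordLocation": "https://example.com/schema#/$defs/child2/className",
            "instanceLocation": "",
            "annotation": "Bar"
        },
        {
            "valid": true,
            "keywordLocation": "/oneOf/1/$ref/classRelation",
            "absoluteKeywordLocation": "https://example.com/schema#/$defs/child2/classRelation",
            "instanceLocation": "",
            "annotation": "is-a"
        },
        {
            "valid": true,
            "keywordLocation": "/oneOf/1/$ref/$ref/className",
            "absoluteKeywordLocation": "https://example.com/schema#/$defs/base/className",
            "instanceLocation": "",
            "annotation": "TheBaseClass"
        }
    ]
}
Note how keywordLocation records the dynamic path through the $refs, while absoluteKeywordLocation points at the $defs entries.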
Instantiation
If we're trying to instantiate a class from a JSON instance, first we validate it and get the above annotation outputs.  We notice that there are two className values attached.  But one of those is accompanied by a classRelation annotation in the same schema object.
So we immediately know that this JSON instance data is a valid "Bar", and that a "Bar" "is-a" ... That's probably enough information to instantiate the thing right there, as the code generation should have already taken inheritance into account, and you know the data is valid.
If we want to make sure the other className is not what we really should be using, we look a little deeper.  The dynamic schema path prefix of these "Bar" and "is-a" annotations is /oneOf/1/$ref.  classRelation works next to a $ref, so we look for /oneOf/1/$ref/$ref, and sure enough we find that it is also a named class, "TheBaseClass."  So now we know that "Bar" "is-a" "TheBaseClass".
We notice that there aren't any other className annotations in these results, and we obviously want to instantiate the derived class (that's what you use oneOf for! otherwise you'd just $ref the base class).  So we're done, and we can be certain that we want to instantiate "Bar" and not "TheBaseClass."
Optimization
Technically, this is all we need. Whether you look at this schema statically (as a code generator would) or dynamically (as an instantiator would), you can figure out everything you need for your task. And none of it gets in the way of validation (unknown keywords are ignored by the validator).
One of the reasons OAS has discriminator is that it can be confusing to figure out how data maps to subclasses, and potentially expensive to do that validation. We could come up with another keyword that would optimize that, although honestly I'd want to understand the use case a bit more. An advantage of just going with className and classRelation (or something like them; I pretty much made these up on the spot and have not thought them through) is that it works with arbitrarily complex subclass differences. You aren't forced to have a type enumeration field.
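As a quick illustration (also made up on the spot), here is a pair of subtypes distinguished purely by their required properties, with no tag field anywhere:
{
    "oneOf": [
        {
            "type": "object",
            "required": ["radius"],
            "properties": { "radius": { "type": "number" } },
            "additionalProperties": false
        },
        {
            "type": "object",
            "required": ["width", "height"],
            "properties": {
                "width": { "type": "number" },
                "height": { "type": "number" }
            },
            "additionalProperties": false
        }
    ]
}
discriminator can't describe that at all, but className annotations on each branch would still tell a processor which class matched.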
But you could definitely have one. I started trying to write it out, but it gets pretty annoying because either you end up with magical behavior (the way the value of the field suddenly has to match some other thing in the schema structure, and has to validate, which means the same value has to appear in two different places and be kept in sync), or it provides minimal performance benefits because you still have to do some validation.
I'd want to understand what needs to be optimized in this case before throwing in more keywords.
Thanks for the thoughtful response.
I'd want to understand what needs to be optimized in this case before throwing in more keywords.
Just to clarify, my suggestion would not be to add any keywords, but rather, to remove them-- in particular, to remove "discriminator" and everything associated with it.
Using the schema provided by @tedepstein, I was able to do everything I wanted in OpenAPI/JSON Schema-- without the discriminator keyword-- and it worked better without discriminator: it was clearer (in my opinion), and while none of the validation tools I could find work with discriminator, at least one (the first one I tried) worked with the discriminator-free approach. (I understand this forum is not meant to cover tool issues, but from an end-user standpoint, since I would rather use than build tools for OpenAPI, discriminator is of little value to me if I can't validate it.)
That said, I'm not familiar enough with the issues to know that there isn't some other use case that discriminator solves, which can't be solved without it. However, if, or to the extent that, such a case does not exist, my preference would be to eliminate discriminator.
@liquidaty
Just to clarify, my suggestion would not be to add any keywords, but rather, to remove them-- in particular, to remove discriminator and everything associated with it.
Yes, my example would remove discriminator and the Discriminator Object entirely. Or at least it works without them; removal is a function of the deprecation policy, so I doubt it would be removed before 4.0.
Using the schema provided by @tedepstein, I was able to do everything I wanted in OpenAPI/JSON Schema-- without the discriminator keyword
Just to be clear, we're talking about this example schema? https://github.com/OAI/OpenAPI-Specification/issues/2141#issuecomment-586714672
Doesn't it use discriminator?  That looks like one in the "ObjectBaseType" schema.
@handrews , that is a switch-hitting schema that includes discriminator, but also includes everything a standard JSON Schema parser would need in order to validate a message:
- A wrapper schema (called Object in this example) that uses oneOf to direct the validator to one of the concrete subtypes. Doesn't rely on special discriminator logic to determine the subtype and perform an additional validation against that subtype.
- A single-valued enum assertion on the designated discriminator property in the concrete subtype schemas, to explicitly identify the subtype so oneOf works reliably. (This would be a const assertion, but that's not supported in OAS 3.0, thus enum.)
@mewalig couldn't get any available OpenAPI validator to work with discriminator, but the pattern above worked fine with a standard JSON Schema validator. That's what reminded me of your comment in #2031 about getting rid of discriminator, and prompted me to open this issue.
If you remove discriminator from that example schema, the only thing missing is a straightforward way for a code generator to:
- know about the class hierarchy; and
- optimize validation based on the designated discriminator property, or at least coordinate with the JSON Schema validator to know which subtype was matched by the oneOf.
At first glance, the pattern you've described seems to address both of these.
fwiw, personally, if a code generator works fine with a discriminator-free schema, I can't think of any value in having a redundant-and-more-verbose alternative schema just to generate some alternative but functionally equivalent code. And if there were more value in having the alternative code, I would find it better for the code generator to figure it out based on the schema pattern. That would eliminate the possibility of inconsistency between the information related to discriminator and the information in the rest of the schema.
@mewalig , I don't think anyone here is making an argument to keep discriminator indefinitely. We all want to deprecate it and see it replaced by a standard JSON Schema pattern. But this isn't something that happens overnight, because there is a large and diverse community of OpenAPI users and tool providers who need to weigh in.
Also, the earliest release in which we could deprecate discriminator would be 3.1, and that assumes the community agrees that it's OK to deprecate features in a minor release. I don't know if we have really discussed that. Anyway, just saying, don't hold your breath waiting for a final confirmation that we're removing discriminator. :-)
@tedepstein
Doesn't rely on special discriminator logic to determine the subtype and perform an additional validation against that subtype.
It says something (I'm not sure what, but something) that I totally forgot about this behavior, which is in fact my biggest objection to the keyword! 🤣
@tedepstein We merged a deprecation of nullable; I'm not sure if that counts as a deprecation of a "feature", particularly given that there is a simple alternative that can be easily migrated to.
discriminator would be more tricky, and I would not expect keywords such as those I used above to be added for OAS 3.1.  Which is why I speculated about deprecating it in OAS 3.2 if such a thing ends up going out (or OAS 4 if we drop minor releases or just don't do another one in the 3.x sequence).
At the very least, I think it would be immensely helpful for the documentation (e.g. the spec and related example) to:
- make it clear that discriminator is entirely optional, and
- provide an analogous example that achieves the same result without the use of discriminator
It might also be worth mentioning that the second of these relies on fewer specification features, which are common to JSON Schema, and as such are better supported by validation (and other?) tools.
Those changes alone would have saved me more than 10 hours of time going in circles with discriminator only to end up with a better solution that doesn't use it (not to mention the time so generously contributed by other folks on this issue and #2141), and judging by other online forums where similar questions are asked but not comprehensively answered, I would think others would similarly benefit. Adding that to the docs would also help to pave the way for a smoother transition if/when deprecation happens.
I still see discriminator as a valid way of representing a tagged union, where a single field is used to determine the actual value type, as opposed to the more standard way of validating combined schemas without it. JSON Schema supports far more complex conditional schema validation, which technically subsumes what discriminator tries to do in terms of features, but as far as API typing goes, I think something as flexible as conditional typing has the risk of making contracts harder to verify.
Unions as currently supported by OpenAPI (through oneOf, anyOf and allOf) and tagged unions (what discriminator does) are probably as far as an API specification can go typing-wise without leaking implementation details (classes being an OOP concept). As far as I can tell, conditional validation makes it harder to represent all expressible schemas as valid data types.
@sm-Fifteen , thanks for articulating this:
I still see discriminator as a valid way of representing a tagged union, where a single field is used to determine the actual value type, as opposed to the more standard way of validating combined schemas without it.
That's the benefit of discriminator: It provides a direct, unambiguous way of expressing the intent to model a tagged union, and gives processors a straightforward way to interpret the tags.
Without these clear markers, pattern recognition is much more complex.
- Processors have to deal with different variants of the pattern, e.g. using const, enum or pattern to assert the discriminator value. (See the sketch after this list.)
- They have to decide how to handle cases where some branch of the schema diverges from the pattern, e.g. by introducing additional assertions in a oneOf subschema.
- They may have to improvise their own custom annotations to disambiguate or to provide additional metadata, like name mapping.
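To sketch that first point (petType is a hypothetical property name), all three of these subschemas assert the same "cat" tag, and a processor would have to recognize each of them as the discriminating assertion:
[
    { "properties": { "petType": { "const": "cat" } } },
    { "properties": { "petType": { "enum": ["cat"] } } },
    { "properties": { "petType": { "pattern": "^cat$" } } }
]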
The case for deprecating discriminator will be stronger as and when there's an equivalent JSON Schema vocabulary for this. If that doesn't happen on its own, maybe OpenAPI can contribute a vocabulary, alone or in collaboration with other interested parties.
Since a functional JSON Schema equivalent to discriminator already exists, is there any reason to keep discriminator, aside from legacy continuity (which admittedly is important) and convenience (i.e. being easier to read, write, or generate code from)?
If that's the goal, then another possibility might be to have a "type" value of class, which would essentially be a specialized form of object (or alternatively, having a reserved keyword/value pair for the object type e.g. "subtype": "class"). This class object could then just follow a traditional IDL class syntax. Using this approach, the example might look like:
{
    "oneOf": [
        {   
            "$ref": "#/$defs/base"
        }
    ],
    "$defs": {
        "base": {
            "type": "class",
            "name": "TheBaseClass",
            "properties": {
              "base_prop": ...
            }
        },  
        "child1": {
            "type": "class",
            "name": "Foo",
            "extends": "base",
            "properties": {
              "foo_prop": ...
            }
            ... 
        },  
        "child2": {
            "type": "class",
            "name": "Bar",
            "extends": "base",
            "properties": {
              "bar_prop": ...
            }
            ...
        },  
        "child3": {
            "type": "class",
            "name": "SomethingElse",
            "extends": "base",
            "properties": {
              "other_prop": ...
            }
            ... 
        }   
    }   
}
and the corresponding JSON like this:
{
  "base_prop": ...,
  "other_prop": ... // class ```child3``` is implied
}
or
{ // explicit class
  "className": "Bar", // alternatively: "classId": "child2"
  "base_prop": ...,
  "other_prop": ... // invalid: not a property of Bar class
}
It seems to me that this would be easy to convert to equivalent JSON Schema, which would allow for easy validation of the OpenAPI schema, because the validator could just convert to JSON Schema and then validate that. The same would be true for code generators. In addition, there could be a lot of benefits if it allowed OpenAPI to reuse familiar, mature and proven IDL constructs and/or syntax (lower barriers to adoption, higher functional value, less need for revision/change, etc.).
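To sketch that conversion (the keyword mappings here are just my assumption of how it could work, and foo_prop's type is invented for illustration), child1 from the example above might translate to:
{
    "child1": {
        "allOf": [
            { "$ref": "#/$defs/base" },
            {
                "type": "object",
                "properties": {
                    "foo_prop": { "type": "string" }
                }
            }
        ],
        "unevaluatedProperties": false
    }
}
That is, "type": "class" becomes "type": "object", "extends" becomes an allOf reference to the base, and "name" is either dropped or kept as an annotation for code generators.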
is there any reason to keep discriminator, aside from legacy continuity (which admittedly is important) and convenience (i.e. being easier to read, write, or generate code from)?
Clarity of intent. Standardized, unambiguous way to denote tagged unions, rather than relying on a loosely defined convention. "Convenience" translates into availability of code generators, validators and other tools: more of them, better consistency, and higher quality. All good things for the OpenAPI ecosystem and user community. These are the reasons described in recent posts here.
I'm not arguing that it's ideal to keep discriminator in its current form, in OpenAPI, forever. But I would like to see it replaced by a JSON Schema vocabulary that has the same level (at least) of clarity, consistency, and simplicity.
Clarity of intent. Standardized, unambiguous way to denote tagged unions, rather than relying on a loosely defined convention.
Those are exactly the reasons that, at least imho, a class type following in the footsteps of a mature IDL would be an ideal solution. I know that change would be significant, though.
I'm not arguing that it's ideal to keep discriminator in its current form, in OpenAPI, forever. But I would like to see it replaced by a JSON Schema vocabulary [...]
Any idea if that is currently being contemplated for JSON Schema? I'd definitely be interested to learn more about that.
[...] that has the same level (at least) of clarity, consistency, and simplicity
fwiw, personally I would think this is a very low bar. In searching for more examples, I found a lot of posts asking unanswered questions about how to use discriminator, and my impression was that there doesn't seem to be much in the way of documentation or complete examples, other than maybe one simple example which might be hard to translate into real-world use cases.
@mewalig we (the JSON Schema project) have enabled extensible vocabularies in the new draft. We are hoping that OpenAPI folks will take the lead on code generation proposals. Not necessarily the OpenAPI project itself (although I'd be happy with that), but the OpenAPI community. OAS is the primary driver of codegen. Also, there are a lot more of y'all than there are of us.
In the hope that it will inform this discussion, I’ll describe how we (IBM) are using discriminators in our APIs and code generation tools.
Here’s an example use of discriminator in the API for the IBM Discovery service:
      "QueryAggregation": {
        "type": "object",
        "description": "An aggregation produced by  Discovery to analyze the input provided.",
        "discriminator": {
          "propertyName": "type",
          "mapping": {
            "histogram": "#/components/schemas/Histogram",
            "max": "#/components/schemas/Calculation",
            "min": "#/components/schemas/Calculation",
            "average": "#/components/schemas/Calculation",
            "sum": "#/components/schemas/Calculation",
            "unique_count": "#/components/schemas/Calculation",
            "term": "#/components/schemas/Term",
            "filter": "#/components/schemas/Filter",
            "nested": "#/components/schemas/Nested",
            "timeslice": "#/components/schemas/Timeslice",
            "top_hits": "#/components/schemas/TopHits"
          }
        },
        "properties": {
          "type": {
            "type": "string",
            "description": "The type of aggregation command used. For example: term, filter, max, min, etc."
          },
QueryAggregation is a "class" of schemas that may be returned from a query that requests aggregation of results in various forms. The various aggregation types vary quite significantly, but notice that some types, e.g. "max", "min", "average", share a common schema. In this particular case, the "child" schemas are composed using allOf, with QueryAggregation as one element and then the specific properties of the child in a second element. E.g.
      "Calculation": {
        "allOf": [
          {
            "$ref": "#/components/schemas/QueryAggregation"
          },
          {
            "properties": {
              "field": {
                "type": "string",
                "description": "The field where the aggregation is located in the document."
              },
              "value": {
                "type": "number",
                "format": "double",
                "description": "Value of the aggregation."
              }
            }
          }
        ]
      },
Next I'll describe how this is used in our tooling. The first thing to say about our SDK generation tooling is that it does not do any validation based on JSON schema. Some may consider this heresy, but we use the schemas in the API def purely for modeling.
In Java and similar strictly typed languages, the QueryAggregation schema is rendered as a public class (it is not abstract, but if the composition were "flipped" to use oneOf it would be). The "child" schemas are rendered as subclasses of QueryAggregation, e.g. Calculation.
The discriminator in the QueryAggregation schema is rendered into the QueryAggregation class as static metadata:
  protected static String discriminatorPropertyName = "type";
  protected static java.util.Map<String, Class<?>> discriminatorMapping;
  static {
    discriminatorMapping = new java.util.HashMap<>();
    discriminatorMapping.put("histogram", Histogram.class);
    discriminatorMapping.put("max", Calculation.class);
    discriminatorMapping.put("min", Calculation.class);
    discriminatorMapping.put("average", Calculation.class);
    discriminatorMapping.put("sum", Calculation.class);
    discriminatorMapping.put("unique_count", Calculation.class);
    discriminatorMapping.put("term", Term.class);
    discriminatorMapping.put("filter", Filter.class);
    discriminatorMapping.put("nested", Nested.class);
    discriminatorMapping.put("timeslice", Timeslice.class);
    discriminatorMapping.put("top_hits", TopHits.class);
  }
This metadata is used by our deserialization logic to trigger and guide the use of a custom TypeAdapter that is created in DiscriminatorBasedTypeAdapterFactory.  The custom TypeAdapter uses the value of the discriminator to choose a concrete class, based on the discriminatorMapping metadata in the class, to be produced by the deserialization logic.
If there were no discriminator, the generated code would look very different.  We would instead create a QueryAggregation class containing the union of all the properties of the child schemas, and the returned class would be an instance of this "generic" QueryAggregation class.
Sorry for the long post. I hope this has been clear and informative.
That is a great example. Is there any reason your generated code would (or should) be different for the equivalent JSON Schema (which I have attempted to generate in Case 2 below)? Obviously, the second one is much less compact, and I am not suggesting that it is better, or that anyone should have to use it instead of discriminator. Rather, I am wondering if discriminator might be more useful if treated (and documented) as optional syntactic sugar (which tools can simply convert into straight JSON Schema, which in turn offers various other benefits such as validation tool support).
Case 1 (same as your example):
      "QueryAggregation": {
        "type": "object",
        "description": "An aggregation produced by  Discovery to analyze the input provided.",
        "discriminator": {
          "propertyName": "type",
          "mapping": {
            "histogram": "#/components/schemas/Histogram",
            "max": "#/components/schemas/Calculation",
            ...
Case 2:
      "QueryAggregation": {
        "type": "object",
        "description": "An aggregation produced by  Discovery to analyze the input provided.",
        "oneOf": [
          {
            "allOf": [
              {
                "required": ["type"],
                "properties": {
                  "type": { "enum": ["histogram"] }
                }
              },
              { "$ref": "#/components/schemas/Histogram" }
            ]
          },
          {
            "allOf": [
              {
                "required": ["type"],
                "properties": {
                  "type": { "enum": ["max"] }
                }
              },
              { "$ref": "#/components/schemas/Calculation" }
            ]
          }
        ]
        ...
@mewalig While case 2 is appealing due to the removal of the need for the extra discriminator keyword, it does make codegen a bit more challenging.  The appearance of a discriminator keyword is a signal that there is a derived type scenario.  It is possible that a oneOf keyword might signal the same thing, but it is a bit more opaque to recognize that the "constant" enum is the discriminator.
It is interesting to consider that if validation can short-circuit, then it might be possible to efficiently use schema validation to identify the appropriate schema rather than depending on a discriminator mapping.
Thanks @darrelmiller.
it is a bit more opaque to recognize that the "constant" enum is the discriminator
The reason I asked is not to support the case for removal of discriminator-- I do see the value of having it, especially in the context of the example-- and I am not suggesting that codegen should reverse this process where a discriminator is not present (as I mentioned, I am not suggesting that Case 2 is better, or that anyone should have to use it instead of discriminator). This is somewhat different from my position at the start of this thread, where I advocated removing the keyword altogether.
Rather, I'm wondering if discriminator would be more useful if formally defined as syntactic sugar which can be converted (one-way, not two-way) to functionally-equivalent JSON Schema. Doing so would retain the benefits of having discriminator for codegen and clarity purposes, while also adding further benefits-- such as straightforward validation without the need for retooling-- that could be realized through the use of generic JSON Schema tools that do not support discriminator.
Agreed with @darrelmiller that using the "constant" enum to substitute for the discriminator is asking a bit much of code generation tools. Of course it is possible to implement a deep analysis of the various schemas to tease out what distinguishes one from another, but then where does it stop? Would we expect non-overlapping ranges of min and max integers to be used? Or min/max length of a string property? Or non-intersecting patterns of a string property? I mean, you could really go crazy here.
In my mind, this boils down to the differing perspectives of JSON schema as a "validation" mechanism vs as a "data modeling" mechanism.  discriminator is not needed for "validation", but is mighty handy for "data modeling". As pointed out by @sm-Fifteen, discriminator is a concise way to express a "tagged union", which is a very common programming construct.
I'm not sure how to be clearer about this, but the question was not intended to suggest using anything as a substitute for the discriminator keyword.
What I am suggesting is that maybe discriminator should not change, other than to be formally defined as syntactic sugar. If the answer to the question is "Yes, they would be the same", then at least for this use case, such a definition would have no impact-- which would be a good thing. If the answer is "No, they would be different", then it might suggest a problem that would be caused by such a definition.
I'm just trying to have my cake and eat it, and share it with everyone else with no downside.
@mewalig I did not mean to be argumentative -- sorry if it came across that way. It sounds like we are generally in agreement. But also realize that we are commenting on an issue titled "Deprecate discriminator?" 😄 .
@mewalig, from the point of view of a JSON Schema validator, it might be syntactic sugar, assuming that the discriminator translates into your oneOf/allOf construct with enum or const on the discriminator property.
From the point of view of a code generator, I think it's more than syntactic sugar, because the discriminator conveys higher-order information that the oneOf construct does not. It tells the code generator that the type should be treated as a tagged union, and that allows the code generator to create the class hierarchy and serialization/deserialization logic in a straightforward way, without relying on a JSON Schema validator to determine the type on deserialization.
The oneOf construct in your example doesn't convey this information. First, it's not clear that the schema author intends or expects the code generator to create a class hierarchy. As @mkistler said:
If there were no discriminator, the generated code would look very different. We would instead create a QueryAggregation class containing the union of all the properties of the child schemas, and the returned class would be an instance of this "generic" QueryAggregation class.
And even if the code generator did create a class hierarchy, there's no practical, failsafe way for the generated code to distinguish subtypes on deserialization unless it uses a JSON schema validator or duplicates the logic of a validator.
The discriminator says "you only need to look at this designated property to determine the subtype." The oneOf example makes no such guarantee. There could be any number of different constraints in the oneOf subschemas, and the generated deserializer would have to be prepared to check the request or response message against all of those constraints in order to determine the subtype. The only practical way to do that is with a schema validator.
Whenever you create a DSL, there is a kind of side effect when you introduce syntactic sugar. The syntactic sugar is usually added to the language because it represents a common pattern or idiom, which is motivated by a corresponding intent. So there's some higher-order information, a new abstraction, that is introduced as a side effect of the syntactic sugar.
The language spec can say, "This higher-level, sugary construct must be translated to this lower-level equivalent, and must be treated exactly the same way." And in that case, anything that works with the language should heed that advice, and should not ascribe any special meaning to the higher-level sugar, distinct from the lower-level construct. But I have seen cases where language processors ignore this rule, interpreting the higher-level construct in a special way. It's tempting to do so in cases where the higher-level construct implies a certain intent that the lower-level construct does not.
In the case of discriminator, the OpenAPI authors recognized from the outset that this is a special case, with special meaning, translating into specialized output from code generators, runtime libraries, etc. The OpenAPI spec doesn't say "treat this as syntactic sugar," because that would defeat the purpose. The use of discriminator carries a certain meaning, and language processors are allowed, even encouraged, to make use of that.
@mewalig , about this:
Rather, I'm wondering if discriminator would be more useful if formally defined as syntactic sugar which can be converted (one-way, not two-way) to functionally-equivalent JSON Schema. Doing so would retain the benefits of having discriminator for codegen and clarity purposes, while also adding further benefits-- such as straightforward validation without the need for retooling-- that could be realized through the use of generic JSON Schema tools that do not support discriminator.
Yes, I do think it's useful to provide a suggested one-way translation from discriminator to standard JSON Schema. There is some information loss in that translation, which is why IMO it would be misleading to call it "syntactic sugar."
But defining that translation could help to clarify how discriminator is supposed to work, specifically with regard to validation. And it could be used directly as a spec for an OAS schema to JSON schema conversion, which could, in turn, be used for validation.
@mkistler thank you-- totally understand, especially as my view has changed over the course of the discussion-- absolutely no apology needed, and apologies in turn if I seemed argumentative. I think your input has been extremely helpful to the conversation and is much appreciated, as imho real-world business examples are invaluable in guiding this type of conversation.
@tedepstein thank you for the explanation which is super helpful in tying together the various issues and implications.
Yes, I do think it's useful to provide a suggested one-way translation from discriminator to standard JSON Schema. There is some information loss in that translation, which is why IMO it would be misleading to call it "syntactic sugar."
That makes sense. A suggested translation could provide benefit in common/clear/simple cases while still leaving wiggle room for other cases, and it would define a well-scoped subset for tools to add incremental support where there may be none today.
So that I know what to call it, since that kind of suggestion would not be "syntactic sugar" per se, what is it? Is there an existing analogous construct in other meta-languages, such as perhaps TypeScript's discriminated/tagged unions?