
Canonical and Extended Forms for OpenAPI Specifications

Open pjmolina opened this issue 5 years ago • 12 comments


Following the discussion at the last TSC meeting about Traits, Overlays, and/or Mixins, we can all agree that these features are strongly oriented toward extensibility.

  • This is very good for API authors and editing tools.
  • On the other hand, it adds complexity to current implementations, which have to resolve these constructs correctly.

To keep things as simple as possible for implementers, I want to propose the idea of having two levels of the specification, defined as follows:

  1. Let's call Extended Form the version of an OpenAPI document that contains Traits, Overlays, Mixins, $refs, or any other macro-like indirection functionality (quite useful for avoiding repetition and staying as concise as possible).
  2. Conversely, let's call Canonical Form an OpenAPI document with all these features fully resolved: all indirections coming from overlays, traits, mixins, and $refs are resolved into a single tree in a single file/document (a rough sketch follows this list).
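As a rough sketch of the distinction (the `/pets` path, the `Pet` schema, and the `common.yaml` file below are invented for illustration, not taken from any real API), an Extended Form fragment and one possible Canonical Form of it might look like this:

```yaml
# Extended Form (hypothetical): the schema lives in another file
paths:
  /pets:
    get:
      responses:
        '200':
          description: A pet
          content:
            application/json:
              schema:
                $ref: 'common.yaml#/components/schemas/Pet'
---
# Canonical Form (hypothetical): the external definition has been pulled
# into the document, so only a local, structural $ref (or a fully inlined
# schema) remains
paths:
  /pets:
    get:
      responses:
        '200':
          description: A pet
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Pet'
components:
  schemas:
    Pet:
      type: object
      properties:
        name:
          type: string
```

Whether local, structural $refs may remain in Canonical Form is exactly the question raised further down for recursive schemas.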

This will allow tool implementers to focus on:

  • Resolving an Extended Form spec (1) into a Canonical Form (2).
  • Letting tools for code-gen (or validation) take the Canonical Form (2) and do their job without any knowledge of Traits, Overlays, Mixins, or $refs.

This approach enables us to:

  • Define a level of compliance with Canonical Form for tools.
  • Define another one for tooling translating Extended Form into Canonical Form.

This can help tool implementers embrace OpenAPI 3.0 faster when targeting (2) or (1).

Divide and conquer strategy, that's it. What do you think?

pjmolina avatar Feb 28 '19 18:02 pjmolina

@pjmolina , I think it's a good idea. We have already discussed the idea of making overlays a separate specification, and I think this is the prevailing direction of the TSC.

But even today, we find that there are code generators, documentation formats, test consoles, etc. that do not correctly handle some features of OpenAPI 2.0 and 3.0. Common stumbling blocks include:

  • external $ref properties
  • "cascading" properties like security requirements, parameters, and others that can be specified at a high level (e.g. in the OpenAPI Object) and in some cases may be overridden or augmented in lower-level constructs, like path items, methods, requests or responses.

We have our own KaiZen OpenAPI Normalizer to smooth out these problems for reliable downstream processing. It's not a trivial operation, and functionality like this is only going to get more important as we start adding traits, overlays, alternative schemas, and other features.

Having Extended and Canonical forms more clearly defines the role that tools like Normalizer can fulfill, as translators from Extended to Canonical form. And it removes a significant barrier to adoption of new OpenAPI versions.

Most OpenAPI usage is read-only. Consumers of OpenAPI only need to read and comprehend the API document; they won't care about how it has been composed internally. If we can separate roles, so that OpenAPI consumers don't have to be responsible for piecing together the API description from its constituent parts, I think that would be a big win.

tedepstein avatar Feb 28 '19 20:02 tedepstein

You nailed it @tedepstein ! It looks like we have experienced the same kind of pain. ;-)

pjmolina avatar Feb 28 '19 21:02 pjmolina

How will the canonical form represent circular schemas in a document if all $refs are to have been resolved? JSON has no mechanism to support this, and we ban the use of the related YAML features.

MikeRalphson avatar Mar 01 '19 09:03 MikeRalphson

Fair point @MikeRalphson :

  • External $refs can be resolved (imported) to be local.
  • Local $refs may be allowed inside the Canonical Form if needed (this usage is structural, not an expansion macro for reuse).

Example: Recursive and circular references in Schema Types.

Any other use cases where circular refs could be a problem?

pjmolina avatar Mar 01 '19 09:03 pjmolina

@pjmolina , @MikeRalphson , here are some excerpts from the Normalizer docs that explain how this works:

When the normalizer encounters any reference, there are two ways it may process the reference:

**Inline** — The normalizer retrieves the referenced value (e.g. the Pet schema definition object) and replaces the reference itself with that value.

**Localize** — The normalizer first adds the referenced object to the normalized spec that it is creating, if it is not already present, and then replaces the reference with a local reference to that object. So in the external reference example shown above, the Pet schema definition would appear directly in the OpenAPI spec produced by the normalizer, and references that were formerly external references would become local references.
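A rough before/after sketch of the two strategies (the `pet-models.yaml` file name and the `Pet` schema are invented here, not taken from the Normalizer docs):

```yaml
# Before: an external reference in the source document
schema:
  $ref: 'pet-models.yaml#/components/schemas/Pet'
---
# After "Inline": the referenced value replaces the reference itself
schema:
  type: object
  properties:
    name:
      type: string
---
# After "Localize": the referenced object is copied into this document's
# components section (not shown), and the reference becomes a local one
schema:
  $ref: '#/components/schemas/Pet'
```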

(snip)

**Recursive References** It is possible to set up recursive schema definitions in OpenAPI specs, through the use of references. For example, consider the following schema:

```yaml
      matriarch:
        $ref: "#/components/schemas/Person"
...

components:
  schemas:
    Person:
      type: object
      properties:
        name:
          type: string
        children:
          $ref: "#/components/schemas/People"
    People:
      type: array
      items:
        $ref: "#/components/schemas/Person"
```

The Person schema has a children property of type People, and the People schema defines an array of Person objects.

Naively attempting to inline a reference to a Person object would lead to a never-ending expansion...

To handle recursive references encountered during inlining, the normalizer stops inlining whenever a reference is encountered that is fully contained within another (inlined) instance of the referenced object. That recursive reference is localized rather than being inlined.

In the above example, we would end up with something like this:

Partially-inlined result:

```yaml
      matriarch:
        type: object
        properties:
          name:
            type: string
          children:
            type: array
            items:
              $ref: "#/components/schemas/Person"
...

components:
  schemas:
    Person:
      type: object
      properties:
        name:
          type: string
        children:
          type: array
          items:
            $ref: "#/components/schemas/Person"
...
```

Here we see:

  • that the top-level reference to Person as the type of the matriarch property was inlined;
  • that the recursive reference to Person encountered while performing this inlining has been localized;
  • that the Person schema itself was subjected to inlining, with localization of its recursive reference.

There are other details of the algorithm for handling name clashes. There's also a somewhat misguided distinction between "conforming" vs. "non-conforming" references, which we're planning to eliminate in a future revision. So I would not propose the KaiZen Normalizer documentation, in its current form, as a baseline spec for Canonical Form.

But depending on our goals for Canonical Form, we may not need to specify the algorithm to this level of detail. Maybe it's sufficient to say that Canonical Form just means:

  1. There are no external references, traits or overlays.
  2. All cascading properties have been expanded down to their respective leaf levels.
  3. All default values are explicitly specified (see the sketch below).
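As an illustration of point 3, here is a hypothetical query parameter with the defaults that OAS 3.0 would otherwise leave implied written out explicitly (a sketch, not an exhaustive list of defaults):

```yaml
# Hypothetical parameter with implied OAS 3.0 defaults spelled out
parameters:
  - name: limit
    in: query
    required: false         # default for non-path parameters
    deprecated: false       # default
    allowEmptyValue: false  # default
    style: form             # default style for query parameters
    explode: true           # default when style is form
    schema:
      type: integer
```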

Different processors could accomplish this in different ways, and Canonical Form does not guarantee that the output will always be exactly the same, regardless of which processor you use.

OpenAPI consumers would still need to be able to resolve local references, expressed as JSON pointers within the document. And they would still need to deal with the possibility of recursive references. But they wouldn't need to deal with those other levels of complexity or general fussiness in the OpenAPI spec.

tedepstein avatar Mar 01 '19 12:03 tedepstein

The more I think about this, the more I'm convinced that it's critical to the success of the OpenAPI ecosystem. I would go so far as to say that we should not introduce traits, a.k.a. mixins (#1843), into the OpenAPI spec unless we also define a canonical or simplified form.

Anecdotal evidence: OpenAPI 3.0 adoption took much longer than we hoped. Developers were waiting for tools and platform support; tool and platform providers were waiting for demand to reach critical mass; and there was no "killer app" to drive the ecosystem to OAS v3.

You could argue that OpenAPI 3.0 was different, because 3.0-to-2.0 conversions, which might have facilitated adoption by OAS consumers, were inherently lossy and therefore not a practical solution. By contrast, traits can be resolved by a preprocessor with no information loss, and we could just let the open source community build those preprocessors.

You could also argue that, whatever complexities might exist in OpenAPI, we can leave it to the open source community to build preprocessors like Kaizen OpenAPI Normalizer and others. We don't need to formalize it in the spec.

But I think these arguments fail to address the economics of the situation.

OpenAPI consumers are a broad category that includes documentation formats, test consoles, code generators, API gateways and API management platforms, among others. OpenAPI producers are a much smaller category that includes editors, code-first frameworks, design tools, and maybe a few others.

If I'm an OpenAPI consumer looking at a new release of the OpenAPI spec, my goal is to support that new release and advertise that support, with minimum effort. If it's difficult for me to support a new feature like traits (and it will be difficult), I have a few options:

  1. Bite the bullet and write the code to support traits.
  2. Advertise half-assed support for OpenAPI 3.1... without traits. If someone wants to use my service with a "traitful" OpenAPI document, it's up to them to pre-process and send me a traitless OAS 3.1 document.
  3. Look for open source processors to help by resolving the traits, maybe even converting 3.1 to 3.0 with some information loss.

The first two options are obviously not very attractive. The third option might seem fine. But consider what this means:

  • No one is telling me that there's this middle path available to me, and no one is defining, in an authoritative way, what that middle path should look like. I have to discover it on my own.
  • I have to identify my particular pain points (traits being one of them), put my trust in a third-party processor, and build dependencies on that processor into my implementation.
  • I cannot advertise support for OpenAPI 3.1 until I have gotten comfortable (enough) with these decisions and done the actual integration work on my end.

That's a big enough barrier to almost guarantee slow adoption of OpenAPI 3.1.

Now, if OpenAPI 3.1 officially defines a Canonical Form, even in very simple terms, it changes the economics pretty dramatically for me as an OpenAPI consumer:

  • I now have a target for OpenAPI 3.1 support that is much easier to hit. It's clearly named, clearly specified, and has the official OpenAPI stamp of approval.
  • It's now clear what a canonicalizer is supposed to do. So if I want to build in support for full/extended OpenAPI 3.1, I can expect (soon enough) to find at least a few good ones in the open source community.
  • If I don't want to build in support for extended form, I am much more comfortable leaving it to my users to canonicalize. I can still advertise support for OpenAPI 3.1 canonical form, and it shouldn't be hard for users of my service to canonicalize inputs themselves.
  • Over time, we should expect this to get better. API providers will likely move towards publishing their OpenAPI specs in canonical form, and the 3.1 spec can encourage this practice. Canonicalizers will be built into OpenAPI editors, code-first processing pipelines, and other toolchains. In the best case scenario, responsibility for canonicalization moves naturally from consumers to producers, and the whole issue of canonicalization mostly disappears below the water line.

Not that I've heard anyone raise a strong objection to this yet. But I think this is a simple and powerful way to reduce friction in the OpenAPI ecosystem.

tedepstein avatar Mar 01 '19 13:03 tedepstein

Different processors could accomplish this in different ways, and Canonical Form does not guarantee that the output will always be exactly the same, regardless of which processor you use.

I believe we would be creating problems for ourselves and for tooling authors if we did not specify (with examples) exactly how overlays, traits/mixins, and $refs should be resolved into a truly canonical form, whereby each conforming tool produces exactly the same output when canonicalizing the same input. See, for example, https://en.wikipedia.org/wiki/Canonical_XML

MikeRalphson avatar Mar 04 '19 15:03 MikeRalphson

True, the semantics of the Extended Form should generate a unique Canonical Form. Moreover, we can provide a Test Suite to illustrate the expected input + expected output.
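A single test case could be as simple as an input document paired with the one expected output; a hypothetical manifest entry (all file names invented) might look like:

```yaml
# Hypothetical test-suite manifest entry; names are illustrative only
case: resolve-external-refs
input: cases/external-refs/input.openapi.yaml          # Extended Form
expected: cases/external-refs/expected.canonical.yaml  # the unique Canonical Form
```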

pjmolina avatar Mar 04 '19 16:03 pjmolina

If the consensus is that we should go for this level of specificity, I don't object.

My position is that OpenAPI is already in need of a simplified or normalized form, whether or not it's strict enough to be called a Canonical Form, and that we should not introduce traits unless we also provide this.

If a simplified form is on the critical path to traits, as I believe it should be, I just want to make sure we have enough time to do it. I would rather have a "simplified form" done in time for a 3.1 release than a "canonical form" still in progress.

tedepstein avatar Mar 04 '19 16:03 tedepstein

@tedepstein "If we can separate roles, so that OpenAPI consumers don't have to be responsible for piecing together the API description from its constituent parts, I think that would be a big win."

That's actually false. Roles cannot be second or third in line as a reference point. They are the first/second point of reference for the endpoint, so you can be in compliance with API3:2019 (https://apisecurity.io/encyclopedia/content/owasp/api3-excessive-data-exposure.htm) and API6:2019 (https://apisecurity.io/encyclopedia/content/owasp/api6-mass-assignment.htm).

The proper line of reference should be:

ENDPOINT > ROLE > REQUEST DATA
ENDPOINT > ROLE > RESPONSE DATA

Like so:

"user/update": {
    ...,
    "REQUEST": {
        "permitAll":["username","password","email"],
        "ROLE_ADMIN":["id"]
    },
    "RESPONSE": {
	"permitAll":["id","version"]
    }
}
```		    	

It is impossible to rely on a separate security mechanism to do this, because that mechanism does not make the check in association with the endpoint; the check is therefore missed entirely at the gateway, which is where security sits for the majority of applications that rely on OpenAPI.

**So OpenAPI is 100% vulnerable to 2 of the OWASP API Security Top 10 issues.**

orubel avatar Jun 20 '21 19:06 orubel

Overlay has now been proposed, which supports traits. I think this answers questions raised in this issue, please comment if there is more to resolve here.

kscheirer avatar May 25 '22 01:05 kscheirer

Nice of you to minimize the fact that I pointed out a security risk, but the security risk still exists.

orubel avatar May 25 '22 15:05 orubel