
feat: introduce new schema referencing standard

Open jonaslagoni opened this issue 1 year ago • 21 comments


Abstract

The current reference object is no longer sufficient because we need to support non-JSON references, i.e. we can no longer rely solely on JSON Reference. The main problem is that AsyncAPI is defined with JSON/YAML while you may be referencing something that does not conform to this format, which makes it tough to support. This new standard aims to solve this problem.


Description

This proposal is part of a larger issue that has been highlighted in multiple GitHub issues. This PR is a follow-up to, and extends, the work from @magicmatatjahu in https://github.com/asyncapi/spec/pull/797 (to allow referencing non-JSON data structures such as Protobuf, XSD, Avro, FlatBuffer schemas, EDI, COBOL copybook, and the inevitable "cool new format").

This issue is a two-parter, because without tooling this new standard is useless.

Requirements for the standard:

  • Should always produce a valid JSON document
  • Should define the behavior for referencing non-JSON data in a JSON document
  • Should define the behavior for referencing JSON data
  • Should define the behavior for referencing JSON data that also has reference behavior, and how they interconnect (or not)
  • Should define the behavior of nested schemas within the same file

Remaining issues/tasks to resolve

  • [ ] Figure out whether we "solve" or merely define how cross references should (or should not) work together
  • [ ] Figure out if we want to allow JSON Pointer compatibility in some format, see https://github.com/asyncapi/spec/issues/216#issuecomment-510147873
  • [ ] Figure out how linking should work for local references
  • [ ] Figure out how linking should work for relative references
  • [ ] Should we use distinct keywords for references and links instead of just $ref?
  • [ ] Figure out how to handle resolving non-JSON data into JSON. Is resolving it to content enough?

FAQ

Some quick questions and answers about the current state of the standard.

  • How can I use fragments in non-JSON data? (https://github.com/asyncapi/spec/issues/216)

You cannot.

If it's non-JSON, then fragment MUST be ignored.

  • How can parsers use this?

Parsers can, based on the schema format, determine whether they have the capability to parse and interpret the reference.


Related issue(s):

  • https://github.com/asyncapi/spec/issues/622

Moves schema format into a "schema object" instead of the message containing schema and schemaFormat. Introduces defaultSchemaFormat in the root AsyncAPI object. schemaFormat in the message object is deprecated/removed. Looks at supporting nested references, e.g. in Avro, through schemaRef instead of URI fragments.

Raises the idea of schemaParse, which does not make sense, as we can look at the schemaFormat and it's up to the implementation whether it supports it.

Raises the idea of $remoteRef or x-remote-ref, but that does not make sense either in this setup.

  • https://github.com/asyncapi/spec/issues/163

Nothing specific, want to reference Flatbuffers.

  • https://github.com/asyncapi/spec/issues/624

Proposes remoteReference to be unparsable by the parsers and left for generators. Raises the idea of parse and remote. Highlights some requirements:

  1. I want to have entire .xsd imported into AsyncAPI, as a string
  2. I want to point from AsyncAPI to a schema registry/file, not bringing in the whole thing
  3. Provide a pointer to a particular element
  4. Pointer to a particular element if importing the schema

  • https://github.com/asyncapi/spec/issues/216

Requires clarification of how you can use JSON Reference to point to non-JSON data. Requires clarification on the following:

  1. Clarify how $ref can be applied to YAML data structures
  2. Refer to an existing mechanism used to translate YAML to JSON
  3. Clarify referencing mechanism for any non-JSON and non-YAML schema languages

A list of possible URI fragments that can be supported for non-JSON data.

  • https://github.com/asyncapi/spec/issues/694

Introduces data format bindings that aim to solve an issue where specific arguments are needed for the schema format. schemaOptions is being proposed alongside https://github.com/asyncapi/spec/issues/622

  • https://github.com/asyncapi/spec/issues/656

There is no clear definition of how AsyncAPI parsers can and should handle different formatSchema.

jonaslagoni avatar Aug 04 '22 14:08 jonaslagoni

While the "JSON" in "JSON Reference" implies that it would only work for JSON-stored data, that's not actually true as it could very easily be used to reference data in any structured format. #/people/whitlockjc/email itself is no more bound to JSON than it is to anything else, just a hierarchical combination of keys and indices. 🤷‍♂️

Do you have any examples where using a JSON Reference is not sufficient for another structured data format, like Protobuf? I'm not saying that JSON References are the end-all-be-all, but unless you see a limitation I'm not aware of, they're pretty powerful and not as language-specific as the name may imply.

whitlockjc avatar Aug 08 '22 16:08 whitlockjc

I guess I could see where writing a library that was content-aware to swap between resolving in structured language X to structured language Y could be difficult. But I could also see that as being application/domain specific, meaning you could just drive your JSON Reference library manually.

whitlockjc avatar Aug 08 '22 16:08 whitlockjc

While the "JSON" in "JSON Reference" implies that it would only work for JSON-stored data, that's not actually true as it could very easily be used to reference data in any structured format. #/people/whitlockjc/email itself is no more bound to JSON than it is to anything else, just a hierarchical combination of keys and indices. 🤷‍♂️

Do you have any examples where using a JSON Reference is not sufficient for another structured data format, like Protobuf? I'm not saying that JSON References are the end-all-be-all, but unless you see a limitation I'm not aware of, they're pretty powerful and not as language-specific as the name may imply.

I would love nothing more than to stay with JSON Reference, or to use another standard if one exists (xkcd) 😄 That's also why I made this draft: to trigger those very discussions about whether it's even needed, because just because I see it that way does not mean that's how it is 😄

So here are my issues with the current standard, and these go back to the requirement for the change:

  1. It does not define the resolution behavior of non-JSON data

JSON Pointer does not define any logic for how to resolve a referenced resource that is non-JSON. The question becomes how you handle it within a JSON document. Do you load the resource and place it into a content property that is a string? Do you transform it to JSON using a standard? Or a custom behavior?

  2. It does not define a way for implementations to know which type of reference it is

I don't read anywhere within JSON Reference that it defines how the implementation can know the type of the referenced resource. How can implementations be content-aware about how to handle the referenced resources?

  3. It does not allow referencing non-JSON data

As the standard solely targets JSON data, a lot has to be implied when using it for non-JSON data, even though you could probably force it to comply in some way. As you say, "just a hierarchical combination of keys and indices".

  4. It does not define the resolution behavior in multi-spec environments

For example, while you can use $ref in AsyncAPI, you can also do this in OpenAPI and JSON Schema, which have very different resolution behavior. So when you nest a JSON Schema document within AsyncAPI, how should those references be handled?

Related issues: https://github.com/asyncapi/spec/issues/655, https://github.com/asyncapi/spec/issues/656

  5. It does not define how to handle relative references in non-JSON data (through URI fragments)

As you highlight, technically, yes you can utilize fragments, but how exactly? See https://github.com/asyncapi/spec/issues/216

Those are just the issues I can remember at the moment. One of the core reasons behind authoring it as a standard is that regardless of whether we do it, we need to build tooling to handle references within a multi-spec document. So having this as a standard just makes sense to me 🤷

@whitlockjc what would your suggestion be instead, in terms of how we should move forward to support this requirement of referencing non-JSON data? 🤔

jonaslagoni avatar Aug 08 '22 16:08 jonaslagoni

It does not define the resolution behavior of non-JSON data

I'm not sure JSON Reference cares about the data being resolved, and in my opinion it shouldn't care. Resolution only cares about resolving something; it doesn't care what that something is. And in the case of JSON References, so long as the data is structured, it seems resolvable to me. (I do wonder how one might handle XML though, where you might need to resolve properties vs. nested elements. But I'm sure a simple convention would suffice instead of creating yet another standard or approach.)

It does not define a way for implementations to know which type of reference it is

Much like the first, the standard doesn't mention this because it doesn't care, and I question whether it should. If you know enough about where the data is, you should be able to reason about what kind of data is there. To me, this is highly specific to the application using the data and I'm not sure if adding complexity so that tooling can "guess" is worth it.

It does not allow referencing non-JSON data

I'm not sure I agree with this because a JSON Pointer is nothing more than a URI, a hierarchical series of steps to find something in a structured document. The same JSON Pointer that could locate something in JSON could be used to find data in a non-JSON data format so long as it's structured. For example, #/people/Jeremy/email/0 could work for HTML, XML, any language equivalent of a structure object, a DAG or anything else that is structured.
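To make the claim concrete, here is a minimal sketch of JSON Pointer resolution over plain nested mappings and sequences, with nothing JSON-specific in it. The function name `resolve_pointer` and the sample data are assumptions made for illustration; they are not part of any spec or of the PR.

```python
from typing import Any
from urllib.parse import unquote


def resolve_pointer(document: Any, fragment: str) -> Any:
    """Resolve a JSON Pointer URI fragment such as '#/a/b/0' against nested dicts/lists."""
    pointer = fragment[1:] if fragment.startswith("#") else fragment
    pointer = unquote(pointer)  # URI fragments may be percent-encoded
    if pointer == "":
        return document  # the empty pointer addresses the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {fragment!r}")
    node = document
    for raw in pointer[1:].split("/"):
        # RFC 6901 unescaping: '~1' becomes '/', then '~0' becomes '~'
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            node = node[int(token)]  # sequences are addressed by index
        else:
            node = node[token]  # mappings are addressed by key
    return node


# Hypothetical data: any parser (YAML, TOML, ...) that yields dicts and lists would do.
people = {"people": {"Jeremy": {"email": ["jeremy@example.com"]}}}
print(resolve_pointer(people, "#/people/Jeremy/email/0"))
```

The sketch only covers formats that parse into mappings and sequences; formats with attributes on nodes (XML, HTML) need an extra convention, as noted elsewhere in this thread.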

It does not define the resolution behavior in multi-spec environments

I'm not sure I have an opinion on this, primarily because each format owns how references are supported/treated. For example, OpenAPI started out with bastardized support for JSON References, then started using native JSON References and now uses JSON Schema's resolution of JSON References which is custom/unique to JSON Schema. But where I'm slightly confused is what you mean by a "multi-spec environment", what is that?

If a "multi-spec environment" just refers to storing non-AsyncAPI documents/fragments within the AsyncAPI document, I don't think there is a problem. The reason is that there is no reason why a JSON Reference outside of the nested spec would need to resolve into the nested spec, nor why a JSON Reference in the nested spec would need to resolve outside of itself. The content of the nested spec is immaterial to AsyncAPI, and likely just used by some external tooling which itself would need to be aware of its content. For example, in https://github.com/asyncapi/bindings/pull/141 I propose a way to document a Cloud Pub/Sub topology using AsyncAPI. If I were to store Protobuf in the Message Binding in hopes of having some tooling do client/server validation based on the Protobuf, AsyncAPI shouldn't care about it; for the client/server tooling to use it, that tooling needs to be fully aware of its content, so any resolution it does would play by its own rules.

It does not define how to handle relative references in non-JSON data (through URI fragments)

Whether the data is JSON or not, if it's structured the rules are the same. Fragments will address hierarchically from the root of the enclosing document. I really don't think JSON vs. non-JSON changes much because a structured document is hierarchical and plays by the same rules. (Like I mentioned earlier, I could see a case for XML and similar languages to need some sort of convention to differentiate between element properties and nested child elements.)

Summary

To me, so long as the data is structured, JSON References allow a very simple way to define where something is in relation to a document/structure so that you can resolve it. Sure, mixed data formats add some complexity, and there could even be edge cases for structured data that has data within a container (XML properties), but since JSON References are really just a dict/map/object/... with a singular $ref key/property that is itself a URI whose resolution rules are pretty clear, I don't see the need for something more complex, or necessarily a shortcoming in JSON References.

I will look more into the examples you linked to where we might need something that I'm just not seeing.

whitlockjc avatar Aug 08 '22 17:08 whitlockjc

TL;DR:

  • AsyncAPI can define the usage of a reference in terms of the result of applying the reference target (regardless of what kind of schema it is) rather than in terms of replacing a JSON object with the expected-to-be-JSON target. This is a delegation, not a replacement or a merge (merge may sound tempting, but it gets horrifically messy very fast)
  • The fragment syntax used in the reference is determined by the media type of the representation identified by the non-fragment portion of the URI, not by the usage of the URI.
  • However, AsyncAPI can require that such representations be mapped into a media type that supports JSON Pointer fragments (notably, application/json technically does not support JSON Pointer fragments — you can use non-fragment JSON Pointers with it, but technically not JSON Pointer URI fragments; for that, you need some sort of application/<something>+json media type).

Apologies if I missed something in this lengthy issue, but I wanted to raise a point that has been sporadically important (and often overlooked) within many JSON Schema / JSON Pointer-related discussions about URIs and fragments:

Some quick questions and answers about the current state of the standard.

How can I use fragments in non-JSON data? (https://github.com/asyncapi/spec/issues/216)

You cannot.

If it's non-JSON, then fragment MUST be ignored.

Technically, even with application/json you MUST ignore JSON Pointer fragments, per RFC 6901 §6:

Note that a given media type needs to specify JSON Pointer as its fragment identifier syntax explicitly (usually, in its registration [RFC6838]). That is, just because a document is JSON does not imply that JSON Pointer can be used as its fragment identifier syntax. In particular, the fragment identifier syntax for application/json is not JSON Pointer.

(emphasis added)

Fragment resolution is controlled by the representation's media type.

This is why we (the JSON Schema project) added application/schema-instance+json, which is just application/json with JSON Pointer fragments and a "schema" media type parameter added. application/schema+json also supports both JSON Pointer and plain-name fragments. We are working on getting these media types properly registered so that they can be used more broadly. I haven't looked at OpenAPI's media type details, but I assume they say something about JSON Pointer fragments and/or deferring to the JSON Schema spec for Schema Objects.
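As a rough illustration of "fragment resolution is controlled by the representation's media type", the sketch below keeps a lookup of which fragment syntaxes each media type accepts, using only the three media types named in this comment. The table name, the function `fragment_kind`, and the labels "json-pointer" and "plain-name" are assumptions made for this example.

```python
from typing import Optional

# Fragment syntaxes per media type, as described in the comment above.
FRAGMENT_SYNTAX = {
    "application/json": set(),  # RFC 6901 §6: JSON Pointer is not its fragment syntax
    "application/schema-instance+json": {"json-pointer"},
    "application/schema+json": {"json-pointer", "plain-name"},
}


def fragment_kind(media_type: str, fragment: str) -> Optional[str]:
    """Return how a fragment may be interpreted for a media type, or None if it must be ignored."""
    supported = FRAGMENT_SYNTAX.get(media_type, set())  # unknown media types support nothing here
    # A JSON Pointer is either empty or starts with '/'; anything else is treated as a plain name.
    kind = "json-pointer" if fragment == "" or fragment.startswith("/") else "plain-name"
    return kind if kind in supported else None


print(fragment_kind("application/json", "/people/0"))
print(fragment_kind("application/schema+json", "address"))
```

A real implementation would presumably map a fetched representation into one of the supporting media types first, as the next paragraph describes, rather than simply rejecting the fragment.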

So, whether you start with application/json or some vastly different media type, technically you first need to map that representation into a media type that supports JSON Schema fragments in order to use URIs with those types of fragments. For application/json to application/schema-instance+json, this is trivial. For YAML, as long as it is the JSON-compatible subset it is trivial, but otherwise more challenging (that same media type project is also working on getting application/yaml finally registered).

For example, while you can use $ref in AsyncAPI, you can also do this in OpenAPI and JSON Schema, which have very different resolution behavior. So when you nest a JSON Schema document within AsyncAPI, how should those references be handled?

It does not define how to handle relative references in non-JSON data (through URI fragments) As you highlight, technically, yes you can utilize fragments, but how exactly? See https://github.com/asyncapi/spec/issues/216

There are several parts of resolving a reference which are governed by different mechanisms, some of which can't be changed by downstream standards.

  1. Determining the full URI of the reference, governed by RFC 3986 §5 URI-reference resolution against a base URI, which first involves establishing a base URI through the following steps in order (§-numbers from RFC 3986):
    1. §5.1.1 Base URI embedded in content (governed by the media type in question, which may, as OpenAPI's media type does, defer to an embedded media type's rules)
    2. §5.1.2 Base URI from encapsulating entity (governed by the encapsulating entity's rules, so in the case of OpenAPI, an embedded schema that does not declare a full (with URI scheme) base URI with $id falls back to the base URI for the OpenAPI document)
    3. §5.1.3 Base URI from the retrieval URI (if the encapsulating entity, if any, does not establish a base URI, the URI from which it is retrieved becomes the base URI)
    4. §5.1.4 Default base URI (application-dependent)
  2. Resolve the URI to a representation
    1. Look up the resource by the URI's non-fragment part either in a local cache or through a network operation
    2. Apply the fragment to the representation based on media type rules
      1. If the representation's media type is known, then its rules govern fragment interpretation
      2. If the media type is unknown, or if it does not support JSON Pointer fragments, then it (technically) first needs to be mapped into a representation that does
  3. Make use of the data found through the URI, for which I know of three alternatives:
    • JSON Reference (including JSON Schema draft-04): replace the entire containing JSON object with the reference result
    • JSON Schema draft-06 & draft-07: replace the entire containing JSON Schema object with the reference result
    • JSON Schema draft 2019-09 & 2020-12: Use the reference result as a schema to be applied to the same instance as the containing schema
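Steps 1 and 2 above can be sketched with Python's standard library, which implements RFC 3986 reference resolution in `urllib.parse.urljoin`. The function name `split_reference` and the example URIs are assumptions for illustration; fetching the resource and applying the fragment according to its media type are deliberately left out.

```python
from typing import Tuple
from urllib.parse import urldefrag, urljoin


def split_reference(ref: str, base_uri: str) -> Tuple[str, str]:
    """Resolve a reference against a base URI, then separate resource URI and fragment."""
    absolute = urljoin(base_uri, ref)  # step 1: RFC 3986 §5 resolution against the base URI
    resource_uri, fragment = urldefrag(absolute)  # step 2 input: what to fetch, what to apply
    return resource_uri, fragment


# Hypothetical retrieval URI of an AsyncAPI document (the §5.1.3 case).
BASE = "https://example.com/specs/asyncapi.yaml"

print(split_reference("./avro.avsc#/com.example.FullName", BASE))
print(split_reference("#/components/schemas/x", BASE))
```

The fragment comes back as an opaque string on purpose: how it is interpreted depends on the media type of whatever the resource URI resolves to.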

The modern (2019-09+) JSON Schema behavior can be generalized to "use the result in the same way as the containing object is used", which is essentially delegation. That allows the context to determine the behavior of any adjacent fields in the JSON object. For modern JSON Schema, a reference behaves like any other keyword that produces schema results.

This means that if AsyncAPI takes a similar approach to modern JSON Schema, then you can reference other schemas, as the behavior is defined in terms of the schema result, not the schema syntax.

As far as the base URI, in practice, an embedded JSON Schema resolves references based on the nearest same-object or parent-object $id, and if none exist, against the base URI of the encapsulating entity (e.g. the AsyncAPI or OpenAPI document's base URI). This means that for OpenAPI 3.1, if you have a $id in your schema, you can reference #/$defs/whatever within that schema object ($defs as a sibling to $id), but you can't reference #/components/schemas/whatever in the containing OAS file without instead doing something like https://example.com/oasfile#/components/schemas/whatever
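The effect of $id on same-document fragments can be shown with a small sketch. The helper `effective_base`, the schema $id, and the example URIs are assumptions for illustration; only `urljoin` from the standard library is used for the actual resolution.

```python
from typing import List
from urllib.parse import urljoin


def effective_base(document_base: str, id_chain: List[str]) -> str:
    """Fold the $id values found from the outermost to the innermost schema object."""
    base = document_base
    for schema_id in id_chain:  # each $id is itself resolved against the base so far
        base = urljoin(base, schema_id)
    return base


OAS_BASE = "https://example.com/oasfile"  # base URI of the containing document
schema_base = effective_base(OAS_BASE, ["https://example.com/schemas/pet"])

# Resolves inside the schema object that declares the $id.
inside = urljoin(schema_base, "#/$defs/whatever")
# Also resolves against the $id, so it no longer points into the containing file.
misses = urljoin(schema_base, "#/components/schemas/whatever")
# Reaching the containing file requires spelling out its URI.
explicit = urljoin(schema_base, "https://example.com/oasfile#/components/schemas/whatever")

print(inside)
print(misses)
print(explicit)
```

With an empty `id_chain` the base falls back to the containing document's base URI, which matches the encapsulating-entity rule described above.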

$ref outside of a schema object in OAS 3.1 always uses the OAS file's base URI. I believe they opted to retain the "replace the containing object" behavior for those references, but you could consider an "apply the target in addition to whatever else is in the containing object" approach; what that would mean would be context-specific, obviously.

I don't read anywhere within JSON Reference that it defines how the implementation can know the type of the referenced resource. How can implementations be content-aware about how to handle the referenced resources?

They are not; it is the responsibility of whoever fetched the resource to know the media type and how to interpret fragments based on that. This means that the application that fetched the resource can also map/reinterpret the resource into a media type that supports the desired fragment behavior. This is the step where AsyncAPI can define behavior, such as how to map XML into a JSON Pointer-friendly media type/data model.

handrews avatar Aug 21 '22 18:08 handrews

As mentioned on the v3 spec call, I'd be interested in the tooling implications of this. If developers are trying to parse an AsyncAPI with a standard json parsing lib, what is the behavior?

jessemenning avatar Sep 15 '22 12:09 jessemenning

If developers are trying to parse an AsyncAPI with a standard json parsing lib, what is the behavior?

That is a very good point. Most JSON parsers don't support JSON Reference by default. However, tools like json-schema-ref-parser do. It won't support this new referencing standard for non-JSON files, but maybe we could build a parser (also called a plugin) for asyncapi files.

Otherwise, we would need to build tooling around this.

cc @jonaslagoni

smoya avatar Sep 15 '22 15:09 smoya

I've actually thought of enabling support for this in json-refs via an option, like an experimental feature.

whitlockjc avatar Sep 15 '22 18:09 whitlockjc

JSON References are really just a specialized structure, an object containing a sole $ref property whose value is a URI. URIs are pretty dang good at pointing to where something is, regardless of what type it is, so to me, we have two issues:

  1. How do we handle the URI fragment for non-JSON "referent documents"
  2. How do we serialize the resolved value for non-JSON "referent documents"

For issue 1, JSON Reference says that the fragment portion of the URI is a JSON Pointer, and that works well for JSON documents, but there is nothing to explain how to handle non-JSON documents. My opinion is that JSON Pointers work for nearly all structured data without issue or modification. Where JSON Pointers fall apart is for languages where each "node" in the data structure has information within it, like HTML/XML attributes/properties, and for non-structured data (like protobuf). I feel like our only options here are to create a JSON Pointer replacement that is language agnostic, or to use some convention to fill in the gaps. (For example, we could easily use a @ prefix for cases where the node is not part of the data structure hierarchy but is a value within a node of the structure, like with HTML/XML. Example: #/XMLRoot/XMLChild/@attr)
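A minimal sketch of the suggested @ convention, using Python's standard `xml.etree.ElementTree`. The function name `resolve_xml_fragment` and the sample XML are assumptions for illustration, and the sketch ignores namespaces and repeated sibling elements, which a real convention would have to define.

```python
import xml.etree.ElementTree as ET
from typing import Union


def resolve_xml_fragment(root: ET.Element, fragment: str) -> Union[ET.Element, str]:
    """Walk element names from the root; a token starting with '@' selects an attribute."""
    pointer = fragment[1:] if fragment.startswith("#") else fragment
    tokens = [token for token in pointer.split("/") if token]
    if not tokens or tokens[0] != root.tag:
        raise KeyError(f"root element does not match {fragment!r}")
    node = root
    for token in tokens[1:]:
        if token.startswith("@"):
            return node.attrib[token[1:]]  # an attribute value ends the walk
        child = node.find(token)  # first direct child with this tag name
        if child is None:
            raise KeyError(token)
        node = child
    return node


doc = ET.fromstring('<XMLRoot><XMLChild attr="42">hello</XMLChild></XMLRoot>')
print(resolve_xml_fragment(doc, "#/XMLRoot/XMLChild/@attr"))
print(resolve_xml_fragment(doc, "#/XMLRoot/XMLChild").text)
```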

For issue 2, this is where I have less of an opinion, but I could definitely see a JSON representation generated from almost anything, including Protobuf. The good news is that for many of the types we know we want to work with, the problem could already be solved with off-the-shelf tooling.

All this being said, I would like to know exactly what is wrong with what we have now and what we're trying to solve. The reason is that I don't think JSON References as they exist now are far off the mark of what we need, other than JSON Pointers not working for everything we want.

whitlockjc avatar Sep 16 '22 14:09 whitlockjc

I realize the "JSON" in "JSON Pointer" and "JSON Reference" implies that the containing document is JSON and the referent document is JSON. But the structure and syntax of these items are by no means tied to just JSON and there is no reason we can't use what's there as-is regardless of the syntax of the containing/referent documents. That's the point I'm making.

whitlockjc avatar Sep 16 '22 14:09 whitlockjc

The more I think about it, the less I care about it. "JSON *" implies JSON-only and if we are looking to get away from that, to me it looks like rewriting the related JSON-specific drafts/specs and figuring out a way to replace them with a language-agnostic equivalent is the best solution. But I do think we could start that in a much simpler way by using the JSON-specific drafts/specs as a base and using a convention to flesh out what we might eventually bubble up to a draft/spec.

whitlockjc avatar Sep 16 '22 14:09 whitlockjc

But the structure and syntax of these items are by no means tied to just JSON and there is no reason we can't use what's there as-is regardless of the syntax of the containing/referent documents

Reading the specification with no in-depth knowledge, I simply fail to see how this can be interpreted from the standard. Do you mind clarifying how and where you read this? 😅 There are numerous places within the standard that explicitly state it has to be JSON (RFC 4627), not an arbitrary structure that is not tied to JSON. The abstract defines a "JSON value to reference another value in a JSON document"; section 3 speaks of a URI "which identifies the location of the JSON value being referenced"; etc. Nowhere does it state anything about non-JSON referenced resources.

With a bit of creative freedom, I might agree with you and especially since you have such an in-depth knowledge of it, that you can rather easily simply replace JSON with any structure and it will work in most cases. But from the standards perspective and for someone who cannot read it from the standard I would argue that the current one is not sufficient.

But I do think we could start that in a much simpler way by using the JSON-specific drafts/specs as a base and using a convention to flesh out what we might eventually bubble up to a draft/spec.

In my eyes we are, but I might be misunderstanding the term. This standard is nothing more than JSON Reference turned agnostic, which takes care of the first issue with JSON reference not (in my words) allowing referring to non-JSON resources. It does not specifically define how to handle complex fragments but leaves this up to the standard that incorporates it.

I.e. fragments are something the AsyncAPI spec (or any other) needs to specifically define, see https://github.com/asyncapi/spec/pull/825#discussion_r970658915.

Regarding resolution/conversion of non-JSON structures to JSON, I simply defined it as having to be placed into a string. I am still not sure about this approach, which is why it's part of the remaining issues to figure out.
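A small sketch of what "placed into a string" could look like once a non-JSON resource has been fetched. The helper `embed_as_content`, the `content` property name (taken from the open question earlier in this thread), and the schema format value are assumptions for illustration, not the wording of the draft.

```python
import json


def embed_as_content(raw_schema: str, schema_format: str) -> dict:
    """Wrap a non-JSON schema as an opaque string so the enclosing document stays valid JSON."""
    return {"schemaFormat": schema_format, "content": raw_schema}


# Hypothetical XSD text; quotes and angle brackets are escaped by the JSON serializer.
xsd = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"/>'
resolved = embed_as_content(xsd, "application/xml")

serialized = json.dumps(resolved)
print(serialized)
```

The round trip through `json.dumps` and `json.loads` returns the original text unchanged, which is the property the "always produce a valid JSON document" requirement relies on. It does not, by itself, make anything inside the string addressable.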

I fail to see where we can simplify this any further 🧐 Do you have any suggestions?

jonaslagoni avatar Sep 16 '22 15:09 jonaslagoni

In JSON terms, a JSON Reference is an Object that has a singular $ref key. Every programming language has the ability to represent this. That's the reason why I say that while JSON Reference is JSON-specific, its concepts are not. I could easily represent a JSON Reference in any programming language, either using their Object equivalent (Java Maps, Python dicts, Golang maps, ...) or by using their type system (Java classes, Python classes, Golang structs, ...). There is a reason why most languages have built-in support for serialization to/from JSON to native language structures, because at the end of the day JSON is nothing more than a structured document with a simple type system that is represented in all programming languages.

So if the overall structure of a JSON Reference isn't necessarily unique or specific to JSON, I stand by what I said.
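As an illustration of how little is JSON-specific here, the sketch below recognizes JSON References in native Python structures, following the "object with a singular $ref key" description above. The names `is_json_reference` and `iter_refs` and the sample document are assumptions for this example.

```python
from typing import Any, Iterator


def is_json_reference(value: Any) -> bool:
    """True for a mapping whose only key is '$ref' and whose value is a string."""
    return isinstance(value, dict) and set(value) == {"$ref"} and isinstance(value["$ref"], str)


def iter_refs(node: Any) -> Iterator[str]:
    """Yield every $ref target found in nested dicts and lists, in document order."""
    if is_json_reference(node):
        yield node["$ref"]
    elif isinstance(node, dict):
        for child in node.values():
            yield from iter_refs(child)
    elif isinstance(node, list):
        for child in node:
            yield from iter_refs(child)


# Hypothetical document; it could have been parsed from JSON, YAML, or built by hand.
document = {
    "payload": {"$ref": "./avro.avsc"},
    "items": [{"$ref": "#/components/schemas/a"}, {"$ref": "#/b", "description": "not sole key"}],
}
print(list(iter_refs(document)))
```

Objects that carry other keys next to $ref are skipped in this sketch; whether such siblings should be ignored, merged, or rejected is exactly the kind of behavior the surrounding discussion says each spec defines differently.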

In JSON terms, a JSON Pointer is a String "containing a sequence of zero or more reference tokens, each prefixed by a '/' (%x2F) character." Again, there is nothing about this that is JSON-specific in its syntax nor in how resolution works because what makes a JSON Pointer resolvable is the fact that the JSON document is itself a structured document. You could easily use a JSON Pointer to resolve anything in a hierarchical data structure.

So if a JSON Pointer is really just a collection of "reference tokens" in a language agnostic format, and the requirement of resolution is a hierarchical data structure, neither of these requirements are unique or specific to JSON and I stand by what I said.

At the end of the day, JSON References provide a language agnostic way of defining a reference, despite it having "JSON" in its name. And JSON Pointers provide a language agnostic way of defining the hierarchical collection of reference tokens needed to resolve data in a hierarchical data structure, despite it having "JSON" in its name.

whitlockjc avatar Sep 16 '22 15:09 whitlockjc

I've tried to read all the comments and understand them 🤯, but I have some problems with the purpose of the PR itself. We should specify exactly that this is about the ability to reference the schemas themselves (mainly for message payloads, but we should also allow this in other places where SchemaObject can be used in the AsyncAPI document, including extensions and bindings), not about enabling references to, for example, an Info Object defined in XML (or something like that).

Additionally, it has been mentioned a lot here that JSON Pointer itself is JSON-agnostic and could be used for data that does not have a typical JSON structure (as long as it can be transformed to JSON), so that a JSON Pointer could be used to pull out nested data. Here is the problem: every other structured format (even JSON-based formats like Avro) can have, and usually has, its own way of referencing that differs from JSON Pointer.

Another thing: even if we come up with a fancy way of referencing other formats and even give it an RFC number, the problem remains of how the tools themselves are supposed to support it. For now, there are no tools that can reference formats other than JSON/YAML (by JSON Pointer). Referencing a whole XML document from a given file is currently supported in most JSON dereferencing tools, where it is treated as a plain string, but the problem is nested elements in XML (not to mention attributes and namespaces). So @jessemenning made a good point here: what do we gain from a referencing spec if there will be no tools to support it? I don't want to come off as ignorant, but is there any implementation (in any language, at least one) that supports 100% all referencing cases (in dereferencing phase) in the new OpenAPI 3.1? If not, why do I need such referencing possibilities? Of course, I am also a developer and I know that it can be difficult, but at least one tool implementation should be written alongside a given spec or spec draft to show that it is possible.

So if we want to somehow incorporate referencing other formats "inside" JSON Pointer, it would have to be strictly defined and explained without relying on any existing tools like XML -> JSON because they don't have to be based on the specification at all but on the author's idea - that is, each tool == a different way of transforming XML to JSON.

Simple example with Avro:

{
     "type": "record",
     "namespace": "com.example",
     "name": "FullName",
     "fields": [
       { "name": "first", "type": "string" },
       { "name": "last", "type": "string" }
     ]
} 

and then (inside another Avro schema) I can reference the FullName type as com.example.FullName. I could write it as a JSON Pointer, ./avro.avsc#/com.example.FullName, but I (and then the tool) would need to know how to handle that, because com.example.FullName is not a property of the root object but a reference to a given (usually nested) element in some structure.

If it were up to me, instead of combining JSON Pointer with other formats, I would prefer a solution like this:

  • If $ref's value is a string, it's a JSON Pointer
  • If $ref's value is a JSON Object, then the link field inside that object (or another name, it's only a suggestion) points to some structure of another format (with JSON Pointer syntax/format) and the rest of the fields (depending on the format) define complex referencing.

Example:

$ref:
  link: './avro.avsc'
  format: ...avro
  type: com.example.FullName
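A rough Python sketch of how a tool might branch on the two shapes of $ref suggested above. This is only an illustration of the idea, not an agreed design: the function name `classify_ref` and the returned dictionary layout are assumptions, and the `format` value is kept exactly as elided in the example.

```python
def classify_ref(ref_value):
    """Hypothetical dispatcher for the two proposed $ref shapes."""
    if isinstance(ref_value, str):
        # String form: handled as today, via JSON Pointer.
        return {"kind": "json-pointer", "target": ref_value}
    if isinstance(ref_value, dict):
        if "link" not in ref_value:
            raise ValueError("object-form $ref needs a 'link' field")
        # Everything except 'link' is format-specific referencing information.
        options = {key: value for key, value in ref_value.items() if key != "link"}
        return {"kind": "format-specific", "target": ref_value["link"], "options": options}
    raise TypeError("$ref must be a string or an object")


print(classify_ref("#/components/schemas/Foo"))
print(classify_ref({
    "link": "./avro.avsc",
    "format": "...avro",  # left elided, as in the example above
    "type": "com.example.FullName",
}))
```

A real resolver would then hand `options` to a format-specific handler, which is where each format's own lookup rules would live.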

As I wrote at the beginning of the post, I tried to understand all the previous comments, but I am missing a clear picture of what we are discussing here: so far we are focusing too much on unimportant things, and probably not enough on how it would work in the end and what problems the developers of the tools would face. If I am talking nonsense, you can ignore my comment.

BTW, Fran mentioned during the discussion of one of my proposals, about the possibility of defining custom schema formats anywhere in the spec, that this possibility is probably not so important, because anyone can parse XML to JSON and then use a regular JSON Pointer. But that requires a lot of user knowledge and work, and we should make life simpler.

EDIT: From what I see in the PR's content, Jonas is trying to do something similar to my changed $ref, but we should decide how to handle the "fragment content of JSON Pointer".

magicmatatjahu avatar Sep 16 '22 18:09 magicmatatjahu

@magicmatatjahu

is there any implementation (in any language, at least one) that supports 100% all referencing cases (in dereferencing phase) in the new OpenAPI 3.1?

Yes, I believe @karenetheridge's OpenAPI::Modern handles all of the cases, using her JSON::Schema::Modern implementation. I think @gregsdennis JSON Everything project either supports it or will soon (he and I talked about base URIs in OAS 3.1 files recently, at least). @jdesrosiers's Hyperjump OAS validator seems to work with the schema objects in isolation, but I'm pretty sure you could use Hyperjump JSON Schema Core to handle references across an OAS 3.1 file as it understands all of the necessary base URI and reference resolution concepts.

These are just projects by people I know, and come more out of the JSON Schema side expanding to OAS-specific validation. You may be able to find more for other OAS use cases at openapi.tools.

handrews avatar Sep 16 '22 19:09 handrews

Since variations have been brought up by many people in many threads, I'm going to make this plea at the top level:

Please, please, please do not make up new $ref behavior or syntax!

Just use a different keyword name.

Making your own $ref will create an absolute nightmare for us over at the JSON Schema Org as people who learn your $ref show up and demand that we support whatever you did.


JSON Reference was one of the JSON Schema Org's draft standards and we intentionally dropped it, while understanding that some would continue to use the old approach as it was. That's fine. The OpenAPI/AsyncAPI Reference Objects have a well-known (even if long-expired) specification source, and aside from the "ignore adjacent properties" (which has always been the cause of tremendous confusion anyway), the behaviors are compatible.

But as soon as you go beyond that, you are fracturing the specification landscape. For $ref in AsyncAPI files outside of JSON Schema objects, you'll be creating a divergent expectation and we'll get demands to comply, or to reconcile the approaches, or something.

If you make changes to $ref in the context of JSON Schema objects, then you're just flat-out incompatible with JSON Schema. I've lost track of whether this is part of the current proposal, but just think of all of the problems and frustrations caused by OAS 2 and OAS 3.0's JSON Schema incompatibilities. It was enough of a problem that JSON Schema compatibility became the primary focus of OAS 3.1 (along with Webhooks).

So please, do not fracture the $ref landscape. Old-style JSON References are fine. Once you go beyond that in a way that is not compatible with JSON Schema's $ref, we are all going to end up hurting.

There are many names out there. If you want it to work in JSON Schema, we'd request that you not use a $ prefix as we use that to signify the small set of JSON Schema core keywords (although using $ is not outright forbidden, and I can't imagine anyone would do anything about it if you do use it). If it is being used outside of JSON Schema, we of course don't care what you use in other contexts.


@jonaslagoni regarding:

This means that if AsyncAPI takes a similar approach to modern JSON Schema, then you can reference other schemas as the behavior is defined in terms of the schema result, not the schema syntax.

Do you mind clarifying this? I don't quite understand what you mean here by the schema result, not the schema syntax 🤔

JSON Reference (and JSON Schema $ref in draft-03, -06, and -07) works by replacing the containing object with the target, which is why adjacent (same-object) keywords are ignored.

In JSON Schema, keywords have two effects: they produce a boolean assertion result, and they can produce annotations, e.g. associating a keyword like title with a specific instance location. A schema object ANDs its assertion results, and if that ANDed result is true it keeps the annotations from all of the keywords. If it's false, it deletes the annotations. Don't worry too much about the annotation stuff, I'm just pointing out that there's more going on than a boolean.

With the old "replace the context object" approach, you could implement it as a literal replace-and-re-evaluate, or you could implement it as "evaluate $ref's target schema, and use its results for the whole schema object containing the $ref, ignoring all other keywords instead of evaluating them, ANDing their results, etc." The effect is the same either way.

In modern JSON Schema (2019-09 and later), we started from the 2nd approach and changed it to the following (italicized parts are different): "evaluate $ref's target schema, and use its results as the result of $ref as a JSON Schema keyword." From there, the standard way of combining keyword results in JSON Schema is used, which means that adjacent keywords work just fine. So consider:

{
  "$defs" :{
    "foo": {
      "type": "object",
      "additionalProperties": {"type": "string"}
    }
  },
  "type": "array",
  "items": {
    "$ref": "#/$defs/foo",
    "minProperties": 1
  }
}

In the old JSON Reference replace-the-context-object approach, this behaves like:

{
  "type": "array",
  "items": {
    "type": "object",
    "additionalProperties": {"type": "string"}
  }
}

In the 2019-09 and later approach, it behaves more-or-less like:

{
  "type": "array",
  "items": {
    "allOf": [{
      "type": "object",
      "additionalProperties": {"type": "string"}
    }],
    "minProperties": 1
  }
}

There's not really an allOf inserted, but a one-element allOf also just evaluates that element and uses its result as the result of the allOf keyword. A $ref now does the same thing: it evaluates the referenced schema and uses its result as the result of $ref as a keyword.
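The difference between the two behaviors can be sketched with a toy evaluator in Python. This is not a JSON Schema implementation: it only knows the handful of keywords used in the example above, returns only the boolean assertion result (no annotations), and the `siblings_apply` flag is an invented switch between the old and the 2019-09+ behavior.

```python
def resolve_local(root, ref):
    """Resolve a same-document '#/a/b' reference (no escaping in this sketch)."""
    node = root
    for token in ref.lstrip("#").strip("/").split("/"):
        node = node[token]
    return node


def validate(schema, instance, root, siblings_apply):
    """Toy evaluator: type, minProperties, additionalProperties, items, $ref."""
    if "$ref" in schema:
        target_ok = validate(resolve_local(root, schema["$ref"]), instance, root, siblings_apply)
        if not siblings_apply:
            return target_ok  # old behavior: the target replaces the whole object
        if not target_ok:
            return False  # new behavior: $ref is one keyword among the others
    expected = {"object": dict, "array": list, "string": str}.get(schema.get("type"))
    if expected is not None and not isinstance(instance, expected):
        return False
    if isinstance(instance, dict):
        if len(instance) < schema.get("minProperties", 0):
            return False
        if "additionalProperties" in schema:
            sub = schema["additionalProperties"]
            if not all(validate(sub, v, root, siblings_apply) for v in instance.values()):
                return False
    if isinstance(instance, list) and "items" in schema:
        if not all(validate(schema["items"], v, root, siblings_apply) for v in instance):
            return False
    return True


schema = {
    "$defs": {"foo": {"type": "object", "additionalProperties": {"type": "string"}}},
    "type": "array",
    "items": {"$ref": "#/$defs/foo", "minProperties": 1},
}

# An array holding one empty object: minProperties only matters in the new behavior.
print(validate(schema, [{}], schema, siblings_apply=False))  # old: True
print(validate(schema, [{}], schema, siblings_apply=True))   # new: False
```

An instance such as `[{"a": "b"}]` passes under both behaviors, so the divergence only shows up when an adjacent keyword would reject something the referenced schema accepts.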

So, for other uses of $ref, the question is, "What is the effect of the thing that is being referenced, and how would it be used at the point of reference?" In JSON Schema, the thing is a schema and it is used for its assertion and annotation results. JSON Schema has a well-defined process for combining the assertion and annotation results of multiple keywords, so that works just fine.

How well a similar approach would work in other contexts depends on what you're referencing and how it's used.

handrews avatar Sep 16 '22 19:09 handrews

Please, please, please do not make up new $ref behavior or syntax!

Agreed, but didn't JSON Schema do the same? While JSON Schema's syntax is the same as JSON Reference, its resolution behavior is definitely very different than the $ref we all know and love. So when people see a $ref in AsyncAPI, who's to know what the expectation for resolution might be? I know if I were writing OpenAPI tooling again, something I've not been excited to do since the unnecessary complexity of 3.x, the only choice would be to defer to an approved JSON Schema parser for handling $ref, because it seems to be specific to JSON Schema now.

And that's why I go back to what exactly are we trying to solve here? If you have structured data and a JSON Reference, resolution is possible. While "JSON" is implied by their names, JSON Pointers/References provide a simple syntax and expectation for defining resolution locations and resolution rules...even if you're not using JSON. (JSON References seem to work just fine in YAML for example.)
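As a small illustration of that point, here is a Python sketch of JSON Pointer evaluation (following the token and escaping rules of RFC 6901) over data that has already been parsed into dictionaries and lists. Where the data came from (JSON, YAML, or something else) is outside the sketch; the sample document and the function name `resolve_json_pointer` are made up for the example.

```python
def resolve_json_pointer(document, pointer):
    """Evaluate a JSON Pointer against nested dicts and lists."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    node = document
    for raw in pointer[1:].split("/"):
        # Unescape '~1' before '~0', as RFC 6901 requires.
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            node = node[int(token)]
        else:
            node = node[token]
    return node


# Already-parsed data; it could equally have been loaded from a YAML file.
document = {
    "components": {"schemas": {"a/b": {"type": "string"}}},
    "servers": ["dev", "prod"],
}

print(resolve_json_pointer(document, "/components/schemas/a~1b/type"))
print(resolve_json_pointer(document, "/servers/1"))
```

This only works once a format has an agreed mapping into maps and lists, which is the part the rest of the thread debates for formats like XML or Avro.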

Not looking to rock the boat, but as a long time OpenAPI consumer and tooling author, I'd just hate to see us go down the path of making things harder than they need to be. I'm completely open to changing my mind, I'm just not sure there's been a case made that can't be solved with the existing JSON Reference syntax and resolution rules. Maybe I just missed it above, so I'll go back and re-read again.

whitlockjc avatar Sep 16 '22 20:09 whitlockjc

Agreed, but didn't JSON Schema do the same?

They're all JSON Schema Org specs (including JSON Reference which, like all of the early JSON Schema specs, was edited/authored or co-edited/authored by Kris Zyp), and all clearly marked as drafts. And we talked it over extensively with OpenAPI as you probably recall since you were on a lot of those TSC calls. And there were and are a ton of use cases where keywords alongside $ref would be advantageous. And someone (either Phil Sturgeon or someone from the OpenAPI TSC or some combination) went and did an extensive survey of $ref usage to see if making that change was likely to cause problems in OpenAPI. It's not like we just did it for the lulz without talking to anyone about it.

It's different when one org redefines another org's specs. We don't redefine JSON Pointer, for example (although we do define Relative JSON Pointer, which is careful not to overlap or contradict JSON Pointer: they are distinct specifications).

its resolution behavior is definitely very different than the $ref we all know and love.

The constant stream of people confused about adjacent keywords being ignored and wanting them not ignored indicated that "love" was definitely a stretch. We changed the behavior in response to use cases and confusion within the community.

handrews avatar Sep 16 '22 21:09 handrews

... It's not like we just did it for the lulz without talking to anyone about it.

First off, there was no disrespect in my message and I never suggested that the decisions made were taken lightly. I do remember being on those OpenAPI TSC calls, but I also remember being one of the people arguing against the change. My reasons were to avoid confusion for people that know $ref, which should include years of people using JSON Schema/OpenAPI/Swagger, and to avoid complexity as it relates to reasoning about the reference by a human and when authoring tooling. What's done is done and there is no going back, but my concern about seeing a $ref and not knowing if it's the draft spec to follow or the JSON Schema specific version seems like a valid concern.

To move on from my original disagreement with the changes to $ref, because they don't matter now, my new opinion is that OpenAPI and AsyncAPI have similar needs and since OpenAPI adopted the JSON Schema $ref approach, I would be all for AsyncAPI doing the same. It would make for consistency across authoring these documents, and the tooling for supporting them.

But even if AsyncAPI does this, it does not change my opinion on how $ref is capable of resolving values in any structured data so I don't see the need to extend/modify what JSON Schema already does for $ref as it relates to supporting resolution of values in non-JSON referent documents.

whitlockjc avatar Sep 16 '22 21:09 whitlockjc

Thanks, @whitlockjc .

a $ref and not knowing if it's the draft spec to follow or the JSON Schema specific version seems like a valid concern.

Not to re-litigate but just to explain for those with less context, yes that's a valid concern. But for Reference Objects (which are in OAS 3.1 clearly distinct from Schema Objects- they are never both syntactically valid in the same place) both OpenAPI and AsyncAPI say explicitly in their own documentation that adjacent fields SHALL be ignored. With that restriction, you end up with essentially the same behavior in both places (a $ref in modern JSON Schema that doesn't have siblings has the same observable behavior as it did in draft-07).

OpenAPI and AsyncAPI have similar needs and since OpenAPI adopted the JSON Schema $ref approach, I would be all for AsyncAPI doing the same. It would make for consistency across authoring these documents, and the tooling for supporting them.

I'm in agreement with you there.

But even if AsyncAPI does this, it does not change my opinion on how $ref is capable of resolving values in any structured data so I don't see the need to extend/modify what JSON Schema already does for $ref as it relates to supporting resolution of values in non-JSON referent documents.

The problem isn't the JSON Pointer part, it's the URI fragment part. All of this boils down to whether one wants to respect the principle that media types determine fragment syntax, and if so how one goes about reconciling that with a varied landscape of data formats, some of which don't define media types, and some of which do but don't define suitable fragment approaches.

For anyone wondering where I'm getting this, it's from RFC 3986 §3.5:

The semantics of a fragment identifier are defined by the set of representations that might result from a retrieval action on the primary resource. The fragment's format and resolution is therefore dependent on the media type [RFC2046] of a potentially retrieved representation, even though such a retrieval is only performed if the URI is dereferenced. If no such representation exists, then the semantics of the fragment are considered unknown and are effectively unconstrained. Fragment identifier semantics are independent of the URI scheme and thus cannot be redefined by scheme specifications.

Individual media types may define their own restrictions on or structures within the fragment identifier syntax for specifying different types of subsets, views, or external references that are identifiable as secondary resources by that media type. If the primary resource has multiple representations, as is often the case for resources whose representation is selected based on attributes of the retrieval request (a.k.a., content negotiation), then whatever is identified by the fragment should be consistent across all of those representations. Each representation should either define the fragment so that it corresponds to the same secondary resource, regardless of how it is represented, or should leave the fragment undefined (i.e., not found).

That's only a $ref question because $ref uses a single URI, and is therefore governed by these rules. A reference that doesn't use URIs, or (more realistically) uses URIs plus additional information not governed by RFC 3986, has more flexibility.

For JSON Schema, $ref makes sense because we know the target media type (application/schema+json), so URI fragments have clear predictable behavior.
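The "media type determines fragment syntax" principle can be sketched in Python as a lookup from media type to fragment resolver. The registry name, the function names, and the choice of which media types are registered are assumptions for illustration; the only mapping taken from this comment is that `application/schema+json` has known fragment behavior, simplified here to the JSON Pointer case.

```python
def json_pointer_fragment(document, fragment):
    """Treat the fragment as a JSON Pointer (escaping omitted for brevity)."""
    node = document
    for token in fragment.strip("/").split("/") if fragment else []:
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


# Hypothetical registry: only media types with defined fragment semantics appear.
FRAGMENT_RESOLVERS = {
    "application/schema+json": json_pointer_fragment,
}


def resolve_fragment(media_type, document, fragment):
    """Pick the fragment syntax from the media type of the retrieved document."""
    resolver = FRAGMENT_RESOLVERS.get(media_type)
    if resolver is None:
        # RFC 3986 leaves the semantics unknown here; this sketch refuses to guess.
        raise LookupError(f"no fragment semantics known for {media_type!r}")
    return resolver(document, fragment)


schema_doc = {"$defs": {"foo": {"type": "object"}}}
print(resolve_fragment("application/schema+json", schema_doc, "/$defs/foo"))

try:
    resolve_fragment("application/xml", "<root/>", "/root")
except LookupError as error:
    print(error)
```

Supporting another schema format under this model means registering its media type and fragment rules, which is only possible when the format defines both.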

handrews avatar Sep 16 '22 22:09 handrews

A quick update: I have continued to think on the issues brought up here in @jonaslagoni 's proposal, plus the conversation with @whitlockjc and others, including the long-standing use of standalone ref-parsing tools outside of JSON Schema. We are discussing this among the JSON Schema team.

It will be a little while before I can share more details due to a team member being OOTO until later this week, but I wanted to let y'all know that my "please don't redefine $ref because JSON Schema owns it" was not intended to be a "don't do this, that's the end of it" ultimatum. We recognize that if we're going to claim some sort of ownership over $ref, then we have a responsibility to the communities using it. Whether that usage fits perfectly with ours or not.

These sorts of issues have come up periodically, and we're assessing the various proposals that have been considered in the past. I will post an update here when I am able, whether that's a new proposal or an acknowledgement that we should work with an existing proposal such as this one (I'm not aware of any other recent proposals, although if anyone knows of any please comment and I'll throw them into the mix).

I apologize for being cryptic, it's simply a matter of timing with respect to our team members' schedules.

handrews avatar Oct 02 '22 22:10 handrews

A few questions for clarifying requirements. By secondary resource, I more-or-less mean "thing you'd use a fragment to reference if fragment behavior were defined for the media type."

  • Do you ever need to reference a schema without knowing its format in advance?
  • Is there any other need for a single syntax (JSON Pointer) for accessing secondary resources across all schema formats?
  • Do all known schema formats define some sort of secondary resource access syntax (whether as fragment syntax or otherwise)?
  • For any schema formats that do not define their own secondary resource access syntax, is there a need to access secondary resources in that format?
  • When referencing a format that does not itself use JSON Reference, are there other sorts of references that need to be resolved? If so, do they use URIs? In particular, do they allow relative URI references?
  • Are cyclic references possible outside of JSON Schema?
  • Are all of AsyncAPI's $ref uses outside of JSON Schema intended to be "resolved" in the sense of replacing them in order to create a single file that does not use $ref outside of JSON Schema?

handrews avatar Oct 03 '22 00:10 handrews

Sorry for taking so long, folks; I had to figure out what we should do with all this information and how to further progress the discussion. Because there are so many things to consider, and so many use cases that it becomes almost impossible to follow them in a PR, I created a dedicated discussion focusing ONLY on the tooling perspective and on how the specifications are sufficient/insufficient in defining expected behavior, in order to create a feedback loop for better standards and tooling. Please see https://github.com/asyncapi/community/discussions/485

So if we want to somehow incorporate referencing other formats "inside" JSON Pointer, it would have to be strictly defined and explained without relying on any existing tools like XML -> JSON because they don't have to be based on the specification at all but on the author's idea - that is, each tool == a different way of transforming XML to JSON.

@magicmatatjahu defined in U8 👍

If it were up to me, instead of combining JSON Pointer with other formats, I would prefer a solution like this:

@magicmatatjahu I am neither against nor for changing the keywords and structure for linking or referencing for that matter, let's see how we can progress that!

But even if AsyncAPI does this, it does not change my opinion on how $ref is capable of resolving values in any structured data so I don't see the need to extend/modify what JSON Schema already does for $ref as it relates to supporting resolution of values in non-JSON referent documents.

@whitlockjc please jump into U8 in the discussion, especially as you have reasons why you see linking to non-JSON resources as easy to achieve. But remember to point out where exactly in the specifications this is clarified or interpreted!

https://github.com/asyncapi/community/discussions/485#discussioncomment-3788951

The problem isn't the JSON Pointer part, it's the URI fragment part. All of this boils down to whether one wants to respect the principle that media types determine fragment syntax, and if so how one goes about reconciling that with a varied landscape of data formats, some of which don't define media types, and some of which do but don't define suitable fragment approaches.

@handrews U9 and U10 might be just for you! They try to clarify the problem with media types and remote references.

A quick update: I have continued to think on the issues brought up here in @jonaslagoni 's proposal, plus the conversation with @whitlockjc and others, including the long-standing use of standalone ref-parsing tools outside of JSON Schema. We are discussing this among the JSON Schema team.

@handrews I hope that the discussion and use cases can help us narrow down exactly where tooling doesn't know what to do because the standards are insufficient and the problems they face. Hope this helps your proposal to narrow it down.

A few questions for clarifying requirements. By secondary resource, I more-or-less mean "thing you'd use a fragment to reference if fragment behavior were defined for the media type."

Damn, what a set of questions 😅 Let me see if I can add my perspective to some of them!

Do you ever need to reference a schema without knowing its format in advance?

I think it could be, see U5, U9, and U10.

Is there any other need for a single syntax (JSON Pointer) for accessing secondary resources across all schema formats?

I would say that would be preferred, otherwise, implementations can come to the wrong conclusion of what that second resource is, or not even be able to find it.

Do all known schema formats define some sort of secondary resource access syntax (whether as fragment syntax or otherwise)?

I would say no here. But I don't have complete experience with all the standards, so I can't possibly say tbh. Maybe https://github.com/asyncapi/spec/issues/622 can give you some insight.

For any schema formats that do not define their own secondary resource access syntax, is there a need to access secondary resources in that format?

I would say yes because of https://github.com/asyncapi/spec/issues/622.

When referencing a format that does not itself use JSON Reference, are there other sorts of references that need to be resolved? If so, do they use URIs? In particular, do they allow relative URI references?

If you take a look at JTD, it uses its own referencing format i.e. simple mapping between reference and definitions. See https://www.rfc-editor.org/rfc/rfc8927#name-ref
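For comparison, a minimal Python sketch of that JTD style of referencing: the value of `ref` is just a key into the root schema's `definitions`, with no URI or JSON Pointer involved. The sample schema and the helper name `resolve_jtd_ref` are made up for illustration.

```python
def resolve_jtd_ref(root_schema, schema):
    """Follow a JTD 'ref' by looking its value up in the root 'definitions'."""
    if "ref" not in schema:
        return schema  # not a ref form, nothing to resolve
    definitions = root_schema.get("definitions", {})
    name = schema["ref"]
    if name not in definitions:
        raise KeyError(f"no definition named {name!r}")
    return definitions[name]


root = {
    "definitions": {
        "coordinates": {
            "properties": {"lat": {"type": "float64"}, "lng": {"type": "float64"}}
        }
    },
    "properties": {"location": {"ref": "coordinates"}},
}

location_schema = root["properties"]["location"]
print(resolve_jtd_ref(root, location_schema))
```

Since the lookup is a plain name and always relative to the root schema, a generic $ref resolver that expects URIs would not recognize it without format-specific knowledge.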

Are cyclic references possible outside of JSON Schema?

Yes, but of course not for all formats.

Are all of AsyncAPI's $ref uses outside of JSON Schema intended to be "resolved" in the sense of replacing them in order to create a single file that does not use $ref outside of JSON Schema?

It is only from a tooling perspective. As far as we can determine, the use case is not to know that it's a reference but to know what the referenced resource is, so you can iterate over/interact with the resolved resource. Especially when it comes to parsers.

jonaslagoni avatar Oct 03 '22 17:10 jonaslagoni

This pull request has been automatically marked as stale because it has not had recent activity :sleeping:

It will be closed in 120 days if no further activity occurs. To unstale this pull request, add a comment with detailed explanation.

There can be many reasons why some specific pull request has no activity. The most probable cause is lack of time, not lack of interest. AsyncAPI Initiative is a Linux Foundation project not owned by a single for-profit company. It is a community-driven initiative ruled under open governance model.

Let us figure out together how to push this pull request forward. Connect with us through one of many communication channels we established here.

Thank you for your patience :heart:

github-actions[bot] avatar Feb 01 '23 00:02 github-actions[bot]