spec Proposal for Unparsable Remote References (mainly to support XML.)

Solace has customers who love AsyncAPI but use XML payloads in their messaging systems. Also, the question of whether things like XSD schemas are supported has come up more than once in the Slack channels.

We would like to propose the notion of an Unparsable Remote Reference. These would be, at minimum, URLs represented by simple strings. By Unparsable we mean that in general, AsyncAPI parsers would not be expected to retrieve and/or parse the entities pointed to by these references. Code generators, on the other hand, could use these references.

The use case we are trying to solve immediately is how to provide a URL to an XSD schema, so that a code generator could create a model class from the schema and use it with XML libraries for serializing messages.

One simple way to do this is one that requires no change to the specification nor to the parser. A message could look like this:

messages:
    myXmlMessage:
      payload:
        remoteReference: "https://example.com/myschema.xsd"
      contentType: "application/xml"

(This fragment works fine with the current parser; you can try it in the playground.)

When this message is passed back from the parser, the payload contains an anonymous schema containing the field remoteReference with its value.
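As a rough sketch of how a code generator might pick these references up, the snippet below walks a plain object shaped like the fragment above and collects every remoteReference it finds. The helper name collectRemoteReferences and the hand-written parsed object are assumptions for illustration; this is not the parser-js API.

```javascript
// Hypothetical helper: collect remoteReference URLs from a parsed document.
// `doc` is a plain object shaped like the YAML fragment above.
function collectRemoteReferences(doc) {
  const found = [];
  const messages = (doc && doc.messages) || {};
  for (const [name, message] of Object.entries(messages)) {
    const payload = message && message.payload;
    if (payload && typeof payload.remoteReference === "string") {
      found.push({
        message: name,
        url: payload.remoteReference,
        contentType: message.contentType,
      });
    }
  }
  return found;
}

// Hand-written stand-in for what a parser might hand back.
const parsed = {
  messages: {
    myXmlMessage: {
      payload: { remoteReference: "https://example.com/myschema.xsd" },
      contentType: "application/xml",
    },
  },
};

console.log(collectRemoteReferences(parsed));
```

A generator could then decide, per contentType, which schema tooling to hand each URL to.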

An improvement would be to create a parser plugin (similar to the avro parser). That would allow us to also specify the schemaFormat. Currently the parser fails if you try to set the schemaFormat to application/xml, because there is no schema parser defined for that format.

Yet another improvement would be to allow an object representing a schema registry, in cases where it would be desirable to add more fields besides just a URL.

This mechanism would also allow users to use Avro files in their original form (the current avro parser translates to JSON schema), and it could also be used to support protobuf or any other kind of schema.

The name remoteReference was intended to be general enough to be applied to other use cases, not just non-JSON schemas.

Ideally it would be nice to have a standard, documented way to do this.

Sep 10 '21 22:09 damaru-inc

I think this proposal addresses, in a generic way, an issue that will likely continue to re-emerge.

OpenAPI naturally uses JSON due to its focus on synchronous RESTful interactions. The async world is much more diverse in both protocols and data formats. Data formats include things like Avro, Protobuf, XML, EDI, and the inevitable "cool new format" that will emerge next year. It feels like we need an extensible way to accommodate that diversity without needing to explicitly include it in the spec.

Longer term, we may need to have something like a "format binding" (a parallel to protocol bindings). The format binding would provide format-specific fields. For example, an XML format binding could include a namespace. But that seems like a lengthy, heavy lift, and @damaru-inc's proposal seems like a good first step that will get early adopters off the ground.

Also, here is an example of what a protobuf implementation would look like:

messages:
  myProtobufMessage:
    remoteReference: "https://example.com/myschema.proto"
    contentType: "application/x-protobuf"

Sep 16 '21 12:09 jessemenning

Interesting concept. Maybe it would be enough to add a flag which would specify whether a given schema should be parsed/formatted or not. We could extend my proposal to define schemas in other formats in different places (currently it is possible only in message's payload) - https://github.com/asyncapi/spec/issues/622 and add to Schema Object a parse field:

messages:
    myXmlMessage:
      payload:
        parse: false
        schema:
          $ref: "https://example.com/myschema.xsd"
      contentType: "application/xml"

which would mean that it should not be parsed and transformed to JSON, but only resolved/fetched.

We can also use the remote: true field.

remoteReference isn't a good solution for pointing to a remote source that should be fetched, because the link to that source must then be handled by a tool in order to fetch it, and $ref is already standardized for this purpose. We shouldn't reinvent the wheel.

Sep 16 '21 13:09 magicmatatjahu

@jmenning-solace Could you create an issue about the format binding (as you described it)? It's a very interesting idea and we should not forget it :)

Sep 16 '21 15:09 magicmatatjahu

Currently we have many objects defined in XSD. These historical objects can be repurposed as event attributes. This unlocks tremendous value (financial, technical) if re-use can be accomplished. Looking forward to this discussion and to the feature becoming available.

Sep 17 '21 04:09 assets-cg

remoteReference isn't a good solution for pointing to a remote source that should be fetched, because the link to that source must then be handled by a tool in order to fetch it, and $ref is already standardized for this purpose. We shouldn't reinvent the wheel.

@magicmatatjahu, my concern is this would create valid AsyncAPI documents that are invalid JSON documents. $ref is defined within JSON Schema as a reference to another JSON schema, not an ad hoc schema type. Standard JSON parsers are coded to the spec. When standardized JSON/JSON Schema parsers encounter a non-JSON schema (protobuf, XML, COBOL copybook, etc.) they will either fail ungracefully or return unpredictable results.

So where do we go from there? I see two options (would love to hear others):

1. Custom code the AsyncAPI-specific parser. But the divergence between OpenAPI and JSON Schema was so painful it took years to reconcile them. And if there are standard JSON parsers baked into products like databases, we will never be able to apply workarounds to them.
2. Use something like remoteReference. This keeps AsyncAPI documents as valid JSON documents. Standard parsers will not throw up when they encounter them, and will just treat the URL as a string. While not ideal, this seems like a reasonable fallback. But when used in an "AsyncAPI aware" parser, it can import the full body of the external non-JSON schema into the main AsyncAPI document as a string that can be parsed by code generators, etc.

My preference would be the second.
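To make the second option a bit more concrete, here is a minimal sketch of what an "AsyncAPI aware" tool could do with such a field. Everything in it is an assumption for illustration: inlineRemoteReference, the remoteContent field name, and the caller-supplied loadText function are hypothetical and not part of any existing parser.

```javascript
// Sketch only: `loadText` is a hypothetical, caller-supplied function
// (url) => string, so the example does not depend on any HTTP library.
function inlineRemoteReference(message, loadText) {
  const payload = message.payload;
  if (!payload || typeof payload.remoteReference !== "string") {
    return message; // nothing to import
  }
  return {
    ...message,
    payload: {
      ...payload,
      // Hypothetical field holding the raw, unparsed schema text.
      remoteContent: loadText(payload.remoteReference),
    },
  };
}

const message = {
  payload: { remoteReference: "https://example.com/myschema.proto" },
  contentType: "application/x-protobuf",
};

// Stand-in loader so the example runs without network access.
const fakeLoad = (url) => `syntax = "proto3"; // loaded from ${url}`;

const enriched = inlineRemoteReference(message, fakeLoad);
console.log(JSON.stringify(enriched, null, 2));
```

A standard JSON tool would still see only strings here, while an aware tool gets both the URL and the schema body.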

We could extend my proposal to define schemas in other formats in different places (currently it is possible only in message's payload)

This is an interesting concept, and maybe that's the appropriate place for the payload binding. Let me think more on that. It seems extendable to these use cases, with the caution that Avro is an easier format to deal with because it's JSON.

Sep 17 '21 13:09 jessemenning

Currently we have many objects defined in XSD. These historical objects can be repurposed as event attributes. This unlocks tremendous value (financial, technical) if re-use can be accomplished. Looking forward to this discussion and to the feature becoming available.

Thanks for chiming in @masterhead , it's nice to get an end user perspective. Can I ask you a couple questions about your use case?

Do you need the full text of the XML schema imported into the AsyncAPI spec, or is simply having a URL pointer to it sufficient?
Do your XML documents typically have a single root element? Or do they have multiple root elements?

Sep 17 '21 13:09 jessemenning

@jmenning-solace

my concern is this would create valid AsyncAPI documents that are invalid JSON documents. $ref is defined within JSON Schema as a reference to another JSON schema, not an ad hoc schema type. Standard JSON parsers are coded to the spec. When standardized JSON/JSON Schema parsers encounter a non-JSON schema (protobuf, XML, COBOL copybook, etc.) they will either fail ungracefully or return unpredictable results.

Did you know that $ref doesn't have to point to a valid JSON schema? A string is also valid here, not in the sense of schema validation, but as a value (the JSON spec treats a string value as a normal JSON instance 😄 ), so for example:

// some_xsd.xsd
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
           xmlns:tns="http://tempuri.org/PurchaseOrderSchema.xsd"
           targetNamespace="http://tempuri.org/PurchaseOrderSchema.xsd"
           elementFormDefault="qualified">
 <xsd:element name="PurchaseOrder" type="tns:PurchaseOrderType"/>
 <xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
   <xsd:element name="ShipTo" type="tns:USAddress" maxOccurs="2"/>
   <xsd:element name="BillTo" type="tns:USAddress"/>
  </xsd:sequence>
  <xsd:attribute name="OrderDate" type="xsd:date"/>
 </xsd:complexType>

 <xsd:complexType name="USAddress">
  <xsd:sequence>
   <xsd:element name="name"   type="xsd:string"/>
   <xsd:element name="street" type="xsd:string"/>
   <xsd:element name="city"   type="xsd:string"/>
   <xsd:element name="state"  type="xsd:string"/>
   <xsd:element name="zip"    type="xsd:integer"/>
  </xsd:sequence>
  <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
 </xsd:complexType>
</xsd:schema>

# asyncapi.yaml
asyncapi: '2.1.0'
info:
  title: Account Service
  version: 1.0.0
  description: This service is in charge of processing user signups

channels:
  user/signedup:
    subscribe:
      message:
        $ref: '#/components/messages/UserSignedUp'

components:
  messages:
    UserSignedUp:
      payload:
        type: object
        properties:
          displayName:
            type: string
            description: Name of the user
          email:
            type: string
            format: email
            description: Email of the user
        someCustomProp:
          $ref: ./some_xsd.xsd

and then after dereferencing I have:

{
  "asyncapi": "2.1.0",
  "info": {
    "title": "Account Service",
    "version": "1.0.0",
    "description": "This service is in charge of processing user signups"
  },
  ...
  "components": {
    "messages": {
      "UserSignedUp": {
        "payload": {
          "type": "object",
          "properties": {
            "displayName": {
              "type": "string",
              "description": "Name of the user",
              "x-parser-schema-id": "<anonymous-schema-2>"
            },
            "email": {
              "type": "string",
              "format": "email",
              "description": "Email of the user",
              "x-parser-schema-id": "<anonymous-schema-3>"
            }
          },
          "someCustomProp": "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" xmlns:tns=\"http://tempuri.org/PurchaseOrderSchema.xsd\" targetNamespace=\"http://tempuri.org/PurchaseOrderSchema.xsd\" elementFormDefault=\"qualified\"> <xsd:element name=\"PurchaseOrder\" type=\"tns:PurchaseOrderType\"/> <xsd:complexType name=\"PurchaseOrderType\"> <xsd:sequence> <xsd:element name=\"ShipTo\" type=\"tns:USAddress\" maxOccurs=\"2\"/> <xsd:element name=\"BillTo\" type=\"tns:USAddress\"/> </xsd:sequence> <xsd:attribute name=\"OrderDate\" type=\"xsd:date\"/> </xsd:complexType>\n<xsd:complexType name=\"USAddress\"> <xsd:sequence> <xsd:element name=\"name\"   type=\"xsd:string\"/> <xsd:element name=\"street\" type=\"xsd:string\"/> <xsd:element name=\"city\"   type=\"xsd:string\"/> <xsd:element name=\"state\"  type=\"xsd:string\"/> <xsd:element name=\"zip\"    type=\"xsd:integer\"/> </xsd:sequence> <xsd:attribute name=\"country\" type=\"xsd:NMTOKEN\" fixed=\"US\"/> </xsd:complexType> </xsd:schema>",
          "x-parser-schema-id": "<anonymous-schema-1>"
        },
        ...
      }
    }
  },
  "x-parser-spec-parsed": true
}

So even when the dereferencer fetches something from the web, it still treats it as a string value. Only if it's valid JSON, i.e. a value starting with {, does it treat it as JSON; otherwise it keeps it as a string. This is how it works in JS (and also in our ParserJS). I don't know about other languages, but it should be similar because JSON has been standardized for a long time.
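The behaviour described above can be illustrated roughly like this. It is only a sketch of the idea; interpretFetched is a made-up name and this is not the actual json-schema-ref-parser implementation.

```javascript
// Rough illustration: decide how to treat the text a $ref resolved to.
function interpretFetched(text) {
  if (text.trimStart().startsWith("{")) {
    try {
      return JSON.parse(text); // valid JSON object: use the parsed value
    } catch (err) {
      return text; // looked like JSON but was not: keep the raw string
    }
  }
  return text; // XSD, protobuf, etc. stay as plain strings
}

console.log(typeof interpretFetched('{"type": "object"}'));
console.log(typeof interpretFetched('<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>'));
```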

If you are talking about this case with making references that should not be fetched, there is now a possibility to use e.g. an extension for this case:

messages:
    myXmlMessage:
      $ref: "https://example.com/myschema.xsd" # it will be fetched and treated as string value
      contentType: "application/xml"
      x-remote-ref: "https://example.com/myschema.xsd" # point to this reference that can be used in generators

Another possibility is to add each $ref before resolving to the schema/(part of document) as x-parser-original-ref and then you have the value of the reference and the link to it.

Sep 18 '21 11:09 magicmatatjahu

Currently we have many objects defined in XSD. These historical objects can be repurposed as event attributes. This unlocks tremendous value (financial, technical) if re-use can be accomplished. Looking forward to this discussion and to the feature becoming available.

Thanks for chiming in @masterhead , it's nice to get an end user perspective. Can I ask you a couple questions about your use case?

Do you need the full text of the XML schema imported into the AsyncAPI spec, or is simply having a URL pointer to it sufficient?

We would not want the full text of the XML schema in the AsyncAPI file. We want something more along the lines of a remote pointer, which our tools can parse to show users what a sample payload looks like. If it is remotely hosted at a URL such as https://domain/ssss/schema_def, that works well. But sometimes the URL may also be a path relative to the AsyncAPI file. For example, if the root folder contains both the AsyncAPI file and SomeSchemaDef.xsd, the reference pointer would be "./SomeSchemaDef.xsd" or something along similar lines.

Do your XML documents typically have a single root element? Or do they have multiple root elements?

For historical reasons, the majority of object definitions currently live in WSDL and have to be liberated and defined as events. It would be ideal if the spec could support XSD for any combination of multiple objects. (We could re-use the existing infrastructure of already defined XSDs as-is instead of manually editing and creating an XSD for each event object.)

Independent objects: Obj-1 has no relationship with Obj-2
Dependent objects: Obj-2 and Obj-3 have Obj-1 as their parent (acyclic graphs/trees)

Sep 21 '21 04:09 assets-cg

Did you know that $ref doesn't have to point to a valid JSON schema? A string is also valid here, not in the sense of schema validation, but as a value (the JSON spec treats a string value as a normal JSON instance 😄 ), so for example:

I tried your example, and it does work. My concern here isn't so much with your proposal, but with the fact that at least our parser-js treats $ref differently depending on whether it's under message/payload or above it.

And what should we call 'someCustomProp?' At least with remoteReference, we can put that somewhere like message/payload and then its purpose becomes clear, and that also gives us a standard name for a property that we can use elsewhere.

Sep 21 '21 13:09 damaru-inc

@damaru-inc

I tried your example, and it does work. My concern here isn't so much with your proposal, but with the fact that at least our parser-js treats $ref differently depending on whether it's under message/payload or above it.

Most probably you mean the situation when you make a reference to the schema, e.g. xml, but then get an error from the parser that it can't parse that? Here I have to tell you that our parser doesn't treat $ref differently. The $ref is used only to de-reference the given reference and replace it with a value. Only after that comes the validation phase and parsing against the schema format, so our parser doesn't change/adjust the logic of $ref.

And what should we call 'someCustomProp?' At least with remoteReference, we can put that somewhere like message/payload and then its purpose becomes clear, and that also gives us a standard name for a property that we can use elsewhere.

If one only needs a reference and not a value from a reference then remoteRef is ok, but I would prefer to be able to reuse in this use case the $ref so that later after parsing you still have the reference.

Sep 21 '21 14:09 magicmatatjahu

Most probably you mean the situation when you make a reference to the schema, e.g. xml, but then get an error from the parser that it can't parse that? Here I have to tell you that our parser doesn't treat $ref differently. The $ref is used only to de-reference the given reference and replace it with a value. Only after that comes the validation phase and parsing against the schema format, so our parser doesn't change/adjust the logic of $ref.

But I guess again (because I haven't yet studied the source code; clearly I should): our parser-js doesn't throw errors every time it sees a $ref, probably because there are only a few situations where it needs to interpret the reference as a JSON Schema. If it's in a place in the parse tree where it doesn't care, it just returns the string (contents of the file), right?

If one only needs a reference and not a value from a reference then remoteRef is ok, but I would prefer to be able to reuse in this use case the $ref so that later after parsing you still have the reference.

I agree that we need a way to keep the reference. The current parser always attaches its own internal schema-id to anything that parses as a schema, e.g.

 'x-parser-schema-id': '<anonymous-schema-1>'

so that would be the logical place to put whatever kind of schema id we want.

cheers Michael

Sep 22 '21 03:09 damaru-inc

Hey @magicmatatjahu, I wrote a simple NodeJS application using the json-schema-ref-parser that is used in the AsyncAPI parser-js to test out XML parsing. The parser threw an error when it encountered an XML schema file. Since $ref resolution uses json-schema-ref-parser as the $RefParser, wouldn't it be problematic to assume that it will handle any non-JSON format as a string? This is what I did for a quick local test:

const $RefParser = require("@apidevtools/json-schema-ref-parser");

const myJSON = "./myJSON.json";

const myXML = "./sampleXML.xsd";

$RefParser.dereference(myJSON, (err, schema) => {
  if (err) {
    console.error(err);
  } else {
    console.log(schema);
  }
});

$RefParser.dereference(myXML, (err, schema) => {
  if (err) {
    console.error(err);
  } else {
    console.log(schema);
  }
});

{
  stack: 'SyntaxError: "/Users/taltamimi/hacks/json-schema-parser-test/sampleXML.xsd" is not a valid JSON Schema\n' +
    '    at $RefParser.parse (/Users/taltamimi/hacks/json-schema-parser-test/node_modules/@apidevtools/json-schema-ref-parser/lib/index.js:131:17)\n' +
    '    at async $RefParser.resolve (/Users/taltamimi/hacks/json-schema-parser-test/node_modules/@apidevtools/json-schema-ref-parser/lib/index.js:184:5)\n' +
    '    at async $RefParser.dereference (/Users/taltamimi/hacks/json-schema-parser-test/node_modules/@apidevtools/json-schema-ref-parser/lib/index.js:268:5)',
  message: '"/Users/taltamimi/hacks/json-schema-parser-test/sampleXML.xsd" is not a valid JSON Schema',
  toJSON: [Function: toJSON],
  name: 'SyntaxError',
  toString: [Function: toString]
}

sampleXML: https://gist.githubusercontent.com/TamimiGitHub/4bfcd8e553b83c86a0f7fb65a9a23726/raw/ce9b574437627a4c56a1c7924f32fed6d28db85e/sampleXML.xsd

What are your thoughts on this?

Sep 29 '21 16:09 TamimiGitHub

@TamimiGitHub Hi! You get the error because you are passing non-JSON as the argument to the $RefParser.dereference function, which isn't supported. The root object for dereferencing must be JSON. What I meant in my comment about references to non-JSON schemas is that it works when you reference e.g. an xsd schema (with the $ref keyword) from inside JSON. I used your xsd schema in this JSON:

{
  "test": "test",
  "reference": {
    "$ref": "./sampleXML.xsd"
  }
}

and then after dereferencing I have:

{
  "test": "test",
  "reference": "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" xmlns:tns=\"http://tempuri.org/PurchaseOrderSchema.xsd\" targetNamespace=\"http://tempuri.org/PurchaseOrderSchema.xsd\" elementFormDefault=\"qualified\"> <xsd:element name=\"PurchaseOrder\" type=\"tns:PurchaseOrderType\"/> <xsd:complexType name=\"PurchaseOrderType\"> <xsd:sequence> <xsd:element name=\"ShipTo\" type=\"tns:USAddress\" maxOccurs=\"2\"/> <xsd:element name=\"BillTo\" type=\"tns:USAddress\"/> </xsd:sequence> <xsd:attribute name=\"OrderDate\" type=\"xsd:date\"/> </xsd:complexType> <xsd:complexType name=\"USAddress\"> <xsd:sequence> <xsd:element name=\"name\"   type=\"xsd:string\"/> <xsd:element name=\"street\" type=\"xsd:string\"/> <xsd:element name=\"city\"   type=\"xsd:string\"/> <xsd:element name=\"state\"  type=\"xsd:string\"/> <xsd:element name=\"zip\"    type=\"xsd:integer\"/> </xsd:sequence> <xsd:attribute name=\"country\" type=\"xsd:NMTOKEN\" fixed=\"US\"/> </xsd:complexType> </xsd:schema>"
}

Sep 29 '21 20:09 magicmatatjahu

Many thanks to @magicmatatjahu , @TamimiGitHub , @masterhead and @damaru-inc for helping me understand the implications here. I've learned a lot thanks to you all.

In an attempt to summarize, I wanted to walk through a couple of scenarios and see if we are on the same page.

I want to have entire .xsd imported into AsyncAPI, as a string

components:
  messages:
    UserSignedUp:
      payload:
         $ref: ./some_xsd.xsd
      contentType: "application/xml"

I just want a pointer from AsyncAPI to a schema registry/file, not bringing in the whole thing (maybe because it's huge)

components:
  messages:
    UserSignedUp:
      contentType: "application/xml"
      x-payload-remote-ref: "https://example.com/myschema.xsd"

Provide a pointer to a particular element

components:
  messages:
    UserSignedUp:
      contentType: "application/xml"
      x-payload-remote-ref: "https://example.com/myschema.xsd"
      x-payload-remote-pointer: "/PurchaseOrder"

For instance, if xsd has multiple root elements:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
           xmlns:tns="http://tempuri.org/PurchaseOrderSchema.xsd"
           targetNamespace="http://tempuri.org/PurchaseOrderSchema.xsd"
           elementFormDefault="qualified">
 <xsd:element name="PurchaseOrder" type="tns:PurchaseOrderType"/>
 <xsd:element name="AnotherRootElement" type="tns:PurchaseOrderType"/>

Pointer to a particular element if importing the schema

components:
  messages:
    UserSignedUp:
      payload:
         $ref: ./some_xsd.xsd
         $pointer: "/PurchaseOrder"
      contentType: "application/xml"
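The pointer scenarios assume a tool can tell which top-level elements an XSD declares. Below is a deliberately naive sketch of that step, which lists the direct children of the schema root named element by tracking tag depth. The function name listRootElements and the "/Name" pointer convention are assumptions taken from the examples above; the code ignores comments, CDATA and attribute values containing ">", so a real tool should use a proper XML parser.

```javascript
// Naive sketch: names of top-level <xsd:element> declarations in an XSD string.
function listRootElements(xsd) {
  // Groups: 1 = "/" for closing tags, 2 = prefix, 3 = local name,
  // 4 = attribute text, 5 = "/" for self-closing tags.
  const tagPattern = /<(\/?)([A-Za-z_][\w.-]*:)?([A-Za-z_][\w.-]*)([^<>]*?)(\/?)>/g;
  const names = [];
  let depth = 0;
  let match;
  while ((match = tagPattern.exec(xsd)) !== null) {
    const [, closing, , localName, attributes, selfClosing] = match;
    if (closing) {
      depth -= 1;
      continue;
    }
    // Depth 1 means a direct child of the <schema> root.
    if (depth === 1 && localName === "element") {
      const nameAttr = /\bname="([^"]*)"/.exec(attributes);
      if (nameAttr) names.push(nameAttr[1]);
    }
    if (!selfClosing) depth += 1;
  }
  return names;
}

const sample = `<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="PurchaseOrder" type="tns:PurchaseOrderType"/>
  <xsd:element name="AnotherRootElement" type="tns:PurchaseOrderType"/>
  <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence>
      <xsd:element name="ShipTo" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>`;

const pointer = "/PurchaseOrder"; // same shape as x-payload-remote-pointer above
console.log(listRootElements(sample));
console.log(listRootElements(sample).includes(pointer.slice(1)));
```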

Oct 01 '21 14:10 jessemenning

The described proposal/problem itself is related to my proposal, which I extended to use references to nested non-JSON schema objects - Proposal to allow defining schema format other than default one (AsyncAPI Schema) - please see section Update

@jessemenning You may be interested in this :)

Oct 12 '21 20:10 magicmatatjahu

This issue has been automatically marked as stale because it has not had recent activity :sleeping:

It will be closed in 120 days if no further activity occurs. To unstale this issue, add a comment with a detailed explanation.

There can be many reasons why some specific issue has no activity. The most probable cause is lack of time, not lack of interest. AsyncAPI Initiative is a Linux Foundation project not owned by a single for-profit company. It is a community-driven initiative ruled under open governance model.

Let us figure out together how to push this issue forward. Connect with us through one of many communication channels we established here.

Thank you for your patience :heart:

Feb 10 '22 01:02 github-actions[bot]

Still valid. @derberg Could you remove stale label?

Feb 10 '22 11:02 magicmatatjahu

Any conclusion on this topic? I agree that the most important thing would be to make it possible to reference other schemas but skip any automated parsing. It is better to be able to describe any event, no matter which format, in an industry standard than to only be able to support JSON.

What needs to happen to get that into the next releases ?

Apr 26 '22 12:04 rober15

We definitely need a champion that wants to drive the change, come up with proposal, respond to feedback, and present it to others

May 09 '22 11:05 derberg

Apologies for letting this lie dormant for so long. Anyway, the proposal, as I see it, is what I wrote here with the added refinements that Jessie made. I think we've responded to feedback (tell me if I missed something.) Here it is presented to others. Is the next step, then, to merge Jessie's suggestions in with my original proposal and re-present it? Or are the next steps to actually do PRs against the spec and the parser? If the latter, I'd be happy to do a PR against the spec, but I'm not the best person to add features to the parser.

May 27 '22 13:05 MichaelDavisSolace

This issue has been automatically marked as stale because it has not had recent activity :sleeping:

It will be closed in 120 days if no further activity occurs. To unstale this issue, add a comment with a detailed explanation.

There can be many reasons why some specific issue has no activity. The most probable cause is lack of time, not lack of interest. AsyncAPI Initiative is a Linux Foundation project not owned by a single for-profit company. It is a community-driven initiative ruled under open governance model.

Let us figure out together how to push this issue forward. Connect with us through one of many communication channels we established here.

Thank you for your patience :heart:

Sep 25 '22 00:09 github-actions[bot]
