XML format: is "XML data" a document or an element?

Open jskeet opened this issue 2 years ago • 3 comments

Currently we have three XML types for data:

  • Text
  • Binary (base64)
  • "XML"

The XML form is required to have a single child element and no peer non-whitespace text nodes. I had interpreted that as "the data is an element" whereas Jem had written it intending "the data is a document". The two aren't the same - in particular, there could be other things we want for a document like a DTD. It also impacts how SDKs should deserialize the data - should it be as a Document object, or an Element object?
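To make the Document/Element distinction concrete, here is a minimal sketch using Python's standard `xml.dom.minidom`, which follows the W3C DOM model. The sample payload and variable names are invented for illustration and are not taken from the spec or any SDK.

```python
from xml.dom import minidom

# Hypothetical payload: a declaration, a DOCTYPE, a top-level comment, then the root element.
SOURCE = b"""<?xml version="1.0"?>
<!DOCTYPE order>
<!-- top-level comment -->
<order>widget</order>"""

document = minidom.parseString(SOURCE)
element = document.documentElement

# Nodes held directly by the Document: more than just the root element.
document_child_types = [node.nodeType for node in document.childNodes]

# Nodes held by the Element: only its own content.
element_child_types = [node.nodeType for node in element.childNodes]

has_doctype = document.DOCUMENT_TYPE_NODE in document_child_types
has_top_level_comment = document.COMMENT_NODE in document_child_types
```

Deserializing as a Document suggests that the doctype and top-level comment are part of the data; deserializing as an Element makes it explicit that they are not.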

A few options:

  • Have two separate types for this (at which point we might want to move away from xsi:type as the way of specifying the data type). We could even have a third type which is "list of XML nodes" to truly allow xs:any content. (That would open further questions such as whether the data element could contain additional attribute nodes...)
  • Decide it's a document, and work out how to spec that in terms of other acceptable direct child nodes
  • Decide it's an element, in which case the spec just needs to clarify it (there shouldn't be as much other work)

jskeet avatar Apr 29 '22 08:04 jskeet

Some thoughts:

  • at the end of the day someone should be able to take the contents of the <data> element and pass it to an XML parser the same as if the contents were the only thing in an HTTP Body
  • I believe that extra XML metadata, such as DTDs, can not appear in the middle of an XML document which means they can not appear under <data>.
  • related to the other discussions this week, I believe that CloudEvent-ifying something should preserve the original data as best it can. So, changing the data to adhere to the CE wrapper XML doesn't seem appropriate to me.
  • to allow for extra XML metadata AND to try to avoid changing the user's event data, it seems to me that if the user gives us non-single-XML-Element for data then base64 encoding it would be the best way to ensure we preserve the user's intent.
  • then if they do give us a single-XML-Element for data we can do what's in the spec today (Jon's last bullet)
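The rule in the last two bullets could be sketched roughly as follows. This is only an illustration: the function name, the returned labels, and the exact test for "single XML element" (the parsed document has nothing but a root element, with the XML declaration ignored) are assumptions, not anything the spec defines.

```python
import base64
from typing import Tuple
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def encode_event_data(payload: bytes) -> Tuple[str, str]:
    """Return ("element", xml_text) or ("binary", base64_text) for a payload."""
    try:
        document = minidom.parseString(payload)
    except ExpatError:
        # Not well-formed XML at all: keep the bytes untouched.
        return "binary", base64.b64encode(payload).decode("ascii")

    # Assumed rule: a DOCTYPE, top-level comment or processing instruction
    # next to the root element means the payload is more than one element.
    if len(document.childNodes) == 1:
        return "element", document.documentElement.toxml()
    return "binary", base64.b64encode(payload).decode("ascii")


kind_plain, _ = encode_event_data(b"<order>widget</order>")
kind_doctype, encoded = encode_event_data(b"<!DOCTYPE order><order>widget</order>")
```

With base64 as the fallback, whatever the user supplied can be recovered byte for byte, which is the "preserve the user's intent" point above.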

duglin avatar Apr 29 '22 12:04 duglin

I think this is really an SDK issue ... the Java SDK proposal provides an org.w3c.dom.Document-style interaction model for the XML data; extending it to support org.w3c.dom.Element as well doesn't seem impossible.

I don't believe this affects the actual wire-format in any way ..

JemDay avatar May 26 '22 16:05 JemDay

> I don't believe this affects the actual wire-format in any way ..

Well, it affects how we describe the expectation of what can be encoded "simply" in the XML format.

If someone asks "How do I represent an XML document as CloudEvent data in XML format?" should we recommend:

  • Use base64 and the binary option
  • Embed the document root element and use the XML option, noting that this loses additional information such as the XML document declaration, DTDs, and top-level comments
  • Something else?

I think describing the "nested" XML option as embedding an XML element rather than embedding an XML document simplifies expectations.

jskeet avatar Aug 22 '22 11:08 jskeet

Any update on this one?

duglin avatar Jan 19 '23 15:01 duglin

I think it's still worth discussing, but I really believe we can only reasonably represent an XML element.

jskeet avatar Jan 19 '23 15:01 jskeet

As it stands today, the Java SDK allows the client application to present an org.w3c.dom.Document object as 'data'. The SDK takes the root element of that document and injects it into the org.w3c.dom.Document that it's constructing.
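A rough Python analogue of that behaviour, using `xml.dom.minidom` rather than the Java SDK itself: the un-namespaced `event`/`data` wrapper and the `wrap_in_event` helper are simplifications made up for this sketch, not the real XML format or SDK code.

```python
from xml.dom import minidom

# Hypothetical client document with a declaration and a top-level comment.
SOURCE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!-- top-level comment -->
<order id="42"><item>widget</item></order>"""


def wrap_in_event(source_document):
    """Build a new wrapper document and copy only the root element into <data>."""
    implementation = minidom.getDOMImplementation()
    wrapper = implementation.createDocument(None, "event", None)
    data = wrapper.createElement("data")
    wrapper.documentElement.appendChild(data)
    # importNode makes a deep copy owned by the wrapper document.
    data.appendChild(wrapper.importNode(source_document.documentElement, True))
    return wrapper


wrapper = wrap_in_event(minidom.parseString(SOURCE))
serialized = wrapper.documentElement.toxml()
comment_survived = "top-level comment" in serialized
```

Only the root element and its subtree reach the wrapper; the declaration and the top-level comment of the client's document are not carried over.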

So .. this is all about the SDK implementations, I believe ... this is really no different from the way the JSON Format allows for a JSON Object to be carried ..

So .. from an XML Format perspective the spec says it's an Element. The Java SDK can be extended to add the ability to "add an Element" but the truth is you can't create an element out of thin-air, it's always in the context of a document.
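The "always in the context of a document" point holds in DOM-style APIs generally. A small illustration with Python's `xml.dom.minidom` (not the Java SDK; names are invented for the example):

```python
from xml.dom import minidom

# An element is created through a document, which becomes its owner...
owner = minidom.Document()
element = owner.createElement("order")
element.appendChild(owner.createTextNode("widget"))

# ...but it is not attached to that document's tree until it is appended.
is_owned = element.ownerDocument is owner
is_attached = element.parentNode is not None
```

So an API that accepts an Element still has a document behind it, but the caller is no longer implying that the whole document is the data.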

The other thing to consider is that you can't blindly take the byte[] representation of an XML document and slam it into the 'data' field of another XML document, since the XML preamble may be dictating a different content-encoding.
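A short sketch of the encoding problem, assuming an ISO-8859-1 payload that is to be embedded in a UTF-8 event. The sample content is invented for illustration.

```python
from xml.dom import minidom

# Hypothetical document declared as ISO-8859-1; "\u00e9" becomes the single byte 0xE9.
latin1_bytes = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><name>caf\u00e9</name>'
).encode("iso-8859-1")

# Pasting the raw bytes into a UTF-8 document is equivalent to decoding them as UTF-8.
try:
    latin1_bytes.decode("utf-8")
    naive_decode_ok = True
except UnicodeDecodeError:
    naive_decode_ok = False

# Parsing first lets the parser honour the declared encoding; the element can
# then be re-serialized as text in whatever encoding the outer document uses.
element = minidom.parseString(latin1_bytes).documentElement
text = element.firstChild.data
reserialized = element.toxml()
```

The bytes have to pass through a parser, or at least an encoding-aware decoder, before the element can be embedded; otherwise base64 is the only safe way to carry them unchanged.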

I'm probably missing something...

JemDay avatar Jan 19 '23 16:01 JemDay

If the SDK injects the root element then it does sound like what you can represent is an element, not a document. (As noted before, you can't specify a DTD, or a declaration with an encoding etc.) I'd argue that would be more cleanly represented in the Java SDK as being able to specify an element (partly because you may only want to include a child-element of a document, not a document's root element) but it means that we should make the spec say that the XML content is really to represent an element.

jskeet avatar Jan 19 '23 16:01 jskeet

per 1/19 call - @jskeet to PR it

duglin avatar Jan 19 '23 17:01 duglin