spec
spec copied to clipboard
handling of datacontenttype is inconsistent
CloudEvents 1.0
Consider this example, straight from the spec:
{
...
"datacontenttype" : "text/xml",
"data" : "<much wow=\"xml\"/>"
}
Clearly, data
is some structure that has been encoded using the XML format and put into the event as a string (binary). Naturally, I'd assume the same behavior for JSON encoding:
{
...
"datacontenttype" : "application/json",
"data" : "{\"foo\": \"bar\"}"
}
However, that's doesn't seem to be the case; as the example in the HTTP protocol binding spec shows, the JSON object is not sent in its encoded form but rather nested into the event directly:
{
...
"datacontenttype" : "application/json",
"data" : {
"foo": "bar
}
}
Note that removing the optional datacontenttype
attribute doesn't change this, as the spec clearly states:
A JSON-format event with no datacontenttype is exactly equivalent to one with datacontenttype="application/json".
To sum it up, it is not possible to put a JSON-encoded data blob into a CloudEvent; and a parser needs to treat application/json
different than any other datacontenttype
.
HTTP Protocol Binding 1.0
For structured content mode, the spec says:
The chosen event format defines how all attributes, and data, are represented.
Does this mean that datacontenttype
must be present and set to the event format? Or does structured mode implicitly change the default of datacontenttype
from application/json
to whatever event format is in use? What if datacontenttype
is present and set to a different encoding - must a parser treat this event as malformed?
JSON Event Format 1.0
As a side note, the JSON Format spec makes this even more confusing:
If the implementation determines that the type of data is Binary, the value MUST be represented as a JSON string expression containing the Base64 encoded binary value, and use the member name data_base64 to store it inside the JSON object.
This basically says that you have to Base64-encode any simple JSON string (which is, of course, binary). Also, if a receiver does not implement the optional (!) JSON Format spec, it won't be able to parse the data_base64
value; consequently, implementing the JSON Format spec as a sender means not implementing the full CloudEvents spec.
@n3wscott @clemensv any comments on this one?
Trying to remember the history....
See https://github.com/cloudevents/spec/blob/master/json-format.md#31-handling-of-data
In your example: "data" : "{\"foo\": \"bar\"}"
, what is data
? Is it a string that just happens to look like JSON or is it actually JSON? The content type being "app/json" doesn't help here since both treating it as a string is a valid JSON value. However, in this case, per section 3.1 it is a string that just happens to look like JSON.
I agree that when comparing the xml and json examples in the spec there appears to be a bit of an inconsistency, but (if I'm remembering correctly) we treat JSON payloads differently because they're JSON. Clearly, we can't put raw XML into JSON w/o some kind of encoding, but JSON doesn't have that problem, so it makes more sense to put the JSON into data
in it's raw form.
@clemensv @cneijenhuis @n3wscott ?
However, in this case, per section 3.1 it is a string that just happens to look like JSON.
Exactly, that's my point - datacontenttype
is not interpreted like it is for other media types and the spec doesn't allow to transmit a data structure JSON-encoded (at least the receiver couldn't tell whether it's a JSON encoded value or simply a JSON string without trying to decode it).
I agree that when comparing the xml and json examples in the spec there appears to be a bit of an inconsistency, but (if I'm remembering correctly) we treat JSON payloads differently because they're JSON.
The event format could be XML, right? So if content-type
is application/cloudevent+xml
and within that event datacontenttype
is set to application/json
, you'd clearly need to stringify (i.e., JSON-encode) your data (and XML-escape it). So you could say the spec actually means to never encode stuff that already uses the event format as its encoding, so JSON data
wouldn't be encoded if the event format is also JSON. But then it would be consistent to do the same in case of XML: if data
is already XML encoded and the event format is an XML, you'd expect data
to not be a string but as XML "in its raw form" instead whenever datacontenttype
is application/xml
. However, the spec says that data
needs to be XML-encoded no matter the event format.
@duglin I think, the important text in the section you referenced is
For any other type, the implementation MUST translate the data value into a JSON value, and use the member name data to store it inside the JSON object.
So, if data is already a JSON value, no translation is needed.
Both cases, A):
"data" : "{\"foo\": \"bar\"}"
and, B):
"data" : {
"foo": "bar
}
Are equivalent to the spec.
The message needs to be inspected to understand how to parse. How it becomes unmarshaled is up to the consumer of the event; they need to know if it is a string, array or a struct. In the case of B, this shortens the string escaping for the consumer.
~~Arrays are also allowed:~~
"data" : [
{"foo": "bar}
]
Arrays are not allowed. my bad.
Are equivalent to the spec.
is that true? How would a JSON Object that's just a simple string but looks like JSON be serialized then if not like this: "data" : "{\"foo\": \"bar\"}"
@n3wscott In the JSON Schema data is defined as being one of two types, object or string:
"data": {
"type": ["object", "string"]
}
but array type should not be allowed afaik.
You can however of course:
"data" : {
"foobar": [ ... ]
}
Are equivalent to the spec.
is that true? How would a JSON Object that's just a simple string but looks like JSON be serialized then if not like this:
"data" : "{\"foo\": \"bar\"}"
The spec says you can promote that inner json object or use it as a string. The consumer needs to inspect the string escaping to understand if it should be a string or object based on what it is trying to un-marshal into.
If I see: "data" : "hello"
in the event, I would hope that it would result in a single string with value of "hello" being passed on to the app. If so, I would not expect to see anything different if I replaced "hello" with something that looked like JSON, other than a new value for the string.
Where in the spec does it say you can alternate between a string or json? I didn't see it in: https://github.com/cloudevents/spec/blob/master/json-format.md#31-handling-of-data
@duglin It cannot alternate. It's either-or: https://github.com/cloudevents/spec/blob/master/spec.json#L12
IMO having both data and data_base64 is confusing. especially when you can define datacontenttype.
Are equivalent to the spec.
is that true? How would a JSON Object that's just a simple string but looks like JSON be serialized then if not like this:
"data" : "{\"foo\": \"bar\"}"
The spec says you can promote that inner json object or use it as a string. The consumer needs to inspect the string escaping to understand if it should be a string or object based on what it is trying to un-marshal into.
soooooo.... funny story... the spec changed at the last minute between 0.3 and 1.0 and it now is a must.
A):
"data" : "{\"foo\": \"bar\"}"
and, B):
"data" : {
"foo": "bar
}
Are both valid JSON but are not the same. A has been string escaped and is a JSON string. B is a Json object.
As my original post above suggests, I also have some difficulties following the JSON Format spec and would appreciate guidance (I'm currently implementing an Elixir lib for handling CloudEvents).
The core spec says that Binary
= "Sequence of bytes. String encoding: Base64 encoding per RFC4648.". The JSON Format spec says that "If the implementation determines that the type of data is Binary, the value MUST be represented as a JSON string expression containing the Base64 encoded binary value, and use the member name data_base64 to store it inside the JSON object.". So how should the implementation actually figure out whether something is considered Binary
? It is was binary in the first place, data
would already contain the base64-encoded data and the JSON format spec then suggests that data
- which is a string and thus also Binary
- must be base64-encoded again and put into data_base64
. I don't think this was the intention of the spec. Can you please state when base64 encoding must be used and how exactly an implementation should figure out whether something is Binary
(as it's not trivial to differentiate between an utf8 string and non-readable binary data)?
@kevinbader +1, I also think that if you allow "data" to be of type string, then datacontenttype+data is enough. There is no need for data_binary imo. wdyt?
There was some discussion around this in #520. In 0.3 we still had an attribute called datacontentencoding, but we had to remove it. Instead the data_base64 field was introduced in formats where it was needed.
There is no simple way to determine from a content-type, if the data is binary. Therefore an SDK will check for some well-known text-based formats and do base64 encoding otherwise. For an example of such a check, please have a look at what @alanconway wrote here.
By using either data
or data_base64
, the receiving side always knows, if a base64 decode step is needed.
@deissnerk thank you for the explanation! Got one more question:
CE also defines the "dataschema" property as "Identifies the schema that data
adheres to."
Could we then not define our schema as for example:
{
"type": "string",
"contentEncoding": "base64",
"contentMediaType": "image/png"
}
(https://json-schema.org/understanding-json-schema/reference/non_json_data.html)
which then would clearly identify contents of data (and data_base64 would be not needed at all)?
@tsurdilo The dataschema
attribute is defined as a URI. So you can of course use it to identify a schema like the one you posted, but it won't have any effect on the way the an SDK handles the event.
Do you want to send an event, that contains base64-encoded data end-to-end? That would be different from data_base64
that is only used to put binary data into the JSON format.
Btw, I think the "contentEncoding":"base64"
attribute could be added to #/definitions/data_base64
in spec.json.
@deissnerk added contentEncoding as part of the overall json schema update - pr https://github.com/cloudevents/spec/pull/563 Would you mind reviewing this pr? Thanks.
What's the status of this? I think we can close it but @kevinbader do you agree?
@duglin well, I still think that this part of the spec could be improved a lot - I'm still not sure how to implement this in the Elixir library for CloudEvents I was going to build. I mean, depending on whether I implement the optional JSON format spec, the parser behaviour for the data field is different in a non-compatible way. But I've laid out the details above already.
I'm aware that fixing this would mean a breaking change to the spec. But given that 1.0 is less than five months old and therefore it likely hasn't seen that much adoption yet, perhaps it would make sense to consider a v2 spec. If this seems reasonable to you I gladly offer you to create an RFC with some ideas on how this could be handled in a more consistent manner.
@kevinbader are you concerned about the sender or receiver of CEs? I think it’s the receiver, if so, where is the ambiguity? If it’s JSON and binary then it uses the data_base64 attribute, otherwise it uses the data attribute and the value is pure JSON not a stringified version on JSON.
We are currently looking into using CloudEvents 1.0 in our company. I agree with @duglin that the specification is clear.
Taking the example from @kevinbader:
"datacontenttype" : "application/json", "data" : "{\"foo\": \"bar\"}"
This example above is IMHO invalid, since the value of the data
attribute is a string "{\"foo\": \"bar\"}"
and not a valid JSON expression.
We human beings can guess that the string happens to be an encoded JSON. If you look at a longer example, you can't be so sure anymore:
{ \"swagger\": \"2.0\", \"info\": { \"description\": \"Repository of Events used within Platform hosted applications\", \"version\": \"1.0\", \"title\": \"Platform Event Registry Microservice\", \"termsOfService\": \"Terms of Service\", \"contact\": { \"name\": \"Platform Core Team\", \"email\": \"[email protected]\" } }, \"host\": \"localhost\", \"basePath\": \"\/\", \"tags\": [ { \"name\": \"event-registry-controller\", \"description\": \"Event Registry Controller\" } ], \"paths\": { \"\/api\/v1\/app\/{appId}\/events\": { \"get\": { \"tags\": [ \"event-registry-controller\" ], \"summary\": \"Get events for an application\", \"description\": \"Retrieve list of events which are produced and consumed by the given application.\\n* Called with: **Platform Admin** role\", \"operationId\": \"getAllEventsByAppIdUsingGET\", \"produces\": [ \"application\/json;charset=UTF-8\" ], \"parameters\": [ { \"name\": \"appId\", \"in\": \"path\", \"description\": \"appId\", \"required\": true, \"type\": \"string\", \"format\": \"uuid\" } ], \"responses\": { \"200\": { \"description\": \"OK\", \"schema\": { \"type\": \"object\", \"properties\": { \"consume\": { \"type\": \"array\", \"items\": { \"type\": \"object\", \"properties\": { \"consumers\": { \"type\": \"array\", \"items\": { \"type\": \"object\", \"required\": [ \"appID\" ], \"properties\": { \"appID\": { \"type\": \"string\", \"format\": \"uuid\" } }, \"title\": \"ApplicationDTO\" } }, \"eventAlias\": { \"type\": \"string\" }, \"eventDescription\": { \"type\": \"string\" }, \"eventId\": { \"type\": \"string\", \"format\": \"uuid\" }, \"eventSampledata\": { \"type\": \"string\" }, \"eventSchema\": { \"type\": \"string\" }, \"eventStatus\": { \"type\": \"string\" }, \"links\": { \"type\": \"array\", \"xml\": { \"name\": \"link\", \"namespace\": \"http:\/\/www.w3.org\/2005\/Atom\", \"attribute\": false, \"wrapped\": false }, \"items\": { \"type\": \"object\", \"properties\": { \"deprecation\": { \"type\": \"string\", \"xml\": { \"name\": \"deprecation\", \"attribute\": true, \"wrapped\": false } }, \"href\": { \"type\": \"string\", \"xml\": { \"name\": \"href\", \"attribute\": true, \"wrapped\": false } }, \"hreflang\": {
Following the specification of CloudEvents 1.0, we interpret data
indicated with application/json
directly as JSON without unescaping. This also makes sense on the producer side, because most of our products nowadays use JSON anyhow and we can directly paste or serialize JSON into the data
node.
@Thoemmeli I agree to your point, but "{\"foo\": \"bar\"}"
is a string and therefore also a JSON value. In that sense the example is valid, but the datacontenttype
in this case refers to the string and not to the escaped JSON object.
@kevinbader thoughts?
@duglin well, I've implemented this according to the spec and the replies here:
https://github.com/kevinbader/cloudevents-ex/blob/master/test/format/v_1_0/decoder/json_test.exs
Not particularly happy with it but I guess that's how it is.
@duglin @deissnerk This is coming up in my work on the Ruby SDK, and I want to bring up a clarification question.
To summarize a conclusion from above:
In the following CE:
const ce1 = new CloudEvent({
specversion: "1.0",
id: "C234-1234-1234",
source: "/mycontext",
type: "com.example.someevent",
datacontenttype: "application/json",
data: "{\"foo\": \"bar\"}"
});
... it sounds like the data should be considered a JSON value of type string. The fact that the string's value happens to look like serialized JSON is irrelevant. It is simply a string. Therefore, if we were to serialize this CE in HTTP Binary mode, it might look like this:
CE-SpecVersion 1.0
CE-Type: com.example.someevent
CE-Source: /mycontext
CE-ID: C234-1234-1234
Content-Type: application/json
"{\"foo\" : \"bar\"}"
The data must be "escaped" in this way, so that a receiver parsing this content with the application/json
content type will end up with a JSON string and not an object.
As a corollary, when deserializing an HTTP Binary mode CE with Content-Type: application/json
, the HTTP protocol handler must parse the JSON and set the data attribute in memory to the actual JSON value (rather than the string representation of the JSON document). Otherwise, the content's semantics will change when the CE gets re-serialized. And this, of course, all implies that an SDK's HTTP protocol handler (and perhaps other protocol handlers as well) must understand JSON, even if the JSON structured format is not in use.
Taking that as given, consider this implication:
Earlier a comparison was made with application/xml
, noting a possible inconsistency. Consider this parallel example:
const ce2 = new CloudEvent({
specversion: "1.0",
id: "C234-1234-1234",
source: "/mycontext",
type: "com.example.someevent",
datacontenttype: "application/xml",
data: "<much wow=\"xml\"/>"
});
If we were to treat this XML data consistently with how we treated the earlier JSON data, we would consider this data as a string node
in an XML document, whose contents just happen to look like XML. Hence, serializing this as HTML-Binary might yield something like:
CE-SpecVersion 1.0
CE-Type: com.example.someevent
CE-Source: /mycontext
CE-ID: C234-1234-1234
Content-Type: application/xml
<much wow="xml"/>
However, my understanding of the spec, and my understanding of the current behavior of the SDKs, suggests we are not doing that. (And indeed I'm glad, because that would, in turn, imply that all protocol handlers would also need to understand XML.) Instead, we actually consider the above data as semantically an XML document and not a string. Hence, serializing this as HTML-Binary actually looks like:
CE-SpecVersion 1.0
CE-Type: com.example.someevent
CE-Source: /mycontext
CE-ID: C234-1234-1234
Content-Type: application/xml
<much wow="xml"/>
In other words, our handling of the XML content-type appears to be inconsistent with our handling of the JSON content-type.
So my clarification question is:
- Am I correct in my interpretation that the spec intentionally treats data with content-type
application/json
specially, differently from string data with content-typeapplication/xml
(or indeed any other content-type), as illustrated above?
If so, follow-up questions:
- Is the reason for this that we (for some reason) consider JSON uniquely special among all content types in the universe, or is the reason simply that the spec currently happens to include a JSON format but not an XML format to define how data with that datacontenttype is rendered? Suppose a future spec version adds an XML format, YAML format, Protobuf format, etc. Would we at that time need to change the behavior of those formats to be like JSON (which would be a breaking change)?
- How do we precisely identify which content types are to be treated in this special way? For example,
application/json
is obvious, but what if the datacontenttype is itselfapplication/cloudevents+json
(i.e. a cloudevent whose payload is another cloudevent)? If we do consider JSON special, it seems it might be a good idea for the spec to state that explicitly, and define how it is identified, perhaps with reference to fields in RFC 2046 or similar.
I don't see any special handling of the JSON format here, @dazuma.
I think the nesting of quotes combined with the different representations of the event in Javascript (not JSON) and HTTP binary lead to some confusion. In your example ce1
you use Javascript:
const ce1 = new CloudEvent({
specversion: "1.0",
id: "C234-1234-1234",
source: "/mycontext",
type: "com.example.someevent",
datacontenttype: "application/json",
data: "{\"foo\": \"bar\"}"
});
This statement gives you an object ce1
with, among others, a string member data
. It depends now on the SDK, what this means. In the go SDK the Event
has an attribute DataEncoded
that holds the raw byte sequence of the payload. Only if you call DataAs
, the SDK attempts to parse DataEncoded
according to the datacontenttype
. How would your example look like in JSON?
Like this?
{
...
"data" : "{\"foo\": \"bar\"}",
...
}
The go SDK would store the serialized JSON value of data
in a byte sequence, where the first and the last byte would contain a "
. This is a JSON specific implementation, but it is part of the unmarshaling of the JSON format.
As, in Javascript, you don't have byte sequences, you would perhaps instead store the data as string and base64 encode it depending on the datacontenttype
. If you now express this string as a string literal in Javascript, it looks like this:
dataEncoded : "\"{\"foo\": \"bar\"}\""
or maybe a bit more readable:
dataEncoded : '"{\"foo\": \"bar\"}"'
If your data attribute is decoded, this means that, when serializing it again, it would have to be encoded again according to datacontenttype
. This holds true for JSON, XML and every other encoding. In the go SDK there are encoders and decoders for XML, JSON and text.
@deissnerk I think we are in agreement here on the interpretation of this example that I labeled ce1
. The type of the data
attribute is string
, and so when datacontenttype
is set to application/json
, the value of dataEncoded
must, as you point out, begin and end with "
to indicate that a string is being encoded.
dataEncoded : '"{\"foo\": \"bar\"}"'
So far I think we agree. My question is actually about the ce2
example. In JSON format, this looks like:
{
...
"datacontenttype" : "application/xml",
"data" : "<much wow=\"xml\"/>",
...
}
I constructed ce2
to be a parallel to ce1
. In ce1
, the content type is application/json
; in ce2
, it is application/xml
. In ce1
, the data
attribute is a string which looks like an encoded JSON document; in ce2
, the data
attribute is a string which looks like an encoded XML document. In all aspects, ce1
and ce2
are in parallel; the one case using JSON and the other using XML, but in both cases, the type of data
is string.
For ce2
, what is the value of dataEncoded
?
Is it: dataEncoded : '<much wow="xml"/>'
(which would decode to an XML document)
Or is it: dataEncoded : '<much wow="xml"/>'
(which would decode to a string)
Having the same question as @dazuma:
At IANA, there are about 114 application/.*+json
media types registered (note: application/cloudevents+json
is not even there). Are these JSON documents or should we treat them like the XML examples above?
@dazuma IMHO it should be:
dataEncoded : '<much wow="xml"/>'
@athalhammer
Currently the go SDK only recognizes application/json
and text/json
into account., but it's an interesting question.
@n3wscott @slinkydeveloper What do you think?
For the pending IANA registration we already have #557