spec icon indicating copy to clipboard operation
spec copied to clipboard

How should we handle text_data in proto

Open duglin opened this issue 2 years ago • 39 comments

See: https://github.com/cloudevents/sdk-go/issues/759

From that issue:


On the use of the text_data and binary_data fields in the protobuf-format spec, it says:

When the type of the data is text, the value MUST be stored in the text_data property. datacontenttype SHOULD be populated with the appropriate media-type. When the type of the data is binary the value MUST be stored in the binary_data property. datacontenttype SHOULD be populated with the appropriate media-type.

The Java SDK's serializer always uses the text_data field when the datacontenttype is application/json, application/xml or text/*. Otherwise the binary_data field is used. However, in the Go SDK's deserializer binary_data is treated literally, while text_data gets serialized using the appropriate serializer for the datacontenttype. When a CloudEvent with a datacontenttype of application/json is serialized into protobuf using the Java SDK and then sent to a service using the Go SDK and deserialized there, this results in the JSON in the text_data field being interpreted as a JSON-String when it should be treated as a JSON-Object. So if I serialize this CloudEvent into protobuf using the Java SDK:

{
    "specversion" : "1.0",
    "type" : "com.example.someevent",
    "source" : "/mycontext",
    "subject": null,
    "id" : "C234-1234-1234",
    "time" : "2018-04-05T17:31:00Z",
    "comexampleextension1" : "value",
    "comexampleothervalue" : 5,
    "datacontenttype" : "application/json",
    "data" : {
        "appinfoA" : "abc",
        "appinfoB" : 123,
        "appinfoC" : true
    }
}

and then send the resulting protobuf object to a service using the Go SDK and deserialize it there back into JSON format, the result is:

{
    "specversion" : "1.0",
    "type" : "com.example.someevent",
    "source" : "/mycontext",
    "subject": null,
    "id" : "C234-1234-1234",
    "time" : "2018-04-05T17:31:00Z",
    "comexampleextension1" : "value",
    "comexampleothervalue" : 5,
    "datacontenttype" : "application/json",
    "data" : "{ \"appinfoA\" : \"abc\", \"appinfoB\" : 123, \"appinfoC\" : true }"
}

This seems to be unintended behavior, but I as far as I can tell, neither the Java nor the Go SDK implementations of the Protobuf format are technically wrong according to the way the spec is written, so I decided to raise this issue here to get clarification. Is this a bug in one or both of these libraries? Should the Protobuf format spec be more specific in how the binary_data and text_data fields should be used? Or is this behavior somehow intended?

@JemDay FYI

duglin avatar Apr 06 '23 19:04 duglin

This was also discussed on the 2022-07-28 call: https://www.youtube.com/watch?v=cg0jKFwyJeM&list=PLO-qzjSpLN1BEyKjOVX_nMg7ziHXUYwec&index=34

duglin avatar Apr 06 '23 19:04 duglin

I think this is where Mr. @jskeet conformance tests are going to come in useful :-)

As you mention the behavior of the Go SDK is a bit odd, but potentially not incorrect - until you read the fine-print in the JSON Format specification. ;-)

If the datacontenttype declares the data to contain JSON-formatted content, a JSON serializer MUST translate the data value to a JSON value, and use the member name data to store it inside the JSON representation. The data value MUST be stored directly as a JSON value, rather than as an encoded JSON document represented as a string. An implementation MAY fail to serialize the event if it is unable to translate the runtime value to a JSON value.

Given that statement I'd humbly suggest that the Go SDK is not spec compliant, whether all the other SDKs are compliant in this scenario is another question entirely...

JemDay avatar Apr 06 '23 21:04 JemDay

The bottom line here seems to be that if your application code hands a 'string' to the SDK along with the assertion that it is JSON then the SDK its required to parse the string back into a JSON value and stick that in the data property.

If the parse step fails then the SDK should error-out.

JemDay avatar Apr 06 '23 21:04 JemDay

@lionelvillard if we "fixed" the go-lang SDK so that we don't treat this as a string do you think this will break lots of Knative folks?

duglin avatar Apr 20 '23 16:04 duglin

I might be doing it wrong, but in testing the golang SDK I'm seeing this:

Original EVENT:
{
  "data": {
    "Hello": "there"
  },
  "datacontenttype": "application/json",
  "id": "",
  "source": "example/uri",
  "specversion": "1.0",
  "type": "example.type"
}

What HTTP sees:
{
  "data": {
    "Hello": "there"
  },
  "datacontenttype": "application/json",
  "id": "c6c142aa-3943-42de-96ad-1186e39a1cb6",
  "source": "example/uri",
  "specversion": "1.0",
  "time": "2023-04-20T20:28:53.710827689Z",
  "type": "example.type"
}

Receiving it and serializing it back as JSON:
{
  "data": {
    "Hello": "there"
  },
  "datacontenttype": "application/json",
  "id": "c6c142aa-3943-42de-96ad-1186e39a1cb6",
  "source": "example/uri",
  "specversion": "1.0",
  "time": "2023-04-20T20:28:53.710827689Z",
  "type": "example.type"
}

I think it's working as expected. Perhaps something changed since the original issue was opened.

duglin avatar Apr 20 '23 20:04 duglin

@duglin: I believe the point is that it's probably fine if the SDK parses something that already has a JSON object as the data value, because it presumably preserves that as a JSON object. But if it's just given a string (e.g. because it's transcoding from the proto format) it doesn't try to convert that into a JSON object, even if the content type is application/json.

Here's C# code (using the System.Text.Json formatter, but I'm sure the Newtonsoft.Json one will behave the same way) showing the issue in C#:

using CloudNative.CloudEvents;
using CloudNative.CloudEvents.SystemTextJson;
using System.Text;

var evt = new CloudEvent
{
    Id = "Issue1186",
    Type = "example.type",
    Source = new Uri("example/uri", UriKind.RelativeOrAbsolute),
    Data = "{\"Hello\": \"there\"}",
    DataContentType = "application/json"
};

var formatter = new JsonEventFormatter(new() { WriteIndented = true }, new());
var bytes = formatter.EncodeStructuredModeMessage(evt, out _);
Console.WriteLine(Encoding.UTF8.GetString(bytes.Span));

Output:

{
  "specversion": "1.0",
  "id": "Issue1186",
  "type": "example.type",
  "source": "example/uri",
  "datacontenttype": "application/json",
  "data": "{\u0022Hello\u0022: \u0022there\u0022}"
}

It would be good to know what other SDKs do in the same scenario.

Fundamentally I think this is a problem with model we have of data within our formats and SDKs - each format is opinionated in its own way, violating the model of "data is just a sequence of bytes". That does make for much more readable serialized events - at the cost of this sort of thing. I really don't know where we should go from here.

jskeet avatar Apr 21 '23 07:04 jskeet

That example I used was based on the first comment in the issue, here's another example passing in a string that looks like json:

START EVENT:
{
  "data": "{\"hello\": \"there\"}",
  "datacontenttype": "application/json",
  "id": "",
  "source": "example/uri",
  "specversion": "1.0",
  "type": "example.type"
}

HTTP:
{
  "data": "{\"hello\": \"there\"}",
  "datacontenttype": "application/json",
  "id": "17270e4f-7fac-4b35-b96a-0abec69f8598",
  "source": "example/uri",
  "specversion": "1.0",
  "time": "2023-04-21T02:02:02.869891943Z",
  "type": "example.type"
}

FINAL EVENT:
{
  "data": "{\"hello\": \"there\"}",
  "datacontenttype": "application/json",
  "id": "17270e4f-7fac-4b35-b96a-0abec69f8598",
  "source": "example/uri",
  "specversion": "1.0",
  "time": "2023-04-21T02:02:02.869891943Z",
  "type": "example.type"
}

Still seems right to me.

Are you saying that the C# example you gave is wrong? It looks right to me too. You gave it a JSON string, so you get back a JSON string. Not a JSON object.

Try your example with "{[[[}" and see what happens. IMO it should not complain about invalid JSON in the string... it's just an array of bytes.

START EVENT:
{
  "data": "{[[}",
  "datacontenttype": "application/json",
  "id": "",
  "source": "example/uri",
  "specversion": "1.0",
  "type": "example.type"
}

HTTP:
{
  "data": "{[[}",
  "datacontenttype": "application/json",
  "id": "b25be358-3a59-4965-962e-786533156bba",
  "source": "example/uri",
  "specversion": "1.0",
  "time": "2023-04-21T02:02:02.870163666Z",
  "type": "example.type"
}

FINAL EVENT:
{
  "data": "{[[}",
  "datacontenttype": "application/json",
  "id": "b25be358-3a59-4965-962e-786533156bba",
  "source": "example/uri",
  "specversion": "1.0",
  "time": "2023-04-21T02:02:02.870163666Z",
  "type": "example.type"
}

duglin avatar Apr 21 '23 10:04 duglin

Are you saying that the C# example you gave is wrong? It looks right to me too. You gave it a JSON string, so you get back a JSON string. Not a JSON object.

I'm saying that it's different to the Java SDK, and at least plausibly violating the bit of the spec that Jem quoted. (It looks like that piece of the spec is subject to interpretation, which is never great.)

At the very least, it means that we don't have portability across formats - converting from one format to another and back again in the "obvious" way can easily lose information.

jskeet avatar Apr 21 '23 11:04 jskeet

To me, the key is the first character after the "data": . If it's { then it's a JSON Object. If it's " then it's a JSON String and not a JSON Object that has been encoded as a JSON String that needs to be decoded by the SDK. The app should see it as a string.

Which part of the spec does the above example violate? This sentence:

The data value MUST be stored directly as a JSON value, rather than as an encoded JSON document
represented as a string. 

(to me) is consistent with my first sentence in this comment... the first char after "data": tells you how to interpret what comes next... Object vs String.

Another way I view it is this, for :

"data": "hello",
"data": "[[[",
"data": "{ \"hello\": \"there\" },

Is someone suggesting that each of these might result in a different interpretation? ie. string, error and json object respectively? If so, that doesn't seem right since each of these are valid JSON Values (strings) and I don't see how else someone could encode them in the CE to say "these are strings, don't try to decode them as json".

What's interesting is if someone wanted to say that "{\"hello\":\"there\"}" should be a JSON Object, then the logic being suggested is:

  • remove the outer quotes
  • parse as JSON

but then that same logic doesn't apply for "hello". Remove the quotes, parse hello as JSON and things fail because hello by itself isn't valid JSON. So, is that same person suggesting that "data": "hello" isn't a valid CE? I would hope not.

There could be a bug someplace, but I'm not seeing it being the spec....unless I'm just not groking this

duglin avatar Apr 21 '23 11:04 duglin

To me, the key is the first character after the "data":

But that's the result of serialization. The problem is in the act of serializing (not deserializing; I don't believe that part is controversial). What Jem is saying is that if an SDK has a bunch of bytes (or a string) as the data, and the data content type has been set to application/json, it should be interpreted by the SDK as already being JSON - and that the act of serializing the CloudEvent in the JSON format should be to represent the JSON value that's already JSON-encoded in that text. That means the SDK would need to either parse or at least validate that it is JSON to start with, which is non-trivial, of course.

Basically I think the distinction is between whether data content type should be used to indicate "this is already the type of data that I've presented to you" or "this is what I want it to be".

jskeet avatar Apr 21 '23 11:04 jskeet

How is that "bunch of bytes" given to the SDK? Are we talking about it's part of a stream from bytes from an HTTP body? If so, there's still a char/byte after the "data": to determine things. If it's event.SetData(xxx) then the determining thing to me is whatever xxx is. If it's a "bunch of bytes" then they're giving the SDK a "bunch of bytes" and on the other end the receiver should see "a bunch of bytes". So... it's either a string or a byte array. But I don't see why someone would think it's a JSON object.

I guess I view this is an SDK issue in that each SDK needs to decide how it wants its users to populate data so the SDK knows how to serialize it correctly and if it choose a setter mechanism that isn't clear on the user's intent then it needs to fix that. E.g. given x="{\"hello\":\"there\"}" there may need to be event.SetDataAsString(x) and event.SetDataAsJson(x).

But I don't think the main spec itself is ambiguous, or that a JSON serialized CE is open to interpretation w.r.t. what data is.

What's interesting is your question about whether data content type should influence things - and my current thinking "no" since I think whatever we do should work correctly even when that property is null - and should net the same results as when the property is set. Imagine how annoyed a user would be if they never set the property, things worked but then decided to be explicit about everything by setting this property and suddenly the serialization of their CEs changes and breaks things. That feels bad.

duglin avatar Apr 21 '23 13:04 duglin

Are we talking about it's part of a stream from bytes from an HTTP body? If so, there's still a char/byte after the "data": to determine things.

Again, that would only be if we were deserializing a JSON-format CloudEvent to start with. That's not the situation we're talking about.

The two scenarios described so far are:

  • The CloudEvent was received in protobuf format, with the payload being represented either as a byte array or a string
  • The CloudEvent was constructed in code, with a string or byte array as the payload

If it's a "bunch of bytes" then they're giving the SDK a "bunch of bytes" and on the other end the receiver should see "a bunch of bytes"

That's definitely not what the spec says at the moment though. (Storing the bytes directly in data_base64 wouldn't be compliant with "If the datacontenttype declares the data to contain JSON-formatted content, a JSON serializer MUST translate the data value to a JSON value, and use the member name data to store it inside the JSON representation."

I do believe we've got a problem if we're not consistent: if the result of steps of:

  • Parse from Protobuf format to a CloudEvent
  • Serialize that CloudEvent in JSON format

(Or other combinations...)

... comes up with significantly different results depending on the SDK used, I think that's an issue.

I'd be open to saying "The JSON format spec is unfortunate here, and we should revisit it (and therefore the Java SDK handling)" but I'm not convinced yet in either direction. But I am convinced there's a problem.

jskeet avatar Apr 21 '23 13:04 jskeet

Data = "{\"Hello\": \"there\"}",

is this a JSON string? isn't a JSON string something like Data = "\"{\"Hello\": \"there\"}\"" as per https://www.rfc-editor.org/rfc/rfc8259#section-7? (I mean as a sequence of bytes "{\"Hello\": \"there\"}")

pierDipi avatar Apr 21 '23 15:04 pierDipi

I tried with java SDK for some examples you have showed here (no HTTP deserialization):

        final CloudEvent[] events = new CloudEvent[]{
            CloudEventBuilder.v1()
            .withId(UUID.randomUUID().toString())
            .withType("foo")
            .withSource(URI.create("/foo"))
            .withData("application/json", "\"{\"Hello\": \"there\"}\"".getBytes(StandardCharsets.UTF_8))
            .build(),
            CloudEventBuilder.v1()
                .withId(UUID.randomUUID().toString())
                .withType("foo")
                .withSource(URI.create("/foo"))
                .withData("application/json", "{\"Hello\": \"there\"}".getBytes(StandardCharsets.UTF_8))
                .build(),
            CloudEventBuilder.v1()
                .withId(UUID.randomUUID().toString())
                .withType("foo")
                .withSource(URI.create("/foo"))
                .withData("application/json", "{[[[}".getBytes(StandardCharsets.UTF_8))
                .build(),
            CloudEventBuilder.v1()
                .withId(UUID.randomUUID().toString())
                .withType("foo")
                .withSource(URI.create("/foo"))
                .withData("application/json", "\"{[[[}\"".getBytes(StandardCharsets.UTF_8))
                .build()
        };

        for (final CloudEvent ce: events) {
            byte[] bytes = getFormat().serialize(ce);
            System.out.println(new String(bytes, StandardCharsets.UTF_8));
        }

output:

{"specversion":"1.0","id":"eb3a465f-7a48-4370-aa14-9845c2a2f4c9","source":"/foo","type":"foo","datacontenttype":"application/json","data":"{"Hello": "there"}"}
{"specversion":"1.0","id":"915c3b33-9f13-475c-b2d8-64f49c000fb0","source":"/foo","type":"foo","datacontenttype":"application/json","data":{"Hello": "there"}}
{"specversion":"1.0","id":"e2f115dc-3c9c-46e5-937f-f271a37d69a0","source":"/foo","type":"foo","datacontenttype":"application/json","data":{[[[}}
{"specversion":"1.0","id":"534d04c2-ff0a-4124-813a-882d894d6ba4","source":"/foo","type":"foo","datacontenttype":"application/json","data":"{[[[}"}

It doesn't do any special processing it is just considering it as sequence of bytes. it even outputs invalid JSONs with data = {[[}, and that's not a JSON string, a JSON string is "{[[}" (like quote included)

pierDipi avatar Apr 21 '23 15:04 pierDipi

I'm not sure what's more wrong but to me the fact that Java SDK serialize to invalid JSON is wrong, however, for the {[[[} case, I'd expect to have data_base64 more than a quoted string because technically what {[[[} is can be a non UTF-8 sequence of bytes that is not a valid JSON string.

pierDipi avatar Apr 21 '23 15:04 pierDipi

I'm not sure the data_base64 plays a role in this since the oft-quoted text only talks about data and how to show the payload when both the format and data are JSON.

To me the quoted text is very clear... if you have JSON you can stick it directly under data when the format of the CE is structured JSON. No need for any special encoding. So again, perhaps its one of our others specs (or SDKs) that's broken but I'm still not seeing a problem with the core spec, or even the JSON spec yet.

As for "bunch of bytes in == bunch of bytes out", the spec doesn't need to say that directly because I believe it's implied in the sense that people should expect their event payload remains unchanged between the sending and receiving of the event. If an SDK breaks this assumption then I think the SDK is broken.

@pierDipi I think we may need to be more precise on what the term "JSON string" means and its context. I think we agree that when a string is serialized in JSON then quotes are added (string =hello, json serialization = "hello").

So:

Data = "{\"Hello\": \"there\"}",

to me is passing in a string, and it doesn't become a JSON string until the complete JSON is created. And at that point the string {"Hello": "there"} gets wrapped with quotes since that's the JSON serialization rule for strings.

The question is whether .withData() or .setData() are expecting the data to be "as seen by the code" or not. My default assumption is that it's "as seen by the code". Meaning, I call it with the raw event payload - no encoding, no escaping because for the most part the code shouldn't know about the format being used to send the event, and it certainly shouldn't be expected to encode/escape things on a per format basis. So to me, the code behind .withData() or .setData() should do whatever is needed to get it on the wire. Which is why I agree with you that it seems wrong for the Java SDK to produce invalid JSON for {[[[}. The exception to this might be if the method was something like .withAlreadyEncodedData() :-) then it's clear the caller of the SDK is advanced and didn't want the SDK to do any work at all. Do you think that was the intent of withData??

duglin avatar Apr 21 '23 16:04 duglin

Ok, I see, Java SDK represents data only as bytes (byte[]), so withData accepts only bytes and therefore there isn't really a possibility for misinterpretation (aka you pass x, you get x)

pierDipi avatar Apr 21 '23 16:04 pierDipi

that would imply they expect people to know the format and encode/escape things themselves.... that's interesting.

duglin avatar Apr 21 '23 16:04 duglin

Not only people, even things like proto text data to JSON format, the Java SKD loses the quotes:

        io.cloudevents.v1.proto.CloudEvent protoCe = io.cloudevents.v1.proto.CloudEvent
            .newBuilder()
            .setSpecVersion("1.0")
            .setId(UUID.randomUUID().toString())
            .setType("foo")
            .setSource("/foo")
            .setTextData("hello")
            .build();

      CloudEvent ce  = new ProtoDeserializer(protoCe).read(CloudEventBuilder::fromSpecVersion);
      
      byte[] bytes = new JsonFormat().serialize(ce);
      System.out.println(new String(bytes, StandardCharsets.UTF_8));

output:

{"specversion":"1.0","id":"b80e46fe-3428-4733-961e-ad56339e55d6","source":"/foo","type":"foo","data":hello}

that's not correct

pierDipi avatar Apr 21 '23 16:04 pierDipi

Is there javadoc available? I see docs here: https://cloudevents.github.io/sdk-java/api.html but no formal api definition. Or is javadoc too old school :-)

duglin avatar Apr 21 '23 17:04 duglin

https://www.javadoc.io/doc/io.cloudevents/cloudevents-api/latest/index.html :')

pierDipi avatar Apr 21 '23 17:04 pierDipi

thanks!

duglin avatar Apr 21 '23 17:04 duglin

I don’t consider that a valid usage pattern .. it looks like you’re accessing the raw protobuff message builerd whereas you should be using the CloudEventBuilder.

Ideally the io.cloudevents.v1.proto package shouldn’t be accessible outside the ProtoFormat module :-(

J--

On Apr 21, 2023, at 9:57 AM, Pierangelo Di Pilato @.***> wrote:

Not only people, even things like proto text data to JSON format, the Java SKD loses the quotes:

    io.cloudevents.v1.proto.CloudEvent protoCe = io.cloudevents.v1.proto.CloudEvent
        .newBuilder()
        .setSpecVersion("1.0")
        .setId(UUID.randomUUID().toString())
        .setType("foo")
        .setSource("/foo")
        .setTextData("hello")
        .build();

  CloudEvent ce  = new ProtoDeserializer(protoCe).read(CloudEventBuilder::fromSpecVersion);
  
  byte[] bytes = new JsonFormat().serialize(ce);
  System.out.println(new String(bytes, StandardCharsets.UTF_8));

output:

{"specversion":"1.0","id":"b80e46fe-3428-4733-961e-ad56339e55d6","source":"/foo","type":"foo","data":hello} — Reply to this email directly, view it on GitHub https://github.com/cloudevents/spec/issues/1186#issuecomment-1518100086, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQQUTAVVXVGSFK2TKB6D33XCK36TANCNFSM6AAAAAAWVZJICU. You are receiving this because you were mentioned.

JemDay avatar Apr 21 '23 20:04 JemDay

@JemDay, I'm matching the implementation.

This part

        io.cloudevents.v1.proto.CloudEvent protoCe = io.cloudevents.v1.proto.CloudEvent
            .newBuilder()
            .setSpecVersion("1.0")
            .setId(UUID.randomUUID().toString())
            .setType("foo")
            .setSource("/foo")
            .setTextData("hello")
            .build();

      CloudEvent ce  = new ProtoDeserializer(protoCe).read(CloudEventBuilder::fromSpecVersion);

is equivalent to: https://github.com/cloudevents/sdk-java/blob/4ebeab0e0ff8d0c507c47b32dd7ec2da95dc87cb/formats/protobuf/src/main/java/io/cloudevents/protobuf/ProtobufFormat.java#L61-L69

    public CloudEvent deserialize(byte[] bytes, CloudEventDataMapper<? extends CloudEventData> mapper)
	    throws EventDeserializationException {
        try {
            final io.cloudevents.v1.proto.CloudEvent ceProto = io.cloudevents.v1.proto.CloudEvent.parseFrom(bytes);
            return new ProtoDeserializer(ceProto).read(CloudEventBuilder::fromSpecVersion);
        } catch (InvalidProtocolBufferException e) {
            throw new EventDeserializationException(e);
        }
    }

and as you can see, text data is then passed as raw bytes without quotes during deserialization and therefore the Json format just passes the data leading to the invalid JSON: https://github.com/cloudevents/sdk-java/blob/4ebeab0e0ff8d0c507c47b32dd7ec2da95dc87cb/formats/protobuf/src/main/java/io/cloudevents/protobuf/ProtoDeserializer.java#L99-L102

pierDipi avatar Apr 26 '23 13:04 pierDipi

Apologies for the delay, I've been out-of-town and I'm trying to keep up on this thread...

HTTP is a red-herring, we should be concentrating on the 'format' not the transport.

My earlier comment was relating to the builder used in that proto 'example' - it's not the correct one. Application code does-not (or should not) be using those constructs...

The CE Builder is the same irrespective of the format being used ...

If we look at this:

String jText = "{ \"name\" : \"bob\" }";

CloudEventBuilder b = new CloudEventBuilder();

CloudEvent ce = b.withId("xxx-yyy-xxx")
.withType("org.jem.test")
.withSource(URI.create("http://my.source/"))
.withData("application/json", jText.getBytes())
.build();

EventFormat ef = EventFormatProvider.getInstance().resolveFormat("application/cloudevents+json");

byte[] raw = ef.serialize(ce);

Produces this output on-the-wire (which adheres to the spec as I understand it):

{"specversion":"1.0","id":"xxx-yyy-xxx","source":"http://my.source/","type":"org.jem.test","datacontenttype":"application/json","data":{ "name" : "bob" }} For a proto example all you want to do is switch the format to ....

EventFormat ef = EventFormatProvider.getInstance().resolveFormat("application/cloudevents+proto");

And then you'd need to have some example code that blindly demarshalls the buffer, and then re-marshals it using the "proto json format" ... this is what the proto Unit Tests do to ensure the wire-format is as expected.

-- Apologiers for so many edits, that's me trying to rush an answer to the thread.

JemDay avatar Apr 26 '23 15:04 JemDay

Apologies for the delay, I've been hut-of-town and I'm trying to keep up on this thread...

HTTP is a red-herring, we should be concentrating on the 'format' not the transport.

My earlier comment was relating to the builder used in that proto 'example' - it's not the correct one. Application code does-not (or should not) be using those constructs...

The CE Builder is the same irrespective of the format being used ...

If we look at this:

String jText = "{ \"name\" : \"bob\" }";

CloudEventBuilder b = new CloudEventBuilder();

CloudEvent ce = b.withId("xxx-yyy-xxx")
.withType("org.jem.test")
.withSource(URI.create("http://my.source/"))
.withData("application/json", jText.getBytes())
.build();

EventFormat ef = EventFormatProvider.getInstance().resolveFormat("application/cloudevents+json");

byte[] raw = ef.serialize(ce);

Produces this output on-the-wire (which adheres to the spec as I understand it):

{"specversion":"1.0","id":"xxx-yyy-xxx","source":"http://my.source/","type":"org.jem.test","datacontenttype":"application/json","data":{ "name" : "bob" }} For a proto example all you want to do is switch the format to ....

EventFormat ef = EventFormatProvider.getInstance().resolveFormat("application/cloudevents+proto");

And then you'd need to have some example code that blindly demarshalls the buffer, and then re-marshals it using the "proto json format" ... this is what the proto Unit Tests do to ensure the wire-format is as expected

This won't work when jText = "a string"

pierDipi avatar Apr 26 '23 15:04 pierDipi

This won't work when jText = "a string"

I may be wrong (probably am) but I don't believe you can call withData with a String object on the CloudEvent Builder ...

JemDay avatar Apr 26 '23 17:04 JemDay

Right, you're not wrong, we need something like StringCloudEventData object that is instantiated by protobuf when text_data is used and properly handled by other formats (JSON being one of them)

pierDipi avatar Apr 26 '23 18:04 pierDipi

I agree there is some cleanup we can do but I think we need to "step away" from proto for a minute and limit ourselves to the JSON Format implementation as that was the issue at-hand.

We need crystal-clear guidance as to whether this is considered valid from a JSON Format specification perspective..

  ....
 "datacontenttype" : "application/json",
 "data" : "{  \"name\" : \"Kevin\" }",
 ....

or is the excpectation it's:

  ...
 "datacontenttype" : "application/json",
 "data" : {  "name":  "Kevin" },
  ....

JemDay avatar Apr 26 '23 18:04 JemDay

It would be good to know what other SDKs do in the same scenario.

@jskeet, I tested more or less the same thing for Java [ref]:

public class SimpleTests {
  private static final Logger LOGGER = Logger.getLogger(SimpleTests.class.getName());
  private static final JsonNodeFactory jnf = JsonNodeFactory.instance;

  @Test
  void jsonSerializeString() throws Exception {

    var event =
        CloudEventBuilder.v1()
            .withId("Issue1186")
            .withType("example.type")
            .withSource(URI.create("example/uri"))
            .withDataContentType("application/json")
            .withData("{\"Hello\": \"there\"}".getBytes(StandardCharsets.UTF_8))
            .build();

    var formatOptions = JsonFormatOptions.builder().forceStringSerialization(true).build();
    var format = new JsonFormat(formatOptions);

    var serialized = format.serialize(event);
    LOGGER.info(() -> String.format("serialized = %s", new String(serialized)));

    var mapper = new ObjectMapper();
    var parsed = mapper.readTree(serialized);
    LOGGER.info(() -> String.format("parsed = %s", parsed));

    var dataNode = parsed.get("data");
    assertEquals(jnf.objectNode().<ObjectNode>set("Hello", jnf.textNode("there")), dataNode);
  }
}

That test passes, and the output is:

SimpleTests > jsonSerializeString() STANDARD_ERROR
    Apr 29, 2023 12:56:48 PM io.github.aucampia.issue.cejsonstring.SimpleTests jsonSerializeString
    INFO: serialized = {"specversion":"1.0","id":"Issue1186","source":"example/uri","type":"example.type","datacontenttype":"application/json","data":{"Hello": "there"}}
    Apr 29, 2023 12:56:48 PM io.github.aucampia.issue.cejsonstring.SimpleTests jsonSerializeString
    INFO: parsed = {"specversion":"1.0","id":"Issue1186","source":"example/uri","type":"example.type","datacontenttype":"application/json","data":{"Hello":"there"}}

The output formatted with jq is:

{
  "specversion": "1.0",
  "id": "Issue1186",
  "source": "example/uri",
  "type": "example.type",
  "datacontenttype": "application/json",
  "data": {
    "Hello": "there"
  }
}

So Java is behaving differently, the specific code that is causing this behaviour is this

I would say the behaviour of the Java SDK is correct because the data in this case is essentially just the bytes of {"Hello": "there"}, and the spec for JSON format says:

https://github.com/cloudevents/spec/blob/30f1a081743bfe06b45b85519dd145f14a1ad22c/cloudevents/formats/json-format.md?plain=1#L153-L158

I will test the same in Python. Not sure if this answers all the questions here, if you would like me to test more things let me know.

aucampia avatar Apr 29 '23 11:04 aucampia