
Content-Type mismatch in publishEvent call

Open mukundansundar opened this issue 2 years ago • 6 comments

Expected Behavior

Content-Type mismatch in publishEvent call.

When a string event is published using

client.publishEvent("pubsubName", "topicName", "This is message #0")

what is seen in the broker for a publish over the gRPC protocol is:

{"data":"This is message #0","datacontenttype":"application/json","expiration":"2022-09-14T07:34:04Z","id":"87bc9a7b-ab08-4e7a-8583-3f1e0433ab67","pubsubname":"kafka-pubsub","source":"test-batch-pub","specversion":"1.0","topic":"testingtopic","traceid":"00-6b8bfc12d6d2bb6c7e2f80503628f2b8-8a495373d49513aa-01","traceparent":"00-6b8bfc12d6d2bb6c7e2f80503628f2b8-8a495373d49513aa-01","tracestate":"","type":"com.dapr.event.sent"}

Here, because the content type is not given in the client.publishEvent call, it is inferred from the default serializer as application/json, and that is the datacontenttype field set on the cloudEvent in the broker.

On subscription, the subscriber correctly reads the value and users see something like

Subscriber got: This is message #0

But when we publish the same data, This is message #0, via the publishEvent call with the content type explicitly set:

PublishEventRequest req = new PublishEventRequest("pubsubName", "topicName", "This is message #0");
req.setContentType("text/plain");
client.publishEvent(req);

The data seen in the broker is:

{"data":"\"This is message #0\"","datacontenttype":"text/plain","expiration":"2022-09-14T12:32:37Z","id":"e996716d-1d11-46d7-9112-6866cbb633ce","pubsubname":"kafka-pubsub","source":"test-batch-pub","specversion":"1.0","topic":"simpletest","traceid":"00-a58c282daed4035a660abbf14208cac2-283bb02770d94908-01","traceparent":"00-a58c282daed4035a660abbf14208cac2-283bb02770d94908-01","tracestate":"","type":"com.dapr.event.sent"}

and the subscriber actually gets Subscriber got: "This is message #0"

With the right content type given, the serializer still escapes the quotes around the string. But when no content type is given, the wrong content type is assumed.

Actual Behavior

The datacontenttype field should match the actual data field when the event is saved as a cloudEvent.

Steps to Reproduce the Problem

Publish with both the overloaded method publishEvent(pubsub, topic, data) and with publishEvent(publishEventReq), setting the proper metadata on the full request, and compare the resulting events in the broker.
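
A minimal reproduction sketch (the pubsub and topic names are placeholders, and the .block() calls are only there to run the publishes synchronously):

import io.dapr.client.DaprClient;
import io.dapr.client.DaprClientBuilder;
import io.dapr.client.domain.PublishEventRequest;

public class PublishContentTypeRepro {
  public static void main(String[] args) throws Exception {
    try (DaprClient client = new DaprClientBuilder().build()) {
      // 1. No content type: the default serializer's application/json is assumed,
      //    and the subscriber sees the raw string.
      client.publishEvent("pubsubName", "topicName", "This is message #0").block();

      // 2. Explicit text/plain: the data is still JSON-serialized, so the broker
      //    stores "\"This is message #0\"" with escaped quotes.
      PublishEventRequest req =
          new PublishEventRequest("pubsubName", "topicName", "This is message #0");
      req.setContentType("text/plain");
      client.publishEvent(req).block();
    }
  }
}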

Release Note

RELEASE NOTE:

mukundansundar avatar Sep 14 '22 12:09 mukundansundar

This should be fixed. Probably without "unescaping" the String but by correctly handling the datacontenttype attribute.

artursouza avatar Sep 14 '22 18:09 artursouza

One of the issues is that the Java SDK has overloaded publishEvent methods that do not require content type as a parameter. IMO these should be deprecated in favor of requiring content type as a parameter in the overloaded functions as well. Then the serializer should properly serialize the event based on the content type. @artursouza thoughts?
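
For illustration, one hypothetical shape for such content-type-aware overloads (not part of the current SDK API) could be:

import java.util.Map;
import reactor.core.publisher.Mono;

// Hypothetical overloads if content type became a required parameter.
// Purely illustrative; not the SDK's actual interface.
interface ContentTypeAwarePublisher {

  Mono<Void> publishEvent(String pubsubName, String topicName, Object data, String contentType);

  Mono<Void> publishEvent(String pubsubName, String topicName, Object data, String contentType,
                          Map<String, String> metadata);
}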

mukundansundar avatar Sep 14 '22 19:09 mukundansundar

If the user passes a String, we should skip serialization and pass it as-is and use text as content-type.

artursouza avatar Sep 14 '22 19:09 artursouza

@artursouza Based on the offline discussion that we had:

  1. If a content type is given, we expect the user to provide the right content type for the default serializer, or the user should have overridden the serializer in the client.
  2. If no content type is provided, then (following the different scenarios in HTTP) we try to convert the Object data into a String, be it an integer or any other primitive, or leave it as a string. We do not serialize the string, and we set the contentType field to text/plain. (@artursouza There is a caveat here: if the event is XML, then ideally the content type should change.)
  3. For POJO and MessageLite we serialize using the JSON mapper and set application/json. @artursouza The issue here is that we will only know it is a CloudEvent if the CloudEvent type is used; otherwise the user has to provide the application/cloudevents+json contentType.
  4. For byte[], in HTTP we should base64-encode that field and set the content type to application/octet-stream, and in gRPC we pass on the bytes as-is and set contentType to application/octet-stream.

All of these contentType changes will only be applied if the contentType is not given; a rough sketch of the inference follows below.
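
A rough sketch of that inference, assuming it only runs when the caller did not set a content type (the helper name and exact rules are illustrative, not the SDK's code):

// Hypothetical sketch of the proposed content-type inference; illustrative only.
final class ContentTypeInference {

  private ContentTypeInference() {
  }

  static String infer(Object data) {
    if (data == null) {
      return "application/json";          // nothing to inspect; keep the serializer default
    }
    if (data instanceof byte[]) {
      return "application/octet-stream";  // rule 4: raw bytes (base64-encoded over HTTP)
    }
    if (data instanceof String || data instanceof Number || data instanceof Boolean) {
      return "text/plain";                // rule 2: strings and primitives, passed through as-is
    }
    // Rule 3 caveat: a CloudEvent payload is only detectable if the SDK's CloudEvent
    // type is used; otherwise the user must set application/cloudevents+json themselves.
    return "application/json";            // rule 3: POJOs / MessageLite via the JSON mapper
  }
}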

@artursouza Though there is an issue with respect to XML data. Any thoughts on how that can be resolved?

mukundansundar avatar Sep 15 '22 08:09 mukundansundar

This is the code snippet from AbstractDaprClient.java.

  /**
   * {@inheritDoc}
   */
  @Override
  public Mono<Void> publishEvent(String pubsubName, String topicName, Object data) {
    return this.publishEvent(pubsubName, topicName, data, null);
  }

  /**
   * {@inheritDoc}
   */
  @Override
  public Mono<Void> publishEvent(String pubsubName, String topicName, Object data, Map<String, String> metadata) {
    PublishEventRequest req = new PublishEventRequest(pubsubName, topicName, data)
        .setMetadata(metadata);
    return this.publishEvent(req).then();
  }

It shows that no matter which overload is called, it ends up going through the same method.

In DaprClientHttp.java, publishEvent(PublishEventRequest request) is implemented as shown below. It shows that the data is serialized before the contentType is even considered. If the user does not set anything, the contentType defaults to application/json.

  public Mono<Void> publishEvent(PublishEventRequest request) {

      ...

      Object data = request.getData();
      Map<String, String> metadata = request.getMetadata();

      byte[] serializedEvent = objectSerializer.serialize(data);
      // Content-type can be overwritten on a per-request basis.
      // It allows CloudEvents to be handled differently, for example.
      String contentType = request.getContentType();
      if (contentType == null || contentType.isEmpty()) {
        contentType = objectSerializer.getContentType();
      }
      Map<String, String> headers = Collections.singletonMap("content-type", contentType);

     ...

  }

So the trigger for this bug is whether the user sets the contentType manually and correctly.

Maybe we could add a check before serializing in publishEvent(): if the contentType is "text/plain", convert the data to byte[] directly without calling serialize().
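
A rough sketch of that check, reusing the names from the excerpt above (illustrative only, not the actual fix):

import java.io.IOException;
import java.nio.charset.StandardCharsets;

import io.dapr.serializer.DaprObjectSerializer;

// Illustrative sketch of bypassing serialization for text/plain string payloads.
// Mirrors the names in the DaprClientHttp excerpt above; not the actual fix.
final class PublishPayloadSketch {

  private PublishPayloadSketch() {
  }

  static byte[] toPayload(Object data, String contentType, DaprObjectSerializer objectSerializer)
      throws IOException {
    if ("text/plain".equalsIgnoreCase(contentType) && data instanceof String) {
      // Skip the JSON serializer so the string is not wrapped in escaped quotes.
      return ((String) data).getBytes(StandardCharsets.UTF_8);
    }
    return objectSerializer.serialize(data);
  }
}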

@artursouza @mukundansundar What are your opinions? I can help to fix this.

MregXN avatar May 17 '23 05:05 MregXN

I have raised the related issue #6448 in dapr/dapr. Maybe we need more discussion about whether the Dapr runtime should be responsible for this bug. @artursouza @mukundansundar

MregXN avatar Jun 07 '23 02:06 MregXN