Content-Type mismatch in publishEvent call
Actual Behavior
When a string event is published with

```java
client.publishEvent("pubsubName", "topicName", "This is message #0");
```

what is seen in the broker for a publish using the gRPC protocol is:

```json
{"data":"This is message #0","datacontenttype":"application/json","expiration":"2022-09-14T07:34:04Z","id":"87bc9a7b-ab08-4e7a-8583-3f1e0433ab67","pubsubname":"kafka-pubsub","source":"test-batch-pub","specversion":"1.0","topic":"testingtopic","traceid":"00-6b8bfc12d6d2bb6c7e2f80503628f2b8-8a495373d49513aa-01","traceparent":"00-6b8bfc12d6d2bb6c7e2f80503628f2b8-8a495373d49513aa-01","tracestate":"","type":"com.dapr.event.sent"}
```
Here, because the content type is not given in the `client.publishEvent` call, it is inferred by the default serializer as `application/json`, and that is the `datacontenttype` field set on the CloudEvent in the broker.
On subscription, the subscriber correctly reads the value, and users see something like `Subscriber got: This is message #0`.
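For reference, a minimal repro of this first case might look like the sketch below. The pubsub and topic names are placeholders, and it assumes the SDK's standard `DaprClientBuilder` with a sidecar running locally:

```java
import io.dapr.client.DaprClient;
import io.dapr.client.DaprClientBuilder;

public class PublishWithoutContentType {
  public static void main(String[] args) throws Exception {
    // Build a client against the local Dapr sidecar.
    try (DaprClient client = new DaprClientBuilder().build()) {
      // No content type is passed, so the default serializer infers one.
      client.publishEvent("kafka-pubsub", "testingtopic", "This is message #0").block();
    }
  }
}
```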
But when we publish the same data, `This is message #0`, with a `publishEvent` call that has the content type correctly set:

```java
PublishEventRequest req = new PublishEventRequest("pubsubName", "topicName", "This is message #0");
req.setContentType("text/plain");
client.publishEvent(req);
```

the data seen in the broker is:

```json
{"data":"\"This is message #0\"","datacontenttype":"text/plain","expiration":"2022-09-14T12:32:37Z","id":"e996716d-1d11-46d7-9112-6866cbb633ce","pubsubname":"kafka-pubsub","source":"test-batch-pub","specversion":"1.0","topic":"simpletest","traceid":"00-a58c282daed4035a660abbf14208cac2-283bb02770d94908-01","traceparent":"00-a58c282daed4035a660abbf14208cac2-283bb02770d94908-01","tracestate":"","type":"com.dapr.event.sent"}
```
and the subscriber actually gets `Subscriber got: "This is message #0"`.

With the right content type given, the serialization escapes the quotes; but when no content type is given, a wrong content type is assumed.
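The escaped quotes are exactly what a JSON mapper produces when handed a plain Java `String`. A minimal illustration, assuming the default serializer behaves like Jackson's `ObjectMapper` (which the SDK's default serializer is built on):

```java
import com.fasterxml.jackson.databind.ObjectMapper;

public class QuoteEscapeDemo {
  public static void main(String[] args) throws Exception {
    ObjectMapper mapper = new ObjectMapper();
    // Serializing a Java String as JSON wraps it in quotes, which then get
    // escaped when embedded into the CloudEvent's data field.
    String json = mapper.writeValueAsString("This is message #0");
    System.out.println(json); // prints: "This is message #0" (with literal quotes)
  }
}
```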
Expected Behavior
There should be a proper match between the `datacontenttype` field and the actual `data` field when the event is saved as a CloudEvent.
Steps to Reproduce the Problem
Send with the overloaded method `publishEvent(pubsubName, topicName, data)` and with `publishEvent(publishEventRequest)`, setting the proper metadata for the full request.
Release Note
RELEASE NOTE:
This should be fixed, probably not by "unescaping" the String but by correctly handling the datacontenttype attribute.
One of the issues is that the Java SDK has overloaded publishEvent methods that do not require a content type as a parameter. IMO these should be deprecated in favor of requiring a content type as a parameter in the overloaded functions as well. Then the serializer should properly serialize the event based on the content type. @artursouza thoughts?
If the user passes a String, we should skip serialization and pass it as-is and use text as content-type.
@artursouza Based on the offline discussion that we had:
- If a content type is given, we expect the user to provide the right content type for the default serializer, or the user should have overridden the serializer in the client.
- If no content type is provided, then, matching the different scenarios in HTTP, we try to convert the Object data into a String, be it an integer or any other primitive, or leave it as a string. We do not serialize the string, and we set the contentType field to `text/plain`. (@artursouza There is a caveat here: if the event is XML, then ideally the content type should change.)
- For POJOs and `MessageLite`, we serialize using the JSON mapper and set `application/json`. @artursouza The issue here is that even if it is a CloudEvent, we will only know if the `CloudEvent` type is used; otherwise the user has to provide the `application/cloudevents+json` contentType.
- For `byte[]`, in HTTP we should Base64-encode that field and set the content type to `application/octet-stream`, and in gRPC we pass the bytes on as-is and set the contentType to `application/octet-stream`.

All of these contentType changes will only be done if the contentType is not given; a rough sketch of the resulting dispatch follows below.
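A sketch of these rules as a hypothetical helper (not SDK code; the dispatch order and type checks here are assumptions about how it could look, with `CloudEvent` being the SDK's `io.dapr.client.domain.CloudEvent`):

```java
import io.dapr.client.domain.CloudEvent;

public final class ContentTypeRules {

  // Hypothetical helper: infer a content type only when the caller gave none.
  static String inferContentType(Object data) {
    if (data instanceof byte[]) {
      // HTTP would Base64-encode the bytes; gRPC passes them through as-is.
      return "application/octet-stream";
    }
    if (data instanceof CloudEvent) {
      // Only detectable when the SDK's CloudEvent type is actually used.
      return "application/cloudevents+json";
    }
    if (data instanceof String || data instanceof Number || data instanceof Boolean) {
      // Strings and primitives become text and are not JSON-serialized.
      return "text/plain";
    }
    // POJOs, com.google.protobuf.MessageLite, etc. go through the JSON mapper.
    return "application/json";
  }
}
```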
@artursouza Though there is an issue with respect to XML data. Any thoughts on how that can be resolved?
This is the relevant code snippet from AbstractDaprClient.java:
```java
/**
 * {@inheritDoc}
 */
@Override
public Mono<Void> publishEvent(String pubsubName, String topicName, Object data) {
  return this.publishEvent(pubsubName, topicName, data, null);
}

/**
 * {@inheritDoc}
 */
@Override
public Mono<Void> publishEvent(String pubsubName, String topicName, Object data, Map<String, String> metadata) {
  PublishEventRequest req = new PublishEventRequest(pubsubName, topicName, data)
      .setMetadata(metadata);
  return this.publishEvent(req).then();
}
```
It shows that no matter which overload is called, every call funnels into the same method.
DaprClientHttp.java implements `publishEvent(PublishEventRequest request)`. It shows that the data is serialized before the contentType is even looked at; if the user does not set anything, the contentType falls back to the serializer's default, `application/json`:
```java
public Mono<Void> publishEvent(PublishEventRequest request) {
  ...
  Object data = request.getData();
  Map<String, String> metadata = request.getMetadata();

  byte[] serializedEvent = objectSerializer.serialize(data);
  // Content-type can be overwritten on a per-request basis.
  // It allows CloudEvents to be handled differently, for example.
  String contentType = request.getContentType();
  if (contentType == null || contentType.isEmpty()) {
    contentType = objectSerializer.getContentType();
  }
  Map<String, String> headers = Collections.singletonMap("content-type", contentType);
  ...
}
```
So the trigger for this bug is whether users set the contentType manually and correctly. Maybe we could add a check before serializing in `publishEvent()`: if the contentType is `"text/plain"`, manually convert the data to `byte[]` without calling `serialize()`, as sketched below.
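A minimal sketch of that check, pulled out into a hypothetical helper (the method name and its shape are illustrative, not actual SDK code; `DaprObjectSerializer` is the SDK's serializer interface):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import io.dapr.serializer.DaprObjectSerializer;

public final class PublishSerialization {

  // Hypothetical helper: bypass the JSON serializer for explicit text/plain
  // strings so they are not wrapped in (and escaped with) JSON quotes.
  static byte[] serializeForPublish(Object data, String contentType,
      DaprObjectSerializer objectSerializer) throws IOException {
    if ("text/plain".equalsIgnoreCase(contentType) && data instanceof String) {
      return ((String) data).getBytes(StandardCharsets.UTF_8);
    }
    return objectSerializer.serialize(data);
  }
}
```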
@artursouza @mukundansundar What are your opinions? I can help to fix this.
I have raised the related issue #6448 in dapr/dapr. Maybe we need more discussion about whether the dapr runtime should be responsible for this bug. @artursouza @mukundansundar