
Support string keys with Avro values

criccomini opened this issue 8 years ago • 33 comments

We currently use Avro for the values of our messages. It appears that we're being forced to use Avro for the key as well, since the REST proxy seems to tie the two together. We would much prefer to have our keys just be basic strings. I want to:

  1. Confirm that this currently isn't possible.
  2. Propose that it be supported, unless someone has an argument against it.

criccomini · Jun 09 '16 22:06

@criccomini Correct, not supported right now. It definitely complicates the implementation quite a bit. Content-Type becomes confusing, since it is now mixed between Avro and something else (and if it's strings, ints, whatever, it's not even one of our existing supported types). We'd have to factor serialization out of the producers/consumers, or we'd end up with a combinatorial explosion in the number of producer instances needed to do arbitrary mix & match of types.

I totally get the request, but I think especially the Content-Type issues have a lot of details that need to be worked out to make this practical.

ewencp · Jun 09 '16 22:06

Yea, I agree. The only thing I could come up with was some sort of string+avro Content-Type thing as well.

criccomini · Jun 09 '16 22:06

Right, and given how routing/content negotiation works, and that deserialization happens automatically via Jackson and is tied to the message type, I'm not sure how this would work. Routing/content negotiation we could probably handle manually; the Jackson deserialization seems like the hardest part to resolve, since it happens when Jersey invokes the MessageBodyProvider, before any kafka-rest app code is ever executed.

ewencp · Jun 09 '16 22:06

I'm undecided on whether it's hacky to focus just on string key support, vs. supporting arbitrary key/value serde combinations.

criccomini · Jun 09 '16 23:06

Yeah, I think arbitrary combos are unlikely to be useful in practice. If you're trying to use JSON keys and Avro values, you have bigger problems. Once you start with strings, though, int/long also makes sense, since those will be common key types.

ewencp · Jun 09 '16 23:06

I agree. If that's the case, I wonder if some different approach might be palatable just for the string-key use case.

criccomini · Jun 09 '16 23:06

(e.g. URL param, query string param, X-header (oh god no), etc)

criccomini · Jun 09 '16 23:06

@criccomini Query param doesn't seem awful. But making a bunch of them for different types doesn't seem ideal. (And of course that doesn't really affect the fact that we need to reorganize where serialization is happening to make any of this work.)

Here's another idea for how to accomplish this: move it into the serializers instead, specifically the Avro ones. Basically, an opt-out for primitive types such that they get serialized directly, without the magic byte + schema ID (and therefore also opting out of any schema registry integration/compatibility checking), and just get the raw serialized form. That also means the deserializer somehow has to know the type and be configured for it. Serializers can tell whether they are being used for a key, so this could also be restricted to work only for keys. I think the main drawback is that we'd effectively be configuring it for the entire proxy instead of per-request. So it'd be a site-wide agreement that primitive keys are included "bare".
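For illustration, a minimal sketch of what that opt-out could look like, in Python rather than the proxy's actual Java serializers; the serialize_key helper and the raw_primitive_keys flag are hypothetical, not part of any shipped API:

import struct

MAGIC_BYTE = 0  # Confluent wire format: magic byte, 4-byte schema ID, then payload

def avro_encode_string(s):
    # Avro binary encoding of a string: zigzag-varint length, then UTF-8 bytes.
    data = s.encode("utf-8")
    n = len(data) << 1  # zigzag encoding of a non-negative long
    out = bytearray()
    while n > 0x7F:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out) + data

def serialize_key(key, schema_id, raw_primitive_keys=False):
    if raw_primitive_keys and isinstance(key, str):
        # "Bare" key: no framing, no schema registry lookup, and therefore
        # no compatibility checking. This is the proposed site-wide opt-out.
        return key.encode("utf-8")
    # Normal path: frame the Avro payload with the magic byte and schema ID.
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_encode_string(key)

print(serialize_key("mykey", 1))                           # b'\x00\x00\x00\x00\x01\nmykey'
print(serialize_key("mykey", 1, raw_primitive_keys=True))  # b'mykey'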

To be honest, I understand why people want this and it can be convenient, but I'm not sure it's a good idea to enable people to do this. It leaves you literally zero options for changing or adding key formats. Including the framing is really important for extensibility/compatibility. I understand that in a lot of cases you can reasonably assume the format of the key is fixed forever (or you're willing to pay the cost of figuring out a migration to multiple topics so you can add a new format in the new topic), but that isn't always the case, and I'd even say that developer foresight on this issue isn't particularly good. I'd much prefer encouraging use of a format that gives you the ability to make changes, and addressing any usability issues there if at all possible.

ewencp · Jun 13 '16 16:06

As an example of using the primitive Avro types, use request data: {"key_schema": "\"string\"", "value_schema_id": 12345, "records": [{"key": "mykey", "value": {...}}]}

Although it slightly increases the overhead of the REST proxy (a possible extra round trip to the schema registry), the request is still very simple to construct.
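A fuller sketch of that request against the v2 produce endpoint (the host, topic, and schema ID are placeholders; the Content-Type is the standard v2 Avro embedded-format one):

import requests

resp = requests.post(
    "http://localhost:8082/topics/my-topic",  # placeholder proxy host and topic
    headers={"Content-Type": "application/vnd.kafka.avro.v2+json"},
    json={
        "key_schema": "\"string\"",  # inline primitive Avro schema for the key
        "value_schema_id": 12345,    # reuse a value schema already in the registry
        "records": [{"key": "mykey", "value": {"field": "example"}}],
    },
)
print(resp.status_code, resp.text)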

blootsvoets · Oct 13 '16 08:10

A possible (more RESTful) approach would be to use a multipart HTTP response: in this case, two parts, one for the key and one for the payload.

joewood · Jul 13 '17 20:07

Any news about this?

ValentinTrinque · Jun 14 '18 09:06

I'm guessing there are still no updates on this one, right?

PuszekSE · Mar 25 '19 11:03

+1

peoplemerge · May 02 '19 18:05

+1

cornercoding · May 06 '19 15:05

+1

cecchisandrone · Jun 25 '19 13:06

+1

plinioj · Jul 04 '19 15:07

FYI, it's not possible to use KSQL with the Kafka REST proxy, then. KSQL doesn't support an Avro key format, and the REST proxy doesn't support non-Avro key formats. Check and mate.

Are there any workarounds possible? The binary key format, asking KSQL to hurry up with Avro key support...

UPDATE

The workaround that worked for us: use value_format=json in KSQL, tell the REST proxy to use the binary format, and then base64-decode/JSON-decode in the application. Maybe it can help someone.
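The decode step of that workaround might look like this (a sketch; the record literal stands in for one message from a binary-format consumer response):

import base64
import json

# One record as returned by the REST proxy's binary embedded format:
# both key and value arrive base64-encoded.
record = {"key": "bXlrZXk=", "value": "eyJmaWVsZCI6ICJleGFtcGxlIn0="}

key = base64.b64decode(record["key"]).decode("utf-8")  # "mykey"
value = json.loads(base64.b64decode(record["value"]))  # {"field": "example"}
print(key, value)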

mente · Nov 07 '19 17:11

+1

makarova · Nov 07 '19 19:11

+1 damnit

brandonwittwer · Dec 04 '19 22:12

+1

ghost · Dec 13 '19 12:12

+1

apohrebniak · Dec 17 '19 10:12

Yeah, I think arbitrary combos are unlikely to be useful in practice.

Hard disagree on this. If you use an Avro key in a persistent state store and you upgrade the schema on that key, all the data in your store will effectively disappear: the schema ID is serialized into the initial bytes of the message, so the new key bytes no longer match the old ones. String keys are an effective strategy to avoid these headaches. Also, Avro keys can't be used with range() operations on stores.
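To see why, recall the Confluent wire format: magic byte 0x00, a 4-byte big-endian schema ID, then the Avro payload. A sketch of what a schema upgrade does to an otherwise identical key (the payload literal is just a stand-in for the encoded key):

import struct

def framed_key(schema_id, avro_payload):
    # magic byte + 4-byte schema ID + Avro-encoded key
    return struct.pack(">bI", 0, schema_id) + avro_payload

payload = b"\nmykey"  # stand-in Avro binary for the key
old = framed_key(1, payload)  # written before the schema upgrade
new = framed_key(2, payload)  # the same logical key afterwards

print(old == new)  # False: byte-level store lookups and range scans now miss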

aaugusta · Jan 08 '20 11:01

+1. I read through this. The reasons for not modifying the endpoint make sense from an architectural-purity point of view. But there are a lot of people asking for this, with solid reasons for needing it. Something a little more pragmatic seems in order.

How about creating another endpoint that only supports an unencoded key? The payload would be encoded per the header used in the normal topics endpoint.

clande · Jun 01 '20 23:06

+1. Being able to specify different key and value formats would be really useful, just like in Kafka Connect, where you can specify key.converter and value.converter: https://docs.confluent.io/current/schema-registry/connect.html
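For reference, the Connect configuration the comment alludes to looks roughly like this (a sketch; the converters are the stock StringConverter and the Confluent AvroConverter, and the registry URL is a placeholder):

key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081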

dainiusjocas · Nov 07 '20 18:11

This is completely absurd. You're in fact shipping two incompatible systems (KSQL and this). Who on earth does something like that?

mnowaczyk · Mar 11 '21 22:03

FYI: Support for different key and value formats has been merged at https://github.com/confluentinc/kafka-rest/pull/797.

rigelbm · Mar 15 '21 14:03

@rigelbm Is there any documentation on the new features? Is it included in the newest release? Specifically, how do we consume from a topic that has a string key and Avro value? I didn't find instructions in the latest docs here: https://docs.confluent.io/platform/current/kafka-rest/api.html

slominskir · Jun 15 '21 21:06

And what about consuming? If I understand correctly, the PR only addresses the V3 API for producing messages.

How can I, for example, read the key in string format and the value in Avro?

Hubbitus · Apr 21 '22 20:04

FYI: Support for different key and value formats has been merged at #797.

I've looked into the new API (https://docs.confluent.io/platform/current/kafka-rest/api.html#records-v3), and it seems that string, specifically, is still not directly supported as a key format?...

Embedded formats: json, binary, avro, protobuf and jsonschema. If data is provided as a string, it's treated as a BASE64 representation of binary data.

If I have missed something, I'd really appreciate a link to the respective docs :)

PuszekSE · Apr 22 '22 10:04

It works with the v3 records endpoint.

The Avro schema of the key:

{ "type": "string" }

import requests

rest_proxy = "http://localhost:8082"  # base URL of the REST proxy (example)

headers = {
    'Content-Type': 'application/json',
}

# Produce a record whose key is the plain string "AAAAAA"; the proxy uses
# the string schema already registered under the subject "tata-key".
data = {
    "key": {
        "data": "AAAAAA"
    }
}

response = requests.post(
    f"{rest_proxy}/v3/clusters/toto/topics/tata/records",
    headers=headers,
    json=data)

print(response.reason)
print(response.text)

gives:

OK
{"cluster_id":"toto","topic_name":"tata","partition_id":1,"offset":4,"timestamp":"2022-06-19T17:32:10.168Z","key":{"type":"AVRO","subject":"tata-key","schema_id":1,"schema_version":2,"size":12}}

raphaelauv · Jun 19 '22 23:06