
Decoding uuid of type bytes and logicalType uuid

Open idkw opened this issue 3 years ago • 3 comments

Hi

I have an Avro schema like this one to represent a Contact entity:

{
    "type": "record",
    "name": "ContactDTO",
    "namespace": "com.somepackage.contact",
    "fields":
    [
        {
            "name": "email",
            "type": "string"
        },
        {
            "name": "firstName",
            "type": "string"
        },
        {
            "name": "lastName",
            "type": "string"
        },
        {
            "name": "phoneNumber",
            "type": "string"
        },
        {
            "name": "uuid",
            "type":
            {
                "type": "bytes",
                "logicalType": "uuid",
                "CustomEncoding": "UUIDAsBytesEncoding"
            }
        }
    ]
}

As you can see, I chose to encode the uuid as binary to save space (16 bytes in binary instead of 36 in UTF-8 with dashes).
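The size argument is easy to check. This is a minimal Python sketch using only the standard `uuid` module. The example value is random. It also assumes that the 16 bytes are the UUID's bits with the most significant byte first, which is what Python's `UUID.bytes` produces. That layout may not match what the Java `UUIDAsBytesEncoding` class actually writes.

```python
import uuid

# Hypothetical example value: a random (version 4) UUID.
u = uuid.uuid4()

as_text = str(u)    # canonical form: 32 hex digits in 8-4-4-4-12 groups, with dashes
as_bytes = u.bytes  # the same 128 bits as 16 raw bytes, most significant byte first

print(len(as_text.encode("utf-8")))  # 36
print(len(as_bytes))                 # 16

# The 16 bytes carry the whole value, so the round trip is lossless.
assert uuid.UUID(bytes=as_bytes) == u
```

Assume the custom encoding writes an ordinary Avro bytes value (also an assumption). In that case, each value also carries a length prefix, encoded as a zigzag varint. For both lengths here, that prefix is a single byte. The on-wire cost per record is therefore 17 bytes, versus 37 for the string form.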

Kowl currently displays this binary value incorrectly, as if it were UTF-8 text, like so: [screenshot]

Our backend Java applications decode the uuid from binary correctly, and I was wondering whether Kowl could also be made to handle binary uuids. Since the Avro schema already describes the field as type bytes with logicalType uuid, I assume there is enough information for the frontend to display the uuid in the usual human-readable form, with dashes.
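The request boils down to a display rule. If a field's type is `bytes` with `logicalType` `uuid`, and the value is exactly 16 bytes, format it as a canonical UUID instead of as text. Below is a minimal Python sketch of that rule. The helper `render_field` is hypothetical, not Kowl code. It takes a field's type schema and its already-decoded value, and it assumes the 16 bytes hold the UUID most significant byte first. The fallback branch only imitates the symptom described above; it makes no claim about how Kowl renders bytes internally.

```python
import uuid


def render_field(field_type, value):
    """Render one decoded Avro field value for display (hypothetical helper).

    If the field's type schema is bytes annotated with logicalType "uuid"
    and the value is exactly 16 bytes, format it as a canonical UUID string.
    Otherwise, bytes are shown as lossy UTF-8 text, imitating the symptom.
    """
    if (
        isinstance(field_type, dict)
        and field_type.get("type") == "bytes"
        and field_type.get("logicalType") == "uuid"
        and isinstance(value, (bytes, bytearray))
        and len(value) == 16
    ):
        # Assumes the 16 bytes hold the UUID most significant byte first.
        return str(uuid.UUID(bytes=bytes(value)))
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode("utf-8", errors="replace")
    return value


# The "uuid" field's type from the ContactDTO schema above.
uuid_type = {"type": "bytes", "logicalType": "uuid", "CustomEncoding": "UUIDAsBytesEncoding"}
raw = uuid.UUID("123e4567-e89b-12d3-a456-426614174000").bytes

print(render_field(uuid_type, raw))  # 123e4567-e89b-12d3-a456-426614174000
print(render_field("bytes", raw))    # unreadable text containing U+FFFD replacement characters
```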

Is this possible? I'd be happy to contribute this feature with some guidance on where to look.

Thanks

idkw · Jan 12 '22 14:01

Hey @idkw , thanks for filing this issue with such a good and understandable report!

We currently use LinkedIn's goavro library to decode these messages from Avro into JSON; see: https://github.com/cloudhut/kowl/blob/0e7d7d546ee20b5c395d1350b5244d75f478dd81/backend/pkg/kafka/deserializer.go#L173-L195

I'm wondering whether this is something goavro has added recently or whether it's simply not supported yet. I'm very busy with other things right now, so I don't have time to take a more in-depth look, but I'm happy to help if you want to tackle this or gather more information about it.

weeco · Jan 12 '22 20:01

Thanks for the explanation. I dug a little into LinkedIn's Avro library, and it looks like it supports some logical types, but not uuid:

https://github.com/linkedin/goavro/blob/ee029924456a57a922b9ca32091e0fe5259a1118/codec.go#L215
https://github.com/linkedin/goavro/blob/ee029924456a57a922b9ca32091e0fe5259a1118/codec.go#L583

I also looked at the Avro specification again; at https://avro.apache.org/docs/current/spec.html#UUID it says:

  UUID
  The uuid logical type represents a random generated universally unique identifier (UUID).
  
  A uuid logical type annotates an Avro string. The string has to conform with RFC-4122
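The quoted passage can be turned into a simplified reader rule. The Avro spec also says that implementations should ignore a logical type that is unknown or invalid and use the underlying Avro type instead. That is why a spec-following decoder treats the custom field above as plain bytes. Below is a minimal Python sketch that handles only `uuid`, following the passage as quoted. The function names are hypothetical.

```python
import re

# RFC 4122 textual form: 8-4-4-4-12 hexadecimal digits separated by dashes.
CANONICAL_UUID = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def effective_type(type_schema):
    """Return the type a spec-following reader would apply (simplified sketch).

    Only the uuid logical type is handled. Per the quoted passage it annotates
    a string; on any other base type it is ignored and the underlying type is used.
    """
    if isinstance(type_schema, dict):
        base = type_schema.get("type")
        if type_schema.get("logicalType") == "uuid" and base == "string":
            return "uuid"
        return base
    return type_schema


def is_canonical_uuid_string(s):
    """True if s is in the dashed 8-4-4-4-12 hex form."""
    return CANONICAL_UUID.fullmatch(s) is not None


standard = {"type": "string", "logicalType": "uuid"}
custom = {"type": "bytes", "logicalType": "uuid", "CustomEncoding": "UUIDAsBytesEncoding"}
print(effective_type(standard))  # uuid
print(effective_type(custom))    # bytes
```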

So in my case, since I'm not using the standard logical type (which annotates a string) but a custom one (on bytes, with a CustomEncoding, to save space), I guess there's no hope of getting this implemented. It seems too custom, right?

idkw · Jan 12 '22 22:01

Indeed, it seems so. Only if the backend were written in Java could we perhaps load custom encoder classes.

To support any sort of custom encoding in Kowl I could think of multiple approaches (none of which is particularly easy to do).

Maybe we could provide a custom JavaScript API so users can define custom encoders/decoders in JS... But that'd require quite a bit of work, and we'd need to collect all sorts of example encoders that are being used "in the wild" so we'd have enough examples to even plan such an API.
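The plug-in idea can be sketched in a few lines. This version is in Python rather than JavaScript. Every name in it (`register_decoder`, `apply_decoder`) is hypothetical and does not exist in Kowl. User decoders are keyed on the underlying Avro type plus `logicalType`, which is enough for the `bytes`/`uuid` case in this issue.

```python
import uuid
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical plug-in registry: none of these names exist in Kowl.
# Keys are (underlying Avro type, logicalType or None); values turn a
# decoded native value into something human-readable.
Decoder = Callable[[Any], Any]
_DECODERS: Dict[Tuple[str, Optional[str]], Decoder] = {}


def register_decoder(avro_type: str, logical_type: Optional[str], fn: Decoder) -> None:
    """Register a user decoder for one (type, logicalType) combination."""
    _DECODERS[(avro_type, logical_type)] = fn


def apply_decoder(type_schema: Any, value: Any) -> Any:
    """Run the user decoder registered for this field's type, if there is one."""
    if isinstance(type_schema, str):
        key = (type_schema, None)
    elif isinstance(type_schema, dict) and isinstance(type_schema.get("type"), str):
        key = (type_schema["type"], type_schema.get("logicalType"))
    else:
        return value  # unions, nested schemas, etc. are out of scope for this sketch
    fn = _DECODERS.get(key)
    return fn(value) if fn is not None else value


# A user-supplied decoder for the ContactDTO case: 16-byte UUIDs stored as bytes
# (assumed to be most significant byte first).
register_decoder("bytes", "uuid", lambda b: str(uuid.UUID(bytes=bytes(b))))
```

Keying on (type, logicalType) is the simplest choice that covers this issue. Encodings that depend on other schema properties, such as `CustomEncoding`, would need a richer key. That is part of why real-world examples would be needed before designing such an API.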

Another approach could be a separate application/service that exposes an HTTP endpoint. The Kowl backend would send it the encoded messages, and the service would return a JSON representation. Even ignoring the amount of work involved, the overhead would probably slow down message search a lot!
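Below is a minimal sketch of such a decoding service, using only Python's standard library. The `/decode` path, the port, and `decode_payload` are hypothetical. For illustration, the payload is assumed to be a bare 16-byte UUID rather than a full Avro message. The backend would POST the raw bytes and receive JSON back.

```python
import json
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer


def decode_payload(raw: bytes) -> dict:
    """Hypothetical custom decoding step: here, a bare 16-byte UUID."""
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return {"uuid": str(uuid.UUID(bytes=raw))}


class DecodeHandler(BaseHTTPRequestHandler):
    # POST /decode with the raw message bytes as the body; replies with JSON.
    def do_POST(self):
        if self.path != "/decode":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        try:
            body = json.dumps(decode_payload(raw)).encode("utf-8")
            status = 200
        except ValueError as exc:
            body = json.dumps({"error": str(exc)}).encode("utf-8")
            status = 422
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the service until interrupted (blocking)."""
    HTTPServer(("127.0.0.1", port), DecodeHandler).serve_forever()
```

Every message passing through search would cost one such HTTP round trip, which is the overhead mentioned above.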

So yeah, unfortunately I don't see any quick and easy way to support this right now :/

rikimaru0345 · Jan 20 '22 10:01

Is this still valid? The Avro library has changed.

twmb · Oct 19 '23 15:10