
Support for 'Protobuf Deserializer' with schema registry

Open dorocoder opened this issue 4 years ago • 8 comments

Description

Looking into the source code for Protobuf, it seems it is not possible to deserialize a Protobuf message without its corresponding static message type argument.

With Avro, everything needed is already provided, and an Avro message can be deserialized via the schema registry alone.

I wonder whether support for Protobuf deserialization via the schema registry is currently being implemented, or not yet planned.

class ProtobufDeserializer(object):
    """
    ProtobufDeserializer decodes bytes written in the Schema Registry
    Protobuf format to an object.
    Args:
        message_type (GeneratedProtocolMessageType): Protobuf Message type.
   ...

dorocoder avatar Aug 01 '21 23:08 dorocoder

Not planned; haven't investigated. It would require a library that allows the data to be traversed dynamically. I'm not sure whether that exists in Python or not.
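For reference, the Python protobuf package itself ships dynamic machinery of this kind: a descriptor pool plus a message factory can produce a message class at runtime, without generated code. A minimal sketch, using a hand-rolled, hypothetical `example.Person` descriptor (`GetMessageClass` requires protobuf >= 4.22; older releases use `MessageFactory(pool).GetPrototype` instead):

```python
from google.protobuf import descriptor_pb2, descriptor_pool, message_factory

# Build a FileDescriptorProto by hand; in a registry-driven flow this would
# come from parsing the schema fetched from Schema Registry.
file_proto = descriptor_pb2.FileDescriptorProto()
file_proto.name = "person.proto"
file_proto.package = "example"
file_proto.syntax = "proto3"

msg_proto = file_proto.message_type.add()
msg_proto.name = "Person"
field = msg_proto.field.add()
field.name = "name"
field.number = 1
field.type = descriptor_pb2.FieldDescriptorProto.TYPE_STRING
field.label = descriptor_pb2.FieldDescriptorProto.LABEL_OPTIONAL

pool = descriptor_pool.DescriptorPool()
pool.Add(file_proto)

# Turn the runtime descriptor into a usable message class.
Person = message_factory.GetMessageClass(
    pool.FindMessageTypeByName("example.Person"))

person = Person(name="Alice")
round_tripped = Person.FromString(person.SerializeToString())
```

This only covers building message classes dynamically; wiring it into the Schema Registry wire format (magic byte, schema ID, message indexes) is a separate step.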

mhowlett avatar Aug 02 '21 00:08 mhowlett

Not planned; haven't investigated. It would require a library that allows the data to be traversed dynamically. I'm not sure whether that exists in Python or not.

Thanks for your quick and clear answer.

dorocoder avatar Aug 02 '21 01:08 dorocoder

I was going to post this question as well. @mhowlett, are there any equivalent libraries in, say, Java that allow the data to be traversed dynamically?

nilansaha avatar Aug 03 '21 16:08 nilansaha

I was going to post this question as well. @mhowlett, are there any equivalent libraries in, say, Java that allow the data to be traversed dynamically?

I hope you find some answers at #link

dorocoder avatar Aug 03 '21 21:08 dorocoder

Fixed by https://github.com/confluentinc/confluent-kafka-python/pull/1852

rayokota avatar Dec 12 '24 16:12 rayokota

Fixed by #1852

The ProtobufDeserializer constructor on master still requires a generated protobuf object... this issue isn't fixed

nivgold avatar Feb 12 '25 00:02 nivgold

Fixed by #1852

The ProtobufDeserializer constructor on master still requires a generated protobuf object... this issue isn't fixed

@nivgold, that makes sense

sauljabin avatar Feb 12 '25 15:02 sauljabin

I'm trying to consume messages from a topic with a protobuf schema from the Python client (after installing confluent-kafka[protobuf,schemaregistry]). But I'm running into the same issue, plus some difficulty working around it.

Is there a recommended way to build protos from the schema registry, including extensions such as confluent/meta.proto, so that they can be used with ProtobufDeserializer?

It looks like this is possible in Databricks/Spark, where their from_protobuf method loads everything dynamically (we have other teams using Spark processing and that works).

(1) As @nivgold says, the ProtobufDeserializer still requires a Python proto message class, as in the example protobuf_consumer.py. Actually it looks like the example avro_consumer.py also loads the schema from a static file, so this may apply beyond protobuf schemas.

(2) I tried downloading the generated proto file from my Confluent schema registry. It requires confluent.field_meta to define options like:

    int32 my_number_field = 13 [(confluent.field_meta) = {                              
      params: [                                                                  
        {                                                                        
          key: "connect.type",                                                   
          value: "int16"                                                         
        }                                                                        
      ]                                                                          
    }];

This won't compile as is. I tried downloading meta.proto too and adding an import to my proto source file from the schema registry. However...

(3) If I do build the meta.proto message as well as my own message's descriptor, I get an error: TypeError: Couldn't build proto file into descriptor pool: duplicate symbol 'confluent.file_meta'. If I include meta.proto in the source but don't build the proto message, or build the Meta proto and then try to trim the conflicting pieces out, I variously get Python import errors or "proto generated code missing descriptor" errors.
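One possible workaround for (2), sketched here and not tested against full registry output: strip the Confluent-specific field options from the downloaded .proto text so it compiles without meta.proto at all, sidestepping the duplicate-symbol problem in (3). The regex assumes the option block contains no semicolons:

```python
import re

# Field definition as downloaded from Schema Registry (same shape as the
# snippet above).
proto_field = '''int32 my_number_field = 13 [(confluent.field_meta) = {
  params: [
    {
      key: "connect.type",
      value: "int16"
    }
  ]
}];'''

# Remove the bracketed (confluent.field_meta) option block. [^;]* relies on
# the option body containing no semicolons, which holds for these blocks.
stripped = re.sub(r"\s*\[\(confluent\.field_meta\)[^;]*\]", "", proto_field)
print(stripped)  # int32 my_number_field = 13;
```

The connect.type metadata is lost this way, of course, so this only helps if you don't need the Connect type annotations downstream.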

markfickett avatar Mar 17 '25 14:03 markfickett