
Python client doesn't support hyphens in schema namespace

Open doug2525 opened this issue 4 years ago • 4 comments

Description

The Confluent Avro Python client will not deserialize messages whose schema has a dash in the namespace field.

How to reproduce

Create a topic whose schema has a dash in its namespace. Produce some messages with a Java client, then try to consume them from the topic with the Confluent Python client. Error message: avro.schema.SchemaParseException: Invalid schema name 'namespace-withdash.name' infered from name 'name' and namespace 'namespace-withdash'.

Checklist

Please provide the following information:

  • [ ] confluent-kafka-python and librdkafka version (confluent_kafka.version() and confluent_kafka.libversion()):
  • [ ] Apache Kafka broker version:
  • [ ] Client configuration: {...}
  • [ ] Operating system:
  • [ ] Provide client logs (with 'debug': '..' as necessary)
  • [ ] Provide broker log excerpts
  • [ ] Critical issue

doug2525, Aug 14 '20 20:08

That looks more like a Java client issue: according to the Avro specification, a namespace can't contain '-'. http://avro.apache.org/docs/current/spec.html#names

The name portion of a fullname, record field names, and enum symbols must:

    start with [A-Za-z_]
    subsequently contain only [A-Za-z0-9_]

A namespace is a dot-separated sequence of such names.
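The rule quoted above can be checked with a short regular expression. The following is a minimal sketch, not the validation code used by any Avro library; the helper names is_valid_name and invalid_namespace_parts are hypothetical and exist only for illustration. It splits a namespace on dots and reports the parts that break the naming rule.

```python
import re

# Avro spec: a name starts with [A-Za-z_] and continues with [A-Za-z0-9_].
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_name(name):
    """Return True if `name` is a single valid Avro name (hypothetical helper)."""
    return _NAME_RE.fullmatch(name) is not None


def invalid_namespace_parts(namespace):
    """Return the dot-separated parts of `namespace` that break the naming rule."""
    # The empty string denotes the null namespace, so there is nothing to check.
    if not namespace:
        return []
    return [part for part in namespace.split(".") if not is_valid_name(part)]


print(invalid_namespace_parts("namespace-withdash"))     # ['namespace-withdash']
print(invalid_namespace_parts("io.confluent.examples"))  # []
```

Running a check like this against a schema before registering it would flag the hyphenated namespace early, regardless of which client later reads the topic.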

tomaszbartoszewski, Aug 22 '20 18:08

We've only seen the issue with the Python client; other consumers didn't seem to be affected by it. If the Java client is at fault as well, maybe this is really a Schema Registry issue, in that it shouldn't accept a schema with a '-' in the namespace?

doug2525, Aug 24 '20 13:08

If you haven't already done so, please try using the new AvroSerializer API, which handles hyphens in the namespace.

An easy way to demonstrate this would be to update the schema definition in avro_consumer.py and avro_producer.py. The following schema was used as a validation case (though a test ought to be submitted as well):

Example

    {
        "namespace": "confluent-io-examples-serialization-avro",
        "name": "User",
        "type": "record",
        "fields": [
            {"name": "name", "type": "string"},
            {"name": "favorite_number", "type": "int"},
            {"name": "favorite_color", "type": "string"}
        ]
    }

rnpridgeon, Aug 26 '20 08:08

@rnpridgeon Is the same true for AvroDeserializer? I have a case where I'm getting a ClientError on a consumer of a topic/schema with a hyphen in it:

confluent_kafka.avro.error.ClientError: Received bad schema (id 53) from registry: Schema parse failed: Invalid schema name 'debezium.my-namespace.my_topic.Envelope' infered from name 'Envelope' and namespace 'debezium.my-namespace.my_topic'.

Currently we're using the deprecated AvroConsumer on version 1.8.2 of the module. I think I know the answer here, but I haven't found concrete evidence that upgrading to the latest version of the module and using AvroDeserializer instead would clear up this issue.

jslusher, Sep 06 '23 17:09