confluent-kafka-python
Python client doesn't support hyphens in schema namespace
Description
The Confluent Avro Python client fails to deserialize messages whose schema has a dash in the namespace field.
How to reproduce
Create a topic whose schema has a namespace with a dash in it. Produce some messages with a Java client, then try to consume messages from the topic with a Confluent Python client.
Error message: avro.schema.SchemaParseException: Invalid schema name 'namespace-withdash.name' infered from name 'name' and namespace 'namespace-withdash'.
Checklist
Please provide the following information:
- [ ] confluent-kafka-python and librdkafka version (`confluent_kafka.version()` and `confluent_kafka.libversion()`):
- [ ] Apache Kafka broker version:
- [ ] Client configuration: `{...}`
- [ ] Operating system:
- [ ] Provide client logs (with `'debug': '..'` as necessary)
- [ ] Provide broker log excerpts
- [ ] Critical issue
That looks more like a Java client issue: per the Avro specification, a namespace can't contain `-`.
http://avro.apache.org/docs/current/spec.html#names
The name portion of a fullname, record field names, and enum symbols must:
start with [A-Za-z_]
subsequently contain only [A-Za-z0-9_]
A namespace is a dot-separated sequence of such names.
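To make the quoted rule concrete, here is a minimal sketch that checks a namespace against it using only the standard library. The helper name `invalid_namespace_parts` is hypothetical, and this is not the check the avro package itself performs. It only mirrors the spec text above, plus the spec's allowance for an empty namespace.

```python
import re

# One Avro name: starts with [A-Za-z_], then only [A-Za-z0-9_].
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def invalid_namespace_parts(namespace):
    """Return the dot-separated parts of `namespace` that are not valid Avro names.

    Hypothetical helper for illustration; an empty list means the namespace
    conforms to the rule quoted above.
    """
    if namespace == "":
        # The spec treats the empty string as the null namespace.
        return []
    return [part for part in namespace.split(".") if not _NAME_RE.fullmatch(part)]


print(invalid_namespace_parts("namespace-withdash"))     # ['namespace-withdash']
print(invalid_namespace_parts("io.confluent.examples"))  # []
```

Running a check like this against a schema before registering it would flag the hyphenated part up front, regardless of which client later consumes the topic.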
We've only seen the issue with the Python client; other consumers didn't seem to be affected by it. If the Java client is at fault as well, maybe this is a Schema Registry issue: shouldn't it refuse to validate a schema with a - in the namespace?
If you haven't already done so, please try the new AvroSerializer API, which handles hyphens in the namespace.
An easy way to demonstrate this would be to update the schema definition in avro_consumer.py and avro_producer.py. The following schema was used as a validation case (though a test ought to be submitted as well).
Example
{
"namespace": "confluent-io-examples-serialization-avro",
"name": "User",
"type": "record",
"fields": [
{"name": "name", "type": "string"},
{"name": "favorite_number", "type": "int"},
{"name": "favorite_color", "type": "string"}
]
}
@rnpridgeon Is the same true for `AvroDeserializer`? I have a case where I'm getting a ClientError on a consumer of a topic/schema with a hyphen in it:
confluent_kafka.avro.error.ClientError: Received bad schema (id 53) from registry: Schema parse failed: Invalid schema name 'debezium.my-namespace.my_topic.Envelope' infered from name 'Envelope' and namespace 'debezium.my-namespace.my_topic'.
Currently we're using the deprecated `AvroConsumer` on version `1.8.2` of the module. I think I know the answer here, but I haven't found concrete evidence that upgrading to the latest version of the module and using `AvroDeserializer` instead would clear up this issue.