confluent-kafka-python
Fix type hinting of avro messages
Description
I use Pycharm as my IDE, and I dislike seeing complaints about type mismatch.
The value attribute of Message objects is typed as Optional[Union[str, bytes]].
However, AvroConsumer sets that value to the deserialized message, i.e. whatever Python datatype matches the Avro schema (most often, a dict). This raises red flags in any type checker when I treat that value as a dict (or whatever I expect the deserialized message to be).
Not sure what's the best way to change the type hinting when using C bindings.
Edit: Also, Pycharm thinks Message.value takes a payload argument. Not sure why that is.
How to reproduce
e.g.
consumer = AvroConsumer(...)
message = consumer.poll()
field = message.value().get("field") # Pycharm highlights this as an error
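A caller-side workaround, until the hint itself changes, is to tell the type checker what the value really is. The sketch below is only an illustration: StubMessage is a hypothetical stand-in that mimics the declared return type described above, so the snippet runs without a broker; it is not the real confluent_kafka.Message.

```python
from typing import Any, Dict, Optional, Union, cast


class StubMessage:
    """Hypothetical stand-in for a consumed message (not the real class)."""

    def __init__(self, payload: Any) -> None:
        self._payload = payload

    def value(self) -> Optional[Union[str, bytes]]:
        # Declared like the hint described above, but at runtime this returns
        # whatever was stored (here, an already-deserialized dict).
        return cast(Optional[Union[str, bytes]], self._payload)


message = StubMessage({"field": 42})

# cast() does nothing at runtime; it only tells the type checker
# to treat the value as a dict from here on.
record = cast(Dict[str, Any], message.value())
field = record.get("field")
print(field)
```

Since cast performs no runtime check, this silences the warning but does not protect against a message whose schema is not a record.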
Checklist
Please provide the following information:
- [x] confluent-kafka-python and librdkafka version (confluent_kafka.version() and confluent_kafka.libversion()): confluent_kafka.version() = ('0.11.5', 722176), confluent_kafka.libversion() = ('0.11.5', 722431)
- [ ] Apache Kafka broker version: N/A
- [ ] Client configuration: N/A
- [ ] Operating system: N/A
- [ ] Provide client logs (with 'debug': '..' as necessary)
- [ ] Provide broker log excerpts
- [ ] Critical issue
Also, not sure if I should create separate issues, but there are other type hinting problems.
- The docstring for AvroProducer.__init__ says the arguments default_key_schema and default_value_schema are strings (str), but it seems they are actually supposed to be (or at least can be) avro.Schema objects, such as those obtained from confluent_kafka.avro.load.
I'm not 100% sure how we would fix this for AvroConsumer, since the type won't be known until runtime. As you mentioned, Avro deserializes the contents of the message, which are in fact a byte sequence, into the type defined in the writer's schema.
As for your comment about Avro[Producer|Consumer].__init__, I agree: we should change the type to a schema as opposed to str.
Hmm, I was mistakenly thinking that messages would always be dict, but yeah, they can also be scalar types, or arrays.
At the very least, you can type it as typing.Any, which would make it type check. With a bit more effort, you can also define the type as a Union of all possible avro types, e.g.
AvroValue = Union[str, int, bytes, Dict[str, Any], List[Any], ...] # not sure if i'm missing types?
class AvroMessage:
    def value(self) -> AvroValue: ...
Python typing/mypy doesn't support recursive types yet, sadly.
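To make the Union idea concrete, here is a rough, runnable sketch. The member list is my assumption about how Avro types usually map to Python (null to None, boolean to bool, int/long to int, float/double to float, enum to str, fixed to bytes, record/map to dict, array to list) and may still be incomplete. AvroMessage and as_record are hypothetical names for illustration, not part of confluent-kafka-python. Nested containers fall back to Any, so the alias is not recursive.

```python
from typing import Any, Dict, List, Union

# Non-recursive approximation of a deserialized Avro value
# (assumed mapping; nested containers fall back to Any).
AvroValue = Union[None, bool, int, float, str, bytes, Dict[str, Any], List[Any]]


class AvroMessage:
    """Hypothetical wrapper, not part of confluent-kafka-python."""

    def __init__(self, value: AvroValue) -> None:
        self._value = value

    def value(self) -> AvroValue:
        return self._value


def as_record(value: AvroValue) -> Dict[str, Any]:
    """Narrow an AvroValue to a dict, raising if it is another Avro type."""
    if not isinstance(value, dict):
        raise TypeError(f"expected an Avro record (dict), got {type(value).__name__}")
    return value


message = AvroMessage({"field": "x"})
field = as_record(message.value()).get("field")
```

Compared with typing.Any, the Union forces callers to narrow the value (here with isinstance) before using dict methods, which is more verbose but catches a non-record payload at runtime.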