pulsar-client-python
[python client] message properties are not round-trippable
Describe the bug
Message properties can be set to (and published with) values that cannot be deserialized on the far side.
To Reproduce
- Using the Python client, publish a message on any topic with properties={'foo': b'\x01-\x00\x97'}.
- Using a Python consumer, consume that message and attempt to access message.properties().
- Observe that a UnicodeDecodeError is raised.
- Repeat steps 1-3 with properties={b'\x01-\x00\x97': 'foo'}.
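The failure can be illustrated without a broker. This is a minimal sketch that assumes the consumer side decodes property bytes as UTF-8 text (the issue only reports the UnicodeDecodeError, so the exact codec is an assumption); try_decode is a hypothetical helper, not part of the Pulsar client.

```python
# The property value used in the reproduction steps.
raw = b'\x01-\x00\x97'


def try_decode(data: bytes):
    """Return the decoded text, or the exception if the bytes are not valid UTF-8.

    Assumption: the client decodes property bytes as UTF-8 when reading them back.
    """
    try:
        return data.decode('utf-8')
    except UnicodeDecodeError as exc:
        return exc


result = try_decode(raw)
# 0x97 is a UTF-8 continuation byte with no leading byte, so decoding fails.
print(type(result).__name__)
```

Any bytes value that is not valid text in the codec the client uses would fail the same way.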
Expected behavior
Properties should be round-trippable: they should be deserialized with the same types and values with which they were set, and should not raise exceptions on deserialization.
There are three possible solutions here:
- Require that all property keys and values be bytes objects in Python. This is easy to implement inside the client, but breaks backwards compatibility.
- Encode type information along with property keys and values, and deserialize the appropriate types in the consumer. This is harder to implement inside the client (it doesn't seem to be using google.protobuf.Value objects on the wire at the moment, but I may be misreading the code).
- Less preferable: require that all keys and values be str objects in Python. This is more restrictive than the protocol allows, but is probably simpler to implement.
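To make the second option concrete, here is a rough application-level sketch of type-tagged encoding. It assumes the wire only carries str keys and values; the "s:" and "b:" tags and all function names are hypothetical and are not part of the Pulsar client or protocol.

```python
import base64

# Hypothetical type tags; nothing in Pulsar defines these.
_STR_TAG = "s:"
_BYTES_TAG = "b:"


def encode_item(item):
    """Turn a str or bytes key/value into a tagged, text-safe str."""
    if isinstance(item, bytes):
        return _BYTES_TAG + base64.b64encode(item).decode("ascii")
    if isinstance(item, str):
        return _STR_TAG + item
    raise TypeError(f"unsupported property type: {type(item).__name__}")


def decode_item(item):
    """Reverse encode_item, restoring the original type."""
    if item.startswith(_BYTES_TAG):
        return base64.b64decode(item[len(_BYTES_TAG):])
    if item.startswith(_STR_TAG):
        return item[len(_STR_TAG):]
    raise ValueError(f"missing type tag: {item!r}")


def encode_properties(props):
    return {encode_item(k): encode_item(v) for k, v in props.items()}


def decode_properties(props):
    return {decode_item(k): decode_item(v) for k, v in props.items()}


original = {'foo': b'\x01-\x00\x97', b'\x01-\x00\x97': 'foo'}
wire = encode_properties(original)
restored = decode_properties(wire)
```

With this scheme both cases from the reproduction steps round-trip with their types intact, at the cost of consumers in other languages needing to understand the tags.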
Environment: macOS 12 x86, Pulsar standalone 2.10, pulsar client 2.10, Python 3.7.13.
I haven't looked into the Python code yet, but I think the keys and values should be str. The Java client also requires the key and value of a property to be a String, while in C++ a std::string is just a sequence of ASCII characters. We should handle the encoding and decoding on the Python client side.
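As a rough illustration of the str-only approach, a check like the following could run before publishing. This is only a sketch: validate_properties is a hypothetical helper, not an existing function in the Python client.

```python
def validate_properties(props):
    """Reject any property whose key or value is not a str.

    Hypothetical helper; raises TypeError at publish time instead of
    letting the consumer hit a UnicodeDecodeError later.
    """
    for key, value in props.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError(
                "property keys and values must be str, got "
                f"{type(key).__name__} key and {type(value).__name__} value"
            )
    return dict(props)


ok = validate_properties({'foo': 'bar'})

try:
    validate_properties({'foo': b'\x01-\x00\x97'})
    rejected = False
except TypeError:
    rejected = True
```

Failing early on the producer side would at least make the error visible where the bad value is set.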
The issue had no activity for 30 days, mark with Stale label.
Any update?