pulsar-client-python
[python client] message properties are not round-trippable
Describe the bug
Properties objects on messages can be set to (and published with) values that cannot be deserialized on the far side.
To Reproduce
- Using the Python client, publish a message on any topic with properties={'foo': b'\x01-\x00\x97'}.
- Using a Python consumer, consume that message and attempt to access message.properties().
- Observe that a UnicodeDecodeError is raised.
- Repeat steps 1-3 with properties={b'\x01-\x00\x97': 'foo'}.
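The failure mode can be illustrated without a broker. This is only a sketch under an assumption not confirmed from the client source: that the consumer side decodes the raw property bytes as UTF-8 when building the Python dict. The helper name try_decode is hypothetical.

```python
# Assumption: the consumer decodes raw property bytes as UTF-8.
# try_decode is a hypothetical helper, not part of the Pulsar client.

def try_decode(raw: bytes) -> str:
    """Return the decoded string, or the name of the exception raised."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return type(exc).__name__


# 0x97 is a UTF-8 continuation byte with no lead byte, so decoding fails.
print(try_decode(b"\x01-\x00\x97"))
# Plain ASCII content decodes without trouble.
print(try_decode(b"foo"))
```

Any property whose bytes are not valid UTF-8 would fail the same way under this assumption, whether it appears as a key or as a value.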
Expected behavior
Properties should be round-trippable: they should be deserialized with the same types and values with which they were set, and should not raise exceptions on deserialization.
There are three possible solutions here:
- Require that all property keys and values be bytes in Python. This is easy to implement inside the client, but breaks backwards compatibility.
- Encode type information along with property keys and values, and deserialize the appropriate types in the consumer. This is harder to implement inside the client (it doesn't seem to be using google.protobuf.Value on the wire at the moment, but I may be misreading the code).
- Less preferable: require that all keys and values be str in Python. This is more restrictive than the protocol allows, but is probably simpler to implement.
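The second option can be approximated today in application code. The following is a minimal sketch, not the client's behavior: encode_properties and decode_properties are hypothetical helpers, and the "s:" / "b:" tag prefixes are an invented convention. Bytes are base64-encoded so that everything sent to the client is a plain str.

```python
import base64
from typing import Dict, Union

PropItem = Union[str, bytes]


def _encode(item: PropItem) -> str:
    """Tag a key or value with its type; bytes become base64 text."""
    if isinstance(item, bytes):
        return "b:" + base64.b64encode(item).decode("ascii")
    if isinstance(item, str):
        return "s:" + item
    raise TypeError(f"unsupported property type: {type(item).__name__}")


def _decode(item: str) -> PropItem:
    """Reverse _encode using the tag before the first colon."""
    tag, _, payload = item.partition(":")
    if tag == "b":
        return base64.b64decode(payload)
    if tag == "s":
        return payload
    raise ValueError(f"unknown type tag: {tag!r}")


def encode_properties(props: Dict[PropItem, PropItem]) -> Dict[str, str]:
    """Produce a str-only dict that is safe to publish."""
    return {_encode(k): _encode(v) for k, v in props.items()}


def decode_properties(props: Dict[str, str]) -> Dict[PropItem, PropItem]:
    """Restore the original key and value types on the consumer side."""
    return {_decode(k): _decode(v) for k, v in props.items()}


original = {"foo": b"\x01-\x00\x97", b"\x01-\x00\x97": "foo"}
assert decode_properties(encode_properties(original)) == original
```

The trade-off is that consumers in other languages would see the tagged strings, so both sides have to agree on the convention.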
Environment: MacOS 12 x86, Pulsar standalone 2.10, pulsar client 2.10, Python 3.7.13.
I haven't looked into the Python code yet, but I think the keys and values should be str. The Java client also requires the key and value of a property to be a String, whereas in C++ a std::string is just a sequence of ASCII characters. We should handle the encoding and decoding on the Python client side.
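If keys and values were restricted to str, the check could happen before publishing. A minimal sketch of such a guard follows; validate_properties is a hypothetical helper, not an existing client function.

```python
from typing import Dict, Mapping


def validate_properties(props: Mapping[object, object]) -> Dict[str, str]:
    """Reject any property whose key or value is not a str.

    Hypothetical helper: fails at publish time instead of letting the
    consumer hit a decode error later.
    """
    for key, value in props.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError(
                f"property {key!r}: keys and values must be str, got "
                f"{type(key).__name__} and {type(value).__name__}"
            )
    return dict(props)


print(validate_properties({"foo": "bar"}))
```

Failing early on the producer side keeps the error next to the code that set the bad property.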
The issue had no activity for 30 days, mark with Stale label.
Any update?