v0.4.13 serialization break
relay to #139, a quick verification:
from thriftpy2.transport.memory import TCyMemoryBuffer
from thriftpy2.protocol import cybin as proto
from thriftpy2.thrift import TType
b1 = TCyMemoryBuffer()
proto.write_val(b1, TType.MAP, {"hello": "1"},
spec=(TType.STRING, TType.STRING))
b1.flush()
b2 = TCyMemoryBuffer()
proto.write_val(b2, TType.MAP, {"hello": b"1"},
spec=(TType.STRING, TType.BINARY))
b2.flush()
b1.getvalue() != b2.getvalue()
TType was be specified in Apache Thrift standard, and Binary type should always have the same behavior with String in kinds of protocols, hence we missed some check to modify the value of Binary: 18 to String: 11 in complex type serialization (map, list, etc.).
@JonnoFTW @aisk @dmulter
I think maybe I have to revert this merge #139 and have a discussion again about the binary type support
Hold this issue before Apache JSON protocol re-merge.
I'll have a look at this when I get back to work on Monday
@JonnoFTW Thanks for your work! I have several suggestions:
- could you split support the Binary type feature and Apache JSON protocol to two pull requests?
- we should ensure that the Binary type has the same behavior with String outside of JSON-like protocols
- the Binary type maybe should allow the user to pass Python
strin with an implicit type converting?
@ethe
- could you split support the Binary type feature and Apache JSON protocol to two pull requests?
How do you want Thrifpy2's Apache JSON to handle the case where it doesn't support binary? Apache's thrift will send binary data as base64 encoded string. If we don't support binary in Apache JSON, then the result will be that thriftpy2 gets a base64 encoded string. From the test, without binary support, apache json will make the following field:
tbinary=b"\x01\x0fabc123\x00\x02"
Into this:
"19":{'str': 'AQ9hYmMxMjMAAg=='}}
Without binary support, my apache thrift protocol has no way of knowing whether or not to decode the above base64 encoded string. If we try to decode any string field using base64.b64decode we will not always get the right result since something like "notbas64":
base64.b64decode(b"notbas64")
Is a valid string and a valid base64 encoded string.
- we should ensure that the Binary type has the same behavior with String outside of JSON-like protocols
What do you mean? Please provide an example.
- the Binary type maybe should allow the user to pass Python
strin with an implicit type converting?
Currently, if a thrift field is specified as binary and you pass a string, the string is serialised to an appropriate intermediate binary representation as appropriate. When deserialising, it is converted back to a python bytes object. Is this what you want?