Add support for encoding/decoding bytestrings
I have been using json_tricks for years now, and it's been very handy, especially with numpy. I recently realized that Python's bytestrings cannot be serialized to JSON natively. It would be very handy to include a bytes type if possible. I implemented my own version, and the hook / encode methods are listed below.
I would be happy to fold this into json_tricks if there's a desire.
from typing import Any

from json_tricks.utils import hashodict


def json_bytes_hook(dct: Any) -> Any:
    """Convert an encoded bytestring back to its Python representation.

    Parameters
    ----------
    dct : Any

    Returns
    -------
    value : bytes
        The decoded bytes, or `dct` unchanged if it is not an encoded bytestring.
    """
    if not isinstance(dct, dict):
        return dct
    if "__bytes__" not in dct:
        return dct
    parts = dct["__bytes__"]
    assert isinstance(parts, str)
    # str.encode() defaults to utf8, matching the decode in json_bytes_encode.
    return parts.encode()


def json_bytes_encode(obj: Any, primitives: bool = False) -> Any:
    """Encode a bytestring as a json dictionary holding its utf8-decoded text.

    Parameters
    ----------
    obj : Any
    primitives : bool

    Returns
    -------
    dict
        json primitives representation of `obj`
    """
    if isinstance(obj, bytes):
        # Decode as utf8 and keep only the text before the first NUL character.
        text = obj.decode("utf8").split("\0", 1)[0]
        if primitives:
            return text
        else:
            return hashodict(__bytes__=text)
    return obj
Thanks for the suggestion, sorry it took so long.
If I understand correctly, this encodes the bytes as utf8, right? Aren't there byte sequences that are not valid utf8?
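A quick check illustrates the concern. This is only a minimal sketch using the standard library; the helper name `is_valid_utf8` is made up for illustration and is not part of json_tricks.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Plain ASCII is always valid UTF-8.
print(is_valid_utf8(b"hello"))
# The byte 0xff never appears in well-formed UTF-8, so decoding fails.
print(is_valid_utf8(b"\xff\xfe\x00\x01"))
```

With the encoder proposed above, the second input would raise `UnicodeDecodeError` inside `obj.decode("utf8")`.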
That's a great point. Another solution could be to encode the bytestring using the printable representation.
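For reference, the printable-representation idea could look roughly like the sketch below. It assumes the `__bytes__` key from the original proposal; the function names `bytes_to_printable` and `printable_to_bytes` are hypothetical, and `ast.literal_eval` is just one way to parse the literal back.

```python
import ast
import json


def bytes_to_printable(data: bytes) -> str:
    """Return the Python literal form of `data`, e.g. b'\\xff' (hypothetical helper)."""
    return repr(data)


def printable_to_bytes(text: str) -> bytes:
    """Parse a bytes literal produced by bytes_to_printable back into bytes."""
    value = ast.literal_eval(text)
    if not isinstance(value, bytes):
        raise ValueError("expected a bytes literal")
    return value


# Round trip through JSON, including bytes that are not valid UTF-8.
raw = b"\xff\xfe binary \x00 data"
encoded = json.dumps({"__bytes__": bytes_to_printable(raw)})
decoded = printable_to_bytes(json.loads(encoded)["__bytes__"])
print(decoded == raw)
```

The result stays readable for mostly-ASCII data, but it ties the format to Python's literal syntax, which other tools would have to reimplement.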
I think I'll do utf8 if it is valid, and base64 otherwise. I'll also pick base64 if primitives are requested. I considered higher bases but I think base64 will be easier for other tools.
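As a rough illustration of that plan (not the actual json_tricks implementation), the sketch below tries utf8 first and falls back to base64. The key names `__bytes_utf8__` and `__bytes_b64__` and the function names are assumptions chosen for the example; check the released version for the real format.

```python
import base64
import json
from typing import Any


def encode_bytes(data: bytes, primitives: bool = False) -> Any:
    """Encode bytes as utf8 text when valid, base64 otherwise (illustrative only)."""
    if primitives:
        # Primitives mode: always base64, returned as a bare string.
        return base64.standard_b64encode(data).decode("ascii")
    try:
        return {"__bytes_utf8__": data.decode("utf-8")}
    except UnicodeDecodeError:
        return {"__bytes_b64__": base64.standard_b64encode(data).decode("ascii")}


def decode_bytes(dct: Any) -> Any:
    """Reverse encode_bytes for the dictionary forms; pass anything else through."""
    if isinstance(dct, dict):
        if "__bytes_utf8__" in dct:
            return dct["__bytes_utf8__"].encode("utf-8")
        if "__bytes_b64__" in dct:
            return base64.standard_b64decode(dct["__bytes_b64__"])
    return dct


def _default(obj: Any) -> Any:
    """Adapter for json.dumps(default=...), which only calls it for unknown types."""
    if isinstance(obj, bytes):
        return encode_bytes(obj)
    raise TypeError(f"cannot serialize {type(obj).__name__}")


original = {"text": b"hello", "binary": b"\xff\x00"}
serialized = json.dumps(original, default=_default)
restored = json.loads(serialized, object_hook=decode_bytes)
print(restored == original)
```

Note that the primitives form is a plain string, so it cannot be told apart from ordinary text when loading; only the dictionary forms round-trip.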
Version 3.16.0 is released with bytes support for Python 3.
Python 2 does not encode bytes because it does not have a bytes type (and is dead).
Let me know if you run into any problems.