pyjson_tricks

Add support for encoding/decoding bytestrings

Open mcdevitts opened this issue 3 years ago • 2 comments

I have been using json_tricks for years now, and it has been very handy, especially with numpy. I recently realized that Python's native bytestrings cannot be serialized to JSON out of the box. It would be very handy to include a bytes type if possible. I implemented my own version; the hook and encode functions are listed below.

I would be happy to fold this into json_tricks if there's a desire.

from typing import Any

from json_tricks.utils import hashodict  # ordered dict helper from json_tricks


def json_bytes_hook(dct: Any) -> Any:
    """Convert an encoded bytestring back to its Python representation.

    Parameters
    ----------
    dct : Any

    Returns
    -------
    value : bytes
        The decoded bytestring, or `dct` unchanged if it is not an encoded bytestring.
    """
    if not isinstance(dct, dict):
        return dct
    if "__bytes__" not in dct:
        return dct
    parts = dct["__bytes__"]
    assert isinstance(parts, str)
    return parts.encode("utf8")


def json_bytes_encode(obj: Any, primitives: bool = False) -> Any:
    """Encode a bytestring as a json dictionary holding its utf8-decoded text.

    Parameters
    ----------
    obj : Any
    primitives : bool

    Returns
    -------
    dict or str
        json primitives representation of `obj`

    """
    if isinstance(obj, bytes):
        # Note: only the text before the first null byte is kept.
        text = obj.decode("utf8").split("\0", 1)[0]
        if primitives:
            return text
        else:
            return hashodict(__bytes__=text)
    return obj

mcdevitts avatar Mar 13 '22 05:03 mcdevitts

Thanks for the suggestion, sorry it took so long.

If I understand correctly, this encodes the bytes as utf8, right? Aren't there byte sequences that are not valid utf8?
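
The concern can be checked with the standard library alone. The sketch below uses made-up sample values (not taken from the issue) to show a byte sequence that UTF-8 decoding rejects, plus a null-containing value that the `split("\0", 1)[0]` step in the proposed encoder would shorten.

```python
# Sample values are illustrative, not taken from the issue.
invalid = b"\xff\xfe"  # 0xFF never appears in well-formed UTF-8

try:
    invalid.decode("utf8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# Mimic the proposed encode/decode pair on a value with an embedded null:
# everything after the first null byte is dropped, so it does not round-trip.
with_null = b"ab\x00cd"
round_tripped = with_null.decode("utf8").split("\0", 1)[0].encode("utf8")

print(decoded_ok)
print(round_tripped == with_null)
```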

mverleg avatar Jun 22 '22 21:06 mverleg

That's a great point. Another solution could be to encode the bytestring using the printable representation.
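
One way the printable-representation idea might look, as a rough sketch: `repr()` of a `bytes` object is plain ASCII, and `ast.literal_eval` can parse it back. The helper names `bytes_to_printable` and `printable_to_bytes` are hypothetical and not part of json_tricks.

```python
import ast


def bytes_to_printable(value: bytes) -> str:
    # repr() of bytes is an ASCII literal such as b'\xff', safe to store in JSON.
    return repr(value)


def printable_to_bytes(text: str) -> bytes:
    # literal_eval only parses Python literals; it does not execute code.
    result = ast.literal_eval(text)
    if not isinstance(result, bytes):
        raise ValueError("expected a bytes literal")
    return result


every_byte = bytes(range(256))
restored = printable_to_bytes(bytes_to_printable(every_byte))
print(restored == every_byte)
```

The trade-off is that the stored string is Python-specific, so tools in other languages would need their own parser for it.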

mcdevitts avatar Jun 22 '22 22:06 mcdevitts

I think I'll do utf8 if it is valid, and base64 otherwise. I'll also pick base64 if primitives are requested. I considered higher bases but I think base64 will be easier for other tools.
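
A minimal sketch of that scheme using only the standard library, for illustration. The function names and the `__bytes_utf8__` / `__bytes_b64__` keys are placeholders chosen here, and the actual json_tricks implementation may differ.

```python
import base64
import json
from typing import Any


def encode_bytes(obj: Any, primitives: bool = False) -> Any:
    """Encode bytes as utf8 text when valid, base64 otherwise (placeholder keys)."""
    if not isinstance(obj, bytes):
        return obj
    if primitives:
        # Primitives mode: always a bare base64 string.
        return base64.standard_b64encode(obj).decode("ascii")
    try:
        return {"__bytes_utf8__": obj.decode("utf8")}
    except UnicodeDecodeError:
        return {"__bytes_b64__": base64.standard_b64encode(obj).decode("ascii")}


def decode_bytes(dct: Any) -> Any:
    """Reverse encode_bytes for the two dictionary forms."""
    if not isinstance(dct, dict):
        return dct
    if "__bytes_utf8__" in dct:
        return dct["__bytes_utf8__"].encode("utf8")
    if "__bytes_b64__" in dct:
        return base64.standard_b64decode(dct["__bytes_b64__"])
    return dct


data = {"text": b"hello", "raw": b"\xff\x00\xfe"}
dumped = json.dumps(data, default=encode_bytes)
loaded = json.loads(dumped, object_hook=decode_bytes)
print(loaded == data)
```

Tagging each value with its encoding lets the decoder pick the right path without guessing, and base64 output can be read by most other tools.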

mverleg avatar Nov 03 '22 13:11 mverleg

Version 3.16.0 is released with bytes support for Python 3.

Python 2 does not encode bytes because it does not have a separate bytes type (and is dead).

Let me know if you run into any problems.

mverleg avatar Nov 03 '22 19:11 mverleg