Methods to convert to/from builtin objects

Open jcrist opened this issue 3 years ago • 0 comments

msgspec currently contains methods for converting objects to/from bytes using either the JSON or MessagePack protocols. Sometimes it would be useful to convert to/from "simpler" builtin types (lists, dicts, ...) instead.

Example use cases:

  • Encoding using a third-party protocol library like pyyaml. This can currently be handled on the encoding end by passing in a custom default method, but if the encode call is buried in some wrapper library the user may not have access to it directly and would need to recursively simplify the object before handing it off.
  • Decoding from a third-party protocol like pyyaml into higher-level types, while keeping the type validation. The easiest way to do this right now is to roundtrip the message through a supported protocol, for example:
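from typing import Any, Type, TypeVar
import msgspec
T = TypeVar("T")
def from_builtins(obj: Any, type: Type[T]) -> T:
    msg = msgspec.json.encode(obj)  # serialize the builtin objects to JSON bytes
    return msgspec.json.decode(msg, type=type)  # decode and validate against `type`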
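
To make the first use case concrete, here is a minimal sketch that uses the standard library's `json` module as a stand-in for a third-party protocol library, and a dataclass as a stand-in for a higher-level type. `Point`, `wrapper_dump`, and `simplify` are hypothetical names introduced only for illustration; none of them are part of msgspec.

```python
import json
from dataclasses import asdict, dataclass, fields, is_dataclass
from typing import Any


@dataclass
class Point:
    x: int
    y: int


def default(obj: Any) -> Any:
    """Hook that json.dumps calls for objects it cannot serialize itself."""
    if is_dataclass(obj) and not isinstance(obj, type):
        return asdict(obj)
    raise TypeError(f"Unsupported type: {type(obj).__name__}")


def wrapper_dump(payload: Any) -> str:
    """Hypothetical wrapper library: it calls json.dumps without exposing `default`."""
    return json.dumps(payload)


def simplify(obj: Any) -> Any:
    """Recursively replace dataclass instances with plain dicts."""
    if is_dataclass(obj) and not isinstance(obj, type):
        return {f.name: simplify(getattr(obj, f.name)) for f in fields(obj)}
    if isinstance(obj, dict):
        return {k: simplify(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [simplify(v) for v in obj]
    return obj


data = {"points": [Point(1, 2), Point(3, 4)]}

# With direct access to the encoder, passing the hook is enough.
direct = json.dumps(data, default=default)

# Through the wrapper the hook cannot be passed, so the object has to be
# simplified by hand first. This is the step a to_builtins function would replace.
via_wrapper = wrapper_dump(simplify(data))
```

Both paths produce the same JSON here, but only the second works when the encode call is out of reach.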

Proposed initial API:

def to_builtins(obj: Any, *, recurse: bool = True) -> Any:
    """Convert obj to simple builtin types (list, dict, ...). If `recurse` is True, this is applied recursively.
    Note that copying only happens when necessary: if a list of integers is passed in, the same list
    will be returned."""
    ...

def from_builtins(obj: Any, type: Type[T]) -> T:
    """Convert obj to type. Note that copying only happens when necessary for conversion."""
    ...
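
The "copy only when necessary" behavior described in the docstrings could look roughly like the following pure-Python sketch. It is not the proposed implementation: it uses dataclasses as a stand-in for higher-level types, assumes dict keys are already simple, omits the `recurse` flag, and the names `to_builtins_sketch` and `User` are hypothetical.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any

# Exact types that are already "simple" and can be returned as-is.
_SCALARS = (str, int, float, bool, type(None))


def to_builtins_sketch(obj: Any) -> Any:
    """Recursively convert `obj` to builtin types, copying only when needed."""
    if type(obj) in _SCALARS:
        return obj
    if is_dataclass(obj) and not isinstance(obj, type):
        return {f.name: to_builtins_sketch(getattr(obj, f.name)) for f in fields(obj)}
    if type(obj) is list:
        converted = [to_builtins_sketch(v) for v in obj]
        # If no element changed identity, hand back the original list.
        if all(new is old for new, old in zip(converted, obj)):
            return obj
        return converted
    if type(obj) is dict:
        converted = {k: to_builtins_sketch(v) for k, v in obj.items()}
        # Same idea for dicts: reuse the input when every value is unchanged.
        if all(converted[k] is obj[k] for k in obj):
            return obj
        return converted
    raise TypeError(f"Unsupported type: {type(obj).__name__}")


@dataclass
class User:
    name: str
    tags: list


ints = [1, 2, 3]
same = to_builtins_sketch(ints)      # the very same list object comes back

user = User("alice", ["admin"])
lowered = to_builtins_sketch([user])  # a new list holding a plain dict
```

In this sketch `lowered[0]["tags"]` is still the original `user.tags` list, since nothing inside it needed converting.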

I'm not 100% happy with these names, but all other options I could think of I liked less:

  • to_simple/from_simple
  • simplify/convert (I don't like the asymmetry in name here, and convert implies some casting behaviors like float -> int that we won't actually support)
  • cattrs calls these destructure/structure respectively
  • lower/lift (for converting to lower/higher level types)

A few open questions:

  • What types are valid to return from to_builtins? My initial reaction is dict, list, tuple, str, int, float, bool, None (and no subclasses).
  • How do we handle types that encode/decode differently between JSON and msgpack? These are currently bytes/bytearray and datetime (we wouldn't support Ext/Raw in this API anyway). Perhaps convert_binary=True and convert_datetime=True flags in to_builtins?
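
One possible reading of the `convert_binary` / `convert_datetime` idea is sketched below. It assumes that "converting" means producing the string forms a JSON encoder would use (base64 text for binary data, ISO 8601 text for datetimes), and that turning a flag off leaves the value untouched for protocols that can represent it natively. The helper name `lower_leaf` is hypothetical, and the exact output formats are an assumption rather than a settled design.

```python
import base64
from datetime import datetime, timezone
from typing import Any


def lower_leaf(
    obj: Any,
    *,
    convert_binary: bool = True,
    convert_datetime: bool = True,
) -> Any:
    """Lower a single leaf value; flags control protocol-dependent types."""
    if convert_binary and isinstance(obj, (bytes, bytearray)):
        # Assumed JSON-style representation: base64-encoded text.
        return base64.b64encode(obj).decode("ascii")
    if convert_datetime and isinstance(obj, datetime):
        # Assumed JSON-style representation: ISO 8601 text.
        return obj.isoformat()
    return obj


stamp = datetime(2022, 9, 18, 20, 9, tzinfo=timezone.utc)

as_text = lower_leaf(b"hi")                          # "aGk="
as_iso = lower_leaf(stamp)                           # "2022-09-18T20:09:00+00:00"
untouched = lower_leaf(b"hi", convert_binary=False)  # still bytes
```

With both flags on, the result contains only types from the list in the first open question; with them off, the caller keeps the richer values for a protocol that supports them.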

If anyone has thoughts on how to spell these APIs, please let me know below.

jcrist, Sep 18 '22 20:09