Built-in support for numpy data-types
Description
Hi,
Would there be any interest in adding support for numpy datatypes? Currently, attempting to encode these fails:
    import msgspec.json as mjson
    import numpy as np
    mjson.encode(np.float64(1.0))
    Traceback (most recent call last):
      File "<input>", line 4, in <module>
        mjson.encode(np.float64(1.0))
    TypeError: Encoding objects of type numpy.float64 is unsupported
This seems like quite a common use case given the popularity of numpy, and in most situations there seems to be a fairly obvious target JSON type for each numpy type.
OTOH I appreciate that this can be done via extensions and that trying to support every third party library under the sun is like painting the Forth Bridge.
That'd be great. torch would be cool too, but I think loading directly to numpy and then converting to torch should at least be less expensive than going from a list to torch.
I need the support for numpy too 👍
There are lots of numpy datatypes and ndarray shapes, so I wouldn't be surprised if native support is a hassle.
I keep track of metadata like shape and dtype in the Struct, put the data in a bytes field, then define a to_numpy method that calls np.frombuffer.
Would really appreciate support for numpy. Any update on this? Let me know if there is anything I can do to make this happen @jcrist, I would be glad to help.
I've done the following for numpy. I'm looking into solutions for torch that support shared Tensors; UntypedStorage seems promising.
    import msgspec
    import numpy as np

    __all__ = ["WireNDArray", "WireScalar"]


    class WireNDArray(msgspec.Struct, array_like=True, kw_only=True):
        """Wire representation of an ndarray: dtype, shape, and raw buffer."""

        dtype: str
        shape: tuple[int, ...]
        data: memoryview

        @classmethod
        def pack(cls, arr: np.ndarray):
            # arr.data exposes the array's buffer without copying
            return cls(data=arr.data, dtype=str(arr.dtype), shape=arr.shape)

        def unpack(self) -> np.ndarray:
            return np.frombuffer(self.data, dtype=self.dtype).reshape(self.shape)


    class WireScalar(msgspec.Struct, array_like=True, kw_only=True):
        """Wire representation of a numpy scalar: dtype and raw buffer."""

        dtype: str
        data: memoryview

        @classmethod
        def pack(cls, scalar: np.number):
            return cls(data=scalar.data, dtype=str(scalar.dtype))

        def unpack(self) -> np.generic:
            # Indexing the one-element array yields a numpy scalar, not an ndarray
            return np.frombuffer(self.data, dtype=self.dtype)[0]


    if __name__ == "__main__":
        # In-memory round trip only; this does not go through encode/decode
        arr = np.random.rand(3, 2)
        arr_cycle = WireNDArray.pack(arr).unpack()
        assert (arr == arr_cycle).all()