Built-in support for numpy data-types

Open lunik1 opened this issue 1 year ago • 5 comments

Description

Hi,

Would there be any interest in adding support for numpy datatypes? Currently, attempting to encode these fails:

import msgspec.json as mjson
import numpy as np

mjson.encode(np.float64(1.0))
Traceback (most recent call last):
  File "<input>", line 4, in <module>
    mjson.encode(np.float64(1.0))
TypeError: Encoding objects of type numpy.float64 is unsupported

This seems like quite a common use case given the popularity of numpy, and to me there are fairly obvious target JSON data types in most situations.

OTOH I appreciate that this can be done via extensions and that trying to support every third party library under the sun is like painting the Forth Bridge.

lunik1 avatar Sep 03 '24 14:09 lunik1

That'd be great. torch would be cool too, but I think loading directly into numpy and converting to torch should at least be less expensive than going from list to torch.

umarbutler avatar Dec 24 '24 08:12 umarbutler

I need the support for numpy too 👍

heliar-k avatar Jan 03 '25 11:01 heliar-k

There are lots of numpy datatypes and ndarray shapes, so I wouldn't be surprised if native support is a hassle.

I keep track of metadata like shape and dtype in the Struct, put the data in a bytes field, then define a to_numpy method that calls np.frombuffer.

makarr avatar Apr 25 '25 17:04 makarr

Would really appreciate support for numpy. Any update on this? Let me know if there is anything I can do to make this happen, @jcrist; I would be glad to help.

raceychan avatar Jul 07 '25 05:07 raceychan

I've done the following for numpy. I'm looking into solutions for torch that support shared Tensors. UntypedStorage seems promising.

from typing import Any

import msgspec
import numpy as np

__all__ = ["WireNDArray", "WireScalar"]

class WireNDArray(msgspec.Struct, array_like=True, kw_only=True):
    dtype: str
    shape: tuple[int, ...]
    data: memoryview

    @classmethod
    def pack(cls, arr: np.ndarray):
        return cls(data=arr.data, dtype=str(arr.dtype), shape=arr.shape)

    def unpack(self) -> np.ndarray:
        return np.frombuffer(self.data, dtype=self.dtype).reshape(self.shape)


class WireScalar(msgspec.Struct, array_like=True, kw_only=True):
    dtype: str
    data: memoryview

    @classmethod
    def pack(cls, scalar: np.number):
        return cls(data=scalar.data, dtype=str(scalar.dtype))

    def unpack(self) -> np.number[Any]:
        # Indexing the one-element array yields a numpy scalar, not an ndarray.
        return np.frombuffer(self.data, dtype=self.dtype)[0]


if __name__ == "__main__":
    def __main__() -> None:
        arr = np.random.rand(3, 2)
        arr_cycle = WireNDArray.unpack(WireNDArray.pack(arr))
        assert (arr == arr_cycle).all()

    __main__()

btakita avatar Aug 27 '25 01:08 btakita