Allow serialization of numpy nd array

Open samsja opened this issue 2 years ago • 15 comments

Initial Checks

  • [X] I have searched Google & GitHub for similar requests and couldn't find anything
  • [X] I have read and followed the docs and still think this feature is missing

Description

Context

Hey all

I am creating this issue after opening a discussion here.

I want to be able to serialize / deserialize numpy nd array objects inside a pydantic model. I used to be able to do it in v1, but since most of the JSON code has been ported to pydantic-core, I cannot use json_encoders anymore.

Therefore, I was wondering whether pydantic (or pydantic-core) could serialize numpy nd arrays directly, basically like orjson does. I would be happy to draft a PR for this.

If this is not in the scope of pydantic, how can I achieve this?

thanks in advance

Affected Components

Selected Assignee: @hramezani

samsja avatar Aug 07 '23 11:08 samsja

Hey @samsja,

One workaround would be to have your custom NdArray type like this:

from typing_extensions import Annotated

import numpy as np
from pydantic import BaseModel, BeforeValidator, ConfigDict, PlainSerializer


arr = np.array(
    [
        [1, 5, 6],
        [4, 7, 2],
        [3, 1, 9]
    ]
)

def nd_array_custom_before_validator(x):
    # custom before-validation logic
    return x


def nd_array_custom_serializer(x):
    # custom serialization logic
    return str(x)

NdArray = Annotated[
    np.ndarray,
    BeforeValidator(nd_array_custom_before_validator),
    PlainSerializer(nd_array_custom_serializer, return_type=str),
]

class Model(BaseModel):
    x: NdArray

    model_config = ConfigDict(arbitrary_types_allowed=True)


m = Model(x=arr)
print(m.model_dump())

You may find more useful information in the Serialization docs.
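The workaround above serializes with str(x), which cannot be read back into an array. A minimal sketch of a round-trippable variant follows, assuming nested lists are an acceptable wire format. The helper names to_ndarray and ndarray_to_list are made up for illustration and are not part of pydantic.

```python
from typing import Annotated, Any

import numpy as np
from pydantic import BaseModel, BeforeValidator, ConfigDict, PlainSerializer


def to_ndarray(value: Any) -> np.ndarray:
    # Hypothetical helper: accepts existing arrays as well as the nested
    # lists that come back from JSON.
    return np.asarray(value)


def ndarray_to_list(array: np.ndarray) -> list:
    # Hypothetical helper: nested Python lists are JSON-compatible,
    # unlike the str(array) representation.
    return array.tolist()


NdArray = Annotated[
    np.ndarray,
    BeforeValidator(to_ndarray),
    PlainSerializer(ndarray_to_list, return_type=list),
]


class Model(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)

    x: NdArray


m = Model(x=np.array([[1, 5, 6], [4, 7, 2], [3, 1, 9]]))
payload = m.model_dump_json()
restored = Model.model_validate_json(payload)
```

Note that dtype and memory layout are not preserved by this sketch; np.asarray infers the dtype again from the list contents.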

hramezani avatar Aug 08 '23 19:08 hramezani

Okay I totally missed this part of the documentation about custom serialization. This would def'ly solve my problem. Thanks for the quick response.

samsja avatar Aug 09 '23 08:08 samsja

There's gotta be a better way to do this without so much workaround, especially since np.ndarray is so common in the Python ecosystem.

ibehnam avatar Mar 20 '24 20:03 ibehnam

Agree with @ibehnam. Native support for numpy arrays would really be a nice feature.

ospikovets avatar Mar 30 '24 14:03 ospikovets

I need to create about a dozen BaseModels supporting NumPy arrays and it seems that there exist (at least) two new packages designed for this purpose:

Has anybody used either of them in practice? I would be very much interested in hearing your thoughts on the most convenient approaches to (de)serialization of models containing NumPy arrays 🙂

pawel-czyz avatar Nov 24 '24 10:11 pawel-czyz

I think that the easiest solution is to implement what @hramezani proposed, maybe using https://github.com/ijl/orjson for performant array serialization. I quickly looked at both libraries and they look great, but it seems that their scope goes well beyond "numpy serialization with pydantic".

samsja avatar Nov 25 '24 15:11 samsja

I am reopening this ticket as it seems that many people want to see a native pydantic implementation of tensor serialization.

FYI: @sydney-runkle @samuelcolvin

samsja avatar Nov 25 '24 15:11 samsja

@samsja Thank you very much for your answer!

pawel-czyz avatar Nov 26 '24 09:11 pawel-czyz

I would also like this!

mg3146 avatar Jun 16 '25 19:06 mg3146

Hi! Do we have any roadmap now?

Roy-Kid avatar Jun 28 '25 10:06 Roy-Kid

@Roy-Kid not really for now. We still need to figure out whether we want better support for custom types so that this can be properly supported in a third party library or if we want to add builtin support in pydantic-core.

Viicos avatar Jun 29 '25 11:06 Viicos

Would love to see official support for numpy arrays within pydantic indeed, that's one important data type missing in pydantic. numpydantic seems to work pretty nicely, with JSON schema getting out beautifully.

JulienStanguennec avatar Jul 08 '25 23:07 JulienStanguennec

Could it be added as an experimental type or something similar?

mg3146 avatar Jul 09 '25 01:07 mg3146

NdArray = Annotated[
    np.ndarray,
    BeforeValidator(nd_array_custom_before_validator),
    PlainSerializer(nd_array_custom_serializer, return_type=str),
]

This is great if you want to serialize existing classes; however, in my case I am subclassing np.ndarray. Is there a way I can add PlainSerializer and BeforeValidator to my np.ndarray subclass so that it can be serialized from any BaseModel it is used in (or BaseSettings in my case)?

EDIT: I actually managed to find the solution shortly after posting. Sharing it here in case anyone stumbles upon the same issue:

from typing import Any

import numpy as np
from pydantic_core import core_schema


class Vec3(np.ndarray):
    @classmethod
    def __get_pydantic_core_schema__(cls, _source_type: Any, _handler) -> core_schema.CoreSchema:
        def serialize(v: Vec3) -> list[float]:
            return v.tolist()

        # Validate the input as a list of floats first, then convert it to Vec3.
        return core_schema.no_info_after_validator_function(
            cls._validate,
            core_schema.list_schema(items_schema=core_schema.float_schema()),
            serialization=core_schema.plain_serializer_function_ser_schema(
                serialize,
                return_schema=core_schema.list_schema(items_schema=core_schema.float_schema())
            )
        )

    @classmethod
    def _validate(cls, v):
        if isinstance(v, cls):
            return v
        elif isinstance(v, (list, tuple, np.ndarray)) and len(v) == 3:
            # The np.ndarray constructor takes a shape rather than values, so
            # build a float array from the values and view it as Vec3.
            return np.asarray(v, dtype=float).view(cls)
        raise TypeError("Expected a list/tuple/ndarray of 3 floats for Vec3")

This (de)serializes a numpy array into a list of floats.
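For anyone wanting to see the subclass approach end to end, here is a minimal sketch of such a type used as a model field, including a JSON round trip. It is a variation on the idea above rather than the exact same code: it assumes the json_or_python_schema plus is_instance_schema pattern from the pydantic docs on custom types so that an existing Vec3 instance is also accepted, and the names Body and _from_list are made up for illustration.

```python
from typing import Any

import numpy as np
from pydantic import BaseModel, GetCoreSchemaHandler
from pydantic_core import core_schema


class Vec3(np.ndarray):
    """Illustrative ndarray subclass holding exactly three floats."""

    @classmethod
    def _from_list(cls, values: list) -> "Vec3":
        # Hypothetical helper: view() reinterprets the float array as Vec3.
        return np.asarray(values, dtype=float).view(cls)

    @classmethod
    def __get_pydantic_core_schema__(
        cls, source_type: Any, handler: GetCoreSchemaHandler
    ) -> core_schema.CoreSchema:
        # Require a list of exactly three floats, then convert it to Vec3.
        three_floats = core_schema.list_schema(
            core_schema.float_schema(), min_length=3, max_length=3
        )
        from_list = core_schema.no_info_after_validator_function(
            cls._from_list, three_floats
        )
        return core_schema.json_or_python_schema(
            # JSON input can only ever be a list.
            json_schema=from_list,
            # Python input may be an existing Vec3 or a list.
            python_schema=core_schema.union_schema(
                [core_schema.is_instance_schema(cls), from_list]
            ),
            serialization=core_schema.plain_serializer_function_ser_schema(
                lambda v: v.tolist()
            ),
        )


class Body(BaseModel):
    # No arbitrary_types_allowed needed: Vec3 supplies its own schema.
    position: Vec3


body = Body(position=[1, 2, 3])
as_json = body.model_dump_json()
restored = Body.model_validate_json(as_json)
same = Body(position=body.position)
```

Because the schema lives on the class itself, the same field annotation should work in any BaseModel without extra configuration.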

ISOR3X avatar Jul 30 '25 20:07 ISOR3X

@Roy-Kid not really for now. We still need to figure out whether we want better support for custom types so that this can be properly supported in a third party library or if we want to add builtin support in pydantic-core.

Since you would go crazy keeping everything in Pydantic and updating it every time the other packages are updated, I think you need a separate package for each package you interface with.

However, that requires automated testing to make sure updates do not break things.

torsknod2 avatar Oct 24 '25 20:10 torsknod2