Allow serialization of numpy ndarray
Initial Checks
- [X] I have searched Google & GitHub for similar requests and couldn't find anything
- [X] I have read and followed the docs and still think this feature is missing
Description
Context
Hey all
I am creating this issue after opening a discussion here.
I want to be able to serialize/deserialize numpy ndarray objects inside a pydantic model. I used to be able to do this in v1, but since most of the JSON code has been ported to pydantic-core, I cannot use json_encoders anymore.
Therefore I was wondering whether pydantic (or pydantic-core) could directly serialize numpy ndarrays, like orjson does. I would be happy to draft a PR for this.
If this is not in the scope of pydantic, how can I achieve this?
thanks in advance
Affected Components
- [ ] Compatibility between releases
- [ ] Data validation/parsing
- [X] Data serialization - `.model_dump()` and `.model_dump_json()`
- [ ] JSON Schema
- [ ] Dataclasses
- [ ] Model Config
- [ ] Field Types - adding or changing a particular data type
- [ ] Function validation decorator
- [ ] Generic Models
- [ ] Other Model behaviour - `model_construct()`, pickling, private attributes, ORM mode
- [ ] Plugins and integration with other tools - mypy, FastAPI, python-devtools, Hypothesis, VS Code, PyCharm, etc.
Selected Assignee: @hramezani
Hey @samsja,
One workaround would be to have your custom NdArray type like this:
```python
from typing_extensions import Annotated

import numpy as np

from pydantic import BaseModel, BeforeValidator, ConfigDict, PlainSerializer

arr = np.array(
    [
        [1, 5, 6],
        [4, 7, 2],
        [3, 1, 9]
    ]
)


def nd_array_custom_before_validator(x):
    # custom before validation logic
    return x


def nd_array_custom_serializer(x):
    # custom serialization logic
    return str(x)


NdArray = Annotated[
    np.ndarray,
    BeforeValidator(nd_array_custom_before_validator),
    PlainSerializer(nd_array_custom_serializer, return_type=str),
]


class Model(BaseModel):
    x: NdArray

    model_config = ConfigDict(arbitrary_types_allowed=True)


m = Model(x=arr)
print(m.model_dump())
```
You may find more useful information in the Serialization docs.
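Building on this workaround, the placeholder validator and serializer can be filled in so that the array actually round-trips through JSON. A sketch, assuming pydantic v2 and numpy; the function names and the `tolist()`-based encoding are illustrative choices, not the only option:

```python
from typing import Annotated, Any

import numpy as np
from pydantic import BaseModel, BeforeValidator, ConfigDict, PlainSerializer


def to_ndarray(x: Any) -> np.ndarray:
    # accept an existing array, or rebuild one from a (nested) list,
    # e.g. when re-validating data parsed from JSON
    return x if isinstance(x, np.ndarray) else np.asarray(x)


NdArray = Annotated[
    np.ndarray,
    BeforeValidator(to_ndarray),
    PlainSerializer(lambda x: x.tolist(), return_type=list),
]


class Model(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)

    x: NdArray


m = Model(x=np.array([[1, 5, 6], [4, 7, 2]]))
dumped = m.model_dump_json()
restored = Model.model_validate_json(dumped)
assert np.array_equal(restored.x, m.x)
```

Note that serializing to a nested list loses the dtype; if the exact dtype matters, it would need to be encoded alongside the data.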
Okay, I totally missed this part of the documentation about custom serialization. This would definitely solve my problem. Thanks for the quick response.
There's gotta be a better way to do this without so much workaround, especially since np.ndarray is so common in the Python ecosystem.
Agree with @ibehnam. Native support for numpy arrays would really be a nice feature.
I need to create about a dozen BaseModels supporting NumPy arrays and it seems that there exist (at least) two new packages designed for this purpose:
Has anybody used either of them in practice? I would be very much interested in hearing your thoughts on the most convenient approaches to (de)serialization of models containing NumPy arrays 🙂
I think that the easiest solution is to implement what @hramezani proposed, maybe using https://github.com/ijl/orjson for performant array serialization. I quickly looked at both libraries and they look great, but it seems that their scope goes way beyond "numpy serialization with pydantic".
I am reopening this ticket as it seems that many people want to see a native pydantic implementation of tensor serialization.
FYI: @sydney-runkle @samuelcolvin
@samsja Thank you very much for your answer!
I would also like this!
Hi! Do we have any roadmap now?
@Roy-Kid not really for now. We still need to figure out whether we want better support for custom types so that this can be properly supported in a third party library or if we want to add builtin support in pydantic-core.
Would love to see official support for numpy arrays within pydantic indeed, that's one important data type missing in pydantic. numpydantic seems to work pretty nicely, with JSON schema getting out beautifully.
Could it be an experimental type or something that gets added?
```python
NdArray = Annotated[
    np.ndarray,
    BeforeValidator(nd_array_custom_before_validator),
    PlainSerializer(nd_array_custom_serializer, return_type=str),
]
```
This is great if you want to serialize existing classes; however, in my case I am subclassing np.ndarray. Is there a way I can add PlainSerializer and BeforeValidator to my np.ndarray subclass so that it can be serialized from any BaseModel it is used in (or BaseSettings in my case)?
EDIT: Actually managed to find the solution shortly after posting it. Sharing it here in case anyone stumbles upon the same issue:
```python
from typing import Any

import numpy as np
from pydantic_core import core_schema


class Vec3(np.ndarray):
    @classmethod
    def __get_pydantic_core_schema__(cls, _source_type: Any, _handler: Any) -> core_schema.CoreSchema:
        def serialize(v: 'Vec3') -> list[float]:
            return v.tolist()

        return core_schema.no_info_after_validator_function(
            cls._validate,
            core_schema.list_schema(items_schema=core_schema.float_schema()),
            serialization=core_schema.plain_serializer_function_ser_schema(
                serialize,
                return_schema=core_schema.list_schema(items_schema=core_schema.float_schema()),
            ),
        )

    @classmethod
    def _validate(cls, v):
        if isinstance(v, cls):
            return v
        elif isinstance(v, (list, tuple, np.ndarray)) and len(v) == 3:
            # view the data as a Vec3; note that np.ndarray.__new__ takes a
            # shape rather than element values, so a plain cls(*v) would not
            # build the array we want here
            return np.asarray(v, dtype=float).view(cls)
        raise TypeError("Expected a list/tuple/ndarray of 3 floats for Vec3")
```
This (de)serializes a numpy array into a list of floats.
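A condensed, self-contained variant of the same pattern, shown with a JSON round trip (a sketch assuming pydantic v2; the `Settings`/`origin` names are illustrative):

```python
from typing import Any

import numpy as np
from pydantic import BaseModel
from pydantic_core import core_schema


class Vec3(np.ndarray):
    @classmethod
    def __get_pydantic_core_schema__(cls, _source: Any, _handler: Any) -> core_schema.CoreSchema:
        return core_schema.no_info_after_validator_function(
            # list of floats -> Vec3 (view casting, the standard way to
            # construct an ndarray subclass from existing data)
            lambda v: np.asarray(v, dtype=float).view(cls),
            core_schema.list_schema(items_schema=core_schema.float_schema()),
            serialization=core_schema.plain_serializer_function_ser_schema(
                lambda v: v.tolist(),  # Vec3 -> list of floats
                return_schema=core_schema.list_schema(items_schema=core_schema.float_schema()),
            ),
        )


class Settings(BaseModel):
    origin: Vec3


s = Settings(origin=[1, 2, 3])
blob = s.model_dump_json()
restored = Settings.model_validate_json(blob)
assert isinstance(restored.origin, Vec3)
```

Because the schema is attached to the subclass itself, any model using a `Vec3` field picks it up automatically, with no `Annotated` wrapper or `arbitrary_types_allowed` needed.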
> @Roy-Kid not really for now. We still need to figure out whether we want better support for custom types so that this can be properly supported in a third party library or if we want to add builtin support in pydantic-core.
As you would go crazy keeping everything in pydantic and updating it whenever the other packages get updated, I think you need a separate package per library you interface with.
However, that would require automated testing to ensure updates don't break things.