pydantic
pydantic v2 memory allocation
Initial Checks
- [X] I confirm that I'm using Pydantic V2
Description
Hello all! First I want to thank you for your contribution to Python by publishing this library! We recently migrated to Pydantic V2 and are experiencing memory issues. I'm not really familiar with memory profiling in Python, so excuse me if I have some ignorant questions.
I tried to pinpoint exactly what causes the memory issues and I think they might be coming from Pydantic. I created a very simple example where I just create a BaseModel, and I get these results using memray.
pydantic 1.10.11:
🥇 Top 5 largest allocating locations (by size):
- _call_with_frames_removed:
pydantic 2.5.3
🥇 Top 5 largest allocating locations (by size):
- validate_core_schema:/Users/antonis.agapiou/Library/Caches/pypoetry/virtualenvs/argos-FVnhZ34m-py3.11/lib/python3.11/site-packages/pydantic/_internal/_core_utils.py:585 -> 17.156MB
- init:/Users/antonis.agapiou/.pyenv/versions/3.11.1/lib/python3.11/typing.py:827 -> 8.077MB
- _create_fn:/Users/antonis.agapiou/.pyenv/versions/3.11.1/lib/python3.11/dataclasses.py:433 -> 7.064MB
- get_data:
Is there a reason why the newer version allocates 17.156MB?
Example Code
from pydantic import BaseModel


class Model(BaseModel):
    name: str
    surname: str
    age: int


m = Model(
    name="Antonis",
    surname="Agapiou",
    age=27,
)
Python, Pydantic & OS Version
pydantic-core version: 2.14.6
pydantic-core build: profile=release pgo=true
install path: /Users/antonis.agapiou/Library/Caches/pypoetry/virtualenvs/argos-FVnhZ34m-py3.11/lib/python3.11/site-packages/pydantic
python version: 3.11.1 (main, Jan 9 2023, 18:35:12) [Clang 14.0.0 (clang-1400.0.29.202)]
platform: macOS-14.2.1-arm64-arm-64bit
related packages: typing_extensions-4.9.0 pyright-1.1.348 mypy-0.991 fastapi-0.109.0 pydantic-settings-2.1.0
Memory usage will probably be significantly reduced by https://github.com/pydantic/pydantic-core/pull/1085.
More detail to come.
@samuelcolvin thanks for the info! Please, is there any ETA for release? We notice the same issue while upgrading our codebase 🙏
@matejkloska the problem is that we need a beta release of PyO3 to build from so that pydantic-core can be packaged by linux distros that build it from source/cargo. @davidhewitt is working hard to get that over the line but since PyO3 is maintained by volunteers, it's hard to put a hard deadline on it. We had originally hoped it would be released in December, then Jan, but I think we're pretty confident of a release in February.
We think the 17MB of memory allocated in _core_utils.py is probably related to the first invocation of Rust code (where Rust has its own heap), rather than any significant usage in that location.
Is the 17MB of memory usage a problem for you, or are you just looking at the relative usage? Either way, maybe you could provide some more details of your application and we might be able to suggest another fix. That said, I think it's fair to say that Pydantic V2 does have a bigger memory footprint than V1; hopefully we can change that in future.
Hey @samuelcolvin, thanks for answering so quickly. The memory issues are no longer that big of a deal, as we refactored the code (for example, reducing the number of calls to .model_json_schema()). As you said, it still seems that Pydantic V2 has a higher memory footprint than V1. I did some experimenting using a small subset of our code just to confirm that it does. This is the code I was experimenting with:
import asyncio
import typing

import pydantic


class GenericDataItem(pydantic.BaseModel):
    @classmethod
    def numeric_fields(cls) -> frozenset[str]:
        return cls._build_numeric_fields()

    @classmethod
    def _build_numeric_fields(cls) -> frozenset[str]:
        schema = cls.model_json_schema()["properties"]
        numeric_fields = set()
        for field, attributes in schema.items():
            if "type" in attributes and attributes["type"] in ("number", "integer"):
                numeric_fields.add(field)
            elif "anyOf" in attributes:
                for subfield in attributes["anyOf"]:
                    if "type" in subfield and subfield["type"] in ("number", "integer"):
                        numeric_fields.add(field)
        return frozenset(numeric_fields)


class TestModel1(GenericDataItem):
    numeric_field_1: int | None = None

    @pydantic.field_validator("numeric_field_1", mode="before")
    @classmethod
    def numeric_field_1_to_str(cls, numeric_field_1: str) -> int | None:
        if isinstance(numeric_field_1, str):
            return int(numeric_field_1)


class TestModel2(TestModel1):
    str_field_1: str | None = None


async def _load_test_model_1s() -> typing.AsyncGenerator[TestModel1, None]:
    for i in range(1000):
        yield TestModel1(numeric_field_1=str(i))


async def _load_test_model_2(
    test_model_1s: typing.AsyncGenerator[TestModel1, None],
) -> typing.AsyncGenerator[TestModel2, None]:
    async for test_model_1 in test_model_1s:
        yield TestModel2(str_field_1="string", **test_model_1.model_dump())


async def main() -> None:
    test_model_1s = _load_test_model_1s()
    test_model_2s = _load_test_model_2(test_model_1s)
    async for test_model_2 in test_model_2s:
        _ = test_model_2.numeric_fields()


if __name__ == "__main__":
    asyncio.run(main())
The two versions I compared are 2.5.3 and 1.10.11, with the following changes to the code to make it compatible with each version:
- .model_json_schema() -> .schema()
- .model_dump() -> .dict()
- pydantic.field_validator(mode='before') -> pydantic.validator(pre=True)
These are my results using memray:
memray stats
V1
📏 **Total allocations:**
1673
📦 **Total memory allocated:**
3.361MB
📊 **Histogram of allocation size:**
min: 1.000B
-------------------------------------------
< 4.000B : 101 ▇▇▇
< 15.000B : 0
< 63.000B : 7 ▇
< 255.000B : 91 ▇▇▇
< 1.000KB : 1088 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 3.999KB : 272 ▇▇▇▇▇▇▇
< 15.999KB : 108 ▇▇▇
< 63.999KB : 3 ▇
< 255.999KB: 2 ▇
<=1.000MB : 1 ▇
-------------------------------------------
max: 1.000MB
📂 **Allocator type distribution:**
MALLOC: 955
CALLOC: 551
REALLOC: 166
MMAP: 1
🥇 **Top 5 largest allocating locations (by size):**
- _compile_bytecode:<frozen importlib._bootstrap_external>:729 -> 1.101MB
- _call_with_frames_removed:<frozen importlib._bootstrap>:241 -> 1.007MB
- __init__:/Users/antonis.agapiou/.pyenv/versions/3.11.1/lib/python3.11/typing.py:827 -> 451.841KB
- get_data:<frozen importlib._bootstrap_external>:1131 -> 245.574KB
- _get_code_from_file:<frozen runpy>:259 -> 214.746KB
🥇 **Top 5 largest allocating locations (by number of allocations):**
- _call_with_frames_removed:<frozen importlib._bootstrap>:241 -> 835
- __init__:/Users/antonis.agapiou/.pyenv/versions/3.11.1/lib/python3.11/typing.py:827 -> 185
- _get_code_from_file:<frozen runpy>:259 -> 119
- _compile_bytecode:<frozen importlib._bootstrap_external>:729 -> 103
- _parse_sub:/Users/antonis.agapiou/.pyenv/versions/3.11.1/lib/python3.11/re/_parser.py:455 -> 63
V2
📏 **Total allocations:**
50094
📦 **Total memory allocated:**
798.913MB
📊 **Histogram of allocation size:**
min: 1.000B
--------------------------------------------
< 4.000B : 6168 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 15.000B : 9015 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 63.000B : 3728 ▇▇▇▇▇▇▇▇▇
< 255.000B : 9175 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 1.000KB : 10504 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 3.999KB : 6127 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 15.999KB : 2261 ▇▇▇▇▇▇
< 63.999KB : 59 ▇
< 255.999KB: 38 ▇
<=1.000MB : 3019 ▇▇▇▇▇▇▇▇
--------------------------------------------
max: 1.000MB
📂 **Allocator type distribution:**
MALLOC: 43856
REALLOC: 3172
CALLOC: 3061
MMAP: 5
🥇 **Top 5 largest allocating locations (by size):**
- __init__:/Users/antonis.agapiou/Library/Caches/pypoetry/virtualenvs/argos-FVnhZ34m-py3.11/lib/python3.11/site-packages/pydantic/main.py:164 -> 500.435MB
- model_dump:/Users/antonis.agapiou/Library/Caches/pypoetry/virtualenvs/argos-FVnhZ34m-py3.11/lib/python3.11/site-packages/pydantic/main.py:308 -> 250.000MB
- validate_core_schema:/Users/antonis.agapiou/Library/Caches/pypoetry/virtualenvs/argos-FVnhZ34m-py3.11/lib/python3.11/site-packages/pydantic/_internal/_core_utils.py:585 -> 18.683MB
- __init__:/Users/antonis.agapiou/.pyenv/versions/3.11.1/lib/python3.11/typing.py:827 -> 7.077MB
- _create_fn:/Users/antonis.agapiou/.pyenv/versions/3.11.1/lib/python3.11/dataclasses.py:433 -> 6.920MB
memray flamegraph
(they're uploaded as .txt files, you can convert them to .html to see them)
V1
memray-flamegraph-profile_test.py.86248.txt
V2
memray-flamegraph-profile_test.py.86621.txt
As you can see from the flamegraphs, the memory footprint is a lot higher in V2:
- V1: peak memory usage 1.5 MiB, total number of allocations 2853
- V2: peak memory usage 14.4 MiB, total number of allocations 121961
Again thanks for your quick response and for all the hard work creating this library. Hopefully with the release of PyO3 most of these issues will be solved :D !
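For readers landing here with the same problem: the refactor mentioned above (fewer calls to .model_json_schema()) can be sketched roughly as follows. This is only an illustration under assumptions, not the code from our application. `FakeModel` is a hypothetical stand-in for a pydantic model so that the snippet runs without pydantic installed, and its `model_json_schema` just returns a hard-coded dict and counts calls. The idea is to compute the numeric fields once per class and cache the result.

```python
import functools


class FakeModel:
    """Hypothetical stand-in for a pydantic model (not pydantic's API)."""

    schema_calls = 0  # counts how often the (expensive) schema is rebuilt

    @classmethod
    def model_json_schema(cls) -> dict:
        FakeModel.schema_calls += 1
        return {
            "properties": {
                "numeric_field_1": {"anyOf": [{"type": "integer"}, {"type": "null"}]},
                "str_field_1": {"anyOf": [{"type": "string"}, {"type": "null"}]},
            }
        }

    @classmethod
    @functools.cache  # one cached result per class, computed on first use
    def numeric_fields(cls) -> frozenset[str]:
        numeric = set()
        for field, attributes in cls.model_json_schema()["properties"].items():
            # Check the field itself and every alternative listed under "anyOf".
            candidates = [attributes, *attributes.get("anyOf", [])]
            if any(c.get("type") in ("number", "integer") for c in candidates):
                numeric.add(field)
        return frozenset(numeric)


for _ in range(1000):
    fields = FakeModel.numeric_fields()

print(fields, FakeModel.schema_calls)
```

The cache assumes the schema of a class does not change after the first call, and it keeps a reference to every class it has seen, which is usually fine for module-level models.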
Thanks @AntonisAgap for the nice repro.
So already it looks like 2.6.0 might be slightly improved; this is what I see running with the pydantic-core main branch:
📏 Total allocations:
50663
📦 Total memory allocated:
447.702MB
📊 Histogram of allocation size:
min: 1.000B
-------------------------------------------
< 4.000B : 7154 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 15.000B : 7977 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 63.000B : 3906 ▇▇▇▇▇▇▇▇▇▇▇
< 255.000B : 7754 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 1.000KB : 9561 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 3.999KB : 8146 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 15.999KB : 2176 ▇▇▇▇▇▇
< 63.999KB : 925 ▇▇▇
< 255.999KB: 3049 ▇▇▇▇▇▇▇▇
<=1.000MB : 15 ▇
-------------------------------------------
max: 1.000MB
📂 Allocator type distribution:
MALLOC: 41741
CALLOC: 5753
REALLOC: 3163
MMAP: 6
🥇 Top 5 largest allocating locations (by size):
- __init__:/home/david/dev/pydantic/pydantic/pydantic/main.py:171 -> 250.435MB
- model_dump:/home/david/dev/pydantic/pydantic/pydantic/main.py:314 -> 125.000MB
- __init__:/home/david/.pyenv/versions/3.12.0/lib/python3.12/typing.py:873 -> 19.140MB
- validate_core_schema:/home/david/dev/pydantic/pydantic/pydantic/_internal/_core_utils.py:575 -> 18.231MB
- _create_fn:/home/david/.pyenv/versions/3.12.0/lib/python3.12/dataclasses.py:473 -> 14.865MB
🥇 Top 5 largest allocating locations (by number of allocations):
- validate_core_schema:/home/david/dev/pydantic/pydantic/pydantic/_internal/_core_utils.py:575 -> 18739
- sub:/home/david/.pyenv/versions/3.12.0/lib/python3.12/re/__init__.py:186 -> 6333
- _create_fn:/home/david/.pyenv/versions/3.12.0/lib/python3.12/dataclasses.py:473 -> 6244
- __init__:/home/david/dev/pydantic/pydantic/pydantic/main.py:171 -> 6000
- __init__:/home/david/.pyenv/versions/3.12.0/lib/python3.12/typing.py:873 -> 4730
and with the new PyO3, i.e. running the branch in https://github.com/pydantic/pydantic-core/pull/1085
📏 Total allocations:
57628
📦 Total memory allocated:
107.216MB
📊 Histogram of allocation size:
min: 1.000B
--------------------------------------------
< 4.000B : 7816 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 15.000B : 7997 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 63.000B : 3932 ▇▇▇▇▇▇▇
< 255.000B : 7765 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 1.000KB : 14656 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 3.999KB : 10202 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 15.999KB : 4002 ▇▇▇▇▇▇▇
< 63.999KB : 1144 ▇▇
< 255.999KB: 98 ▇
<=1.000MB : 16 ▇
--------------------------------------------
max: 1.000MB
📂 Allocator type distribution:
MALLOC: 41910
CALLOC: 11090
REALLOC: 4621
MMAP: 7
🥇 Top 5 largest allocating locations (by size):
- _call_with_frames_removed:<frozen importlib._bootstrap>:488 -> 35.697MB
- __init__:/home/david/.pyenv/versions/3.12.0/lib/python3.12/typing.py:873 -> 19.140MB
- validate_core_schema:/home/david/dev/pydantic/pydantic/pydantic/_internal/_core_utils.py:575 -> 18.608MB
- _create_fn:/home/david/.pyenv/versions/3.12.0/lib/python3.12/dataclasses.py:473 -> 13.865MB
- get_data:<frozen importlib._bootstrap_external>:1186 -> 5.113MB
🥇 Top 5 largest allocating locations (by number of allocations):
- validate_core_schema:/home/david/dev/pydantic/pydantic/pydantic/_internal/_core_utils.py:575 -> 18749
- _call_with_frames_removed:<frozen importlib._bootstrap>:488 -> 10456
- sub:/home/david/.pyenv/versions/3.12.0/lib/python3.12/re/__init__.py:186 -> 6333
- _create_fn:/home/david/.pyenv/versions/3.12.0/lib/python3.12/dataclasses.py:473 -> 6243
- __init__:/home/david/.pyenv/versions/3.12.0/lib/python3.12/typing.py:873 -> 4730
The number of allocations hasn't gone down significantly, which might be interesting to investigate sometime, but the big win to come is in MBs allocated; 448 -> 107, so that's cutting the memory usage by something like a factor of 4.
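To make the "factor of 4" concrete, here is the arithmetic on the two totals reported above, as a small sketch. Note that, as far as I understand memray's stats output, "total memory allocated" is the cumulative size of all allocations over the run, not the peak usage.

```python
# Totals reported by `memray stats` in the two runs above, in MB.
main_branch_mb = 447.702
new_pyo3_mb = 107.216

factor = main_branch_mb / new_pyo3_mb
reduction_pct = (1 - new_pyo3_mb / main_branch_mb) * 100

print(f"{factor:.2f}x less memory allocated ({reduction_pct:.1f}% reduction)")
```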
The even better news is that once we've cut that big chunk away then I'm sure it'll be easier for us to start working through to identify and eliminate other sources of allocations.
So in summary, it looks like Pydantic 2.7 might be significantly better here and who knows where we can get to for 2.8 and beyond 🚀
This is amazing 🤩, I'm really looking forward to 2.7.
That's awesome! Looking forward to 2.7 as well! Thanks for the info and the hard work 😄!
we're definitely looking forward to this as well !
Now that the PyO3 work is done, retesting on pydantic main I get the following:
📏 Total allocations:
52488
📦 Total memory allocated:
61.289MB
📊 Histogram of allocation size:
min: 1.000B
--------------------------------------------
< 4.000B : 6174 ▇▇▇▇▇▇▇▇▇▇▇▇
< 15.000B : 8024 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 63.000B : 3559 ▇▇▇▇▇▇▇
< 255.000B : 10153 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 1.000KB : 13524 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 3.999KB : 8595 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 15.999KB : 2300 ▇▇▇▇▇
< 63.999KB : 108 ▇
< 255.999KB: 37 ▇
<=1.000MB : 14 ▇
--------------------------------------------
max: 1.000MB
📂 Allocator type distribution:
MALLOC: 45734
CALLOC: 3447
REALLOC: 3300
MMAP: 7
🥇 Top 5 largest allocating locations (by size):
- validate_core_schema:/Users/david/Dev/pydantic/pydantic/pydantic/_internal/_core_utils.py:568 -> 18.260MB
- __init__:/opt/homebrew/Cellar/python@3.11/3.11.8/Frameworks/Python.framework/Versions/3.11/lib/python3.11/typing.py:864 -> 8.184MB
- _create_fn:/opt/homebrew/Cellar/python@3.11/3.11.8/Frameworks/Python.framework/Versions/3.11/lib/python3.11/dataclasses.py:433 -> 6.737MB
- _get_schema:/Users/david/Dev/pydantic/pydantic/pydantic/type_adapter.py:78 -> 6.226MB
- get_data:<frozen importlib._bootstrap_external>:1131 -> 4.836MB
🥇 Top 5 largest allocating locations (by number of allocations):
- validate_core_schema:/Users/david/Dev/pydantic/pydantic/pydantic/_internal/_core_utils.py:568 -> 22812
- sub:/opt/homebrew/Cellar/python@3.11/3.11.8/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/__init__.py:185 -> 6326
- _create_fn:/opt/homebrew/Cellar/python@3.11/3.11.8/Frameworks/Python.framework/Versions/3.11/lib/python3.11/dataclasses.py:433 -> 4124
- __init__:/Users/david/Dev/pydantic/pydantic/pydantic/main.py:175 -> 4000
- __init__:/opt/homebrew/Cellar/python@3.11/3.11.8/Frameworks/Python.framework/Versions/3.11/lib/python3.11/typing.py:864 -> 3016
I'm testing on macOS instead of Linux this time, so the comparison may not be directly applicable. But it does look like we've made significant progress in reducing allocations, and it'll be interesting to see if we can get further on this 🚀
That is great news, awesome work!
@davidhewitt I installed 2.7.0b1 to run some memray tests that just import a ~5000-line generated file of complex models.
Compared to 2.6 it's better, but compared to v1 it's really slow to import and it's using much more memory during runtime. We actually still get the same memory usage during runtime as we did with 2.6.
Import times:
- v1: 0.34s
- 2.6.3: 1.13s
- 2.7.0b1: 1.06s
Memory RES usage reported by the OS (after subtracting the Python baseline before import):
- v1: 15.1MB
- 2.6.3: 98.6MB
- 2.7.0b1: 98.6MB
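For anyone who wants to collect comparable numbers, a rough way to time an import and see how much memory it allocates through Python is sketched below. This is an assumption-laden sketch, not the exact method used for the figures above: `measure_import` is a hypothetical helper, and `colorsys` is just a small stdlib module standing in for the generated models file. As far as I know, tracemalloc only sees allocations made through Python's allocators, so natively allocated memory (such as the Rust side of pydantic-core) will not show up, and the result is not the same measure as the OS-reported RES.

```python
import importlib
import sys
import time
import tracemalloc


def measure_import(module_name: str) -> tuple[float, int]:
    """Return (seconds, peak traced bytes) for importing `module_name`."""
    sys.modules.pop(module_name, None)  # force a fresh import of this module
    tracemalloc.start()
    start = time.perf_counter()
    importlib.import_module(module_name)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    return elapsed, peak


# "colorsys" is a stand-in; replace it with the name of the generated models module.
elapsed, peak = measure_import("colorsys")
print(f"import took {elapsed:.4f}s, peak traced memory {peak / 1024:.1f} KiB")
```

Only the named module is forced to re-import; its dependencies stay cached if they were already loaded, so it is best run in a fresh process. For a per-module breakdown of import time, `python -X importtime -c "import your_module"` is another option.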
I think the problem is related to the extensive use of submodels, because just commenting out the top model (6 lines) reduces the memory use by 10MB.
Does this mean each of the models is allocated separately, even if it is part of another model? For example, if we have a specific model that is used in 10 other models, and each of those is also used in 10 others, would it be allocated 100+1 times?
== 2.6.3 v2
📏 Total allocations:
684570
📦 Total memory allocated:
987.497MB
📊 Histogram of allocation size:
min: 1.000B
---------------------------------------------
< 4.000B : 19581 ▇▇▇
< 15.000B : 87275 ▇▇▇▇▇▇▇▇▇▇▇▇
< 63.000B : 189091 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 255.000B : 174132 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 1.000KB : 133436 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 3.999KB : 61575 ▇▇▇▇▇▇▇▇▇
< 15.999KB : 15152 ▇▇▇
< 63.999KB : 2722 ▇
< 255.999KB: 949 ▇
<=1.000MB : 657 ▇
---------------------------------------------
max: 1.000MB
📂 Allocator type distribution:
MALLOC: 567877
REALLOC: 83122
CALLOC: 25569
MMAP: 7322
POSIX_MEMALIGN: 680
🥇 Top 5 largest allocating locations (by size):
- validate_core_schema:/lib/python3.12/site-packages/pydantic/_internal/_core_utils.py:575 -> 270.933MB
- create_schema_validator:/lib/python3.12/site-packages/pydantic/plugin/_schema_validator.py:49 -> 230.421MB
- complete_model_class:/lib/python3.12/site-packages/pydantic/_internal/_model_construction.py:544 -> 176.144MB
- __init__:/.pyenv/versions/3.12.1/lib/python3.12/typing.py:880 -> 79.741MB
- _walk:/lib/python3.12/site-packages/pydantic/_internal/_core_utils.py:207 -> 45.688MB
🥇 Top 5 largest allocating locations (by number of allocations):
- create_schema_validator:/lib/python3.12/site-packages/pydantic/plugin/_schema_validator.py:49 -> 417872
- validate_core_schema:/lib/python3.12/site-packages/pydantic/_internal/_core_utils.py:575 -> 137497
- complete_model_class:/lib/python3.12/site-packages/pydantic/_internal/_model_construction.py:544 -> 61899
- __init__:/.pyenv/versions/3.12.1/lib/python3.12/typing.py:880 -> 19239
- _get_code_from_file:<frozen runpy>:259 -> 7938
== 2.7.0b1 v2
📏 Total allocations:
682251
📦 Total memory allocated:
464.631MB
📊 Histogram of allocation size:
min: 1.000B
---------------------------------------------
< 4.000B : 19957 ▇▇▇
< 15.000B : 87660 ▇▇▇▇▇▇▇▇▇▇▇▇
< 63.000B : 188837 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 255.000B : 173695 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 1.000KB : 134558 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 3.999KB : 61348 ▇▇▇▇▇▇▇▇▇
< 15.999KB : 13335 ▇▇
< 63.999KB : 2733 ▇
< 255.999KB: 61 ▇
<=1.000MB : 67 ▇
---------------------------------------------
max: 1.000MB
📂 Allocator type distribution:
MALLOC: 568150
REALLOC: 82313
CALLOC: 25625
MMAP: 5483
POSIX_MEMALIGN: 680
🥇 Top 5 largest allocating locations (by size):
- validate_core_schema:/lib/python3.12/site-packages/pydantic/_internal/_core_utils.py:568 -> 110.987MB
- __init__:/.pyenv/versions/3.12.1/lib/python3.12/typing.py:880 -> 78.195MB
- create_schema_validator:/lib/python3.12/site-packages/pydantic/plugin/_schema_validator.py:49 -> 68.596MB
- _get_code_from_file:<frozen runpy>:259 -> 29.143MB
- _handle_other_schemas:/lib/python3.12/site-packages/pydantic/_internal/_core_utils.py:209 -> 26.000MB
🥇 Top 5 largest allocating locations (by number of allocations):
- create_schema_validator:/lib/python3.12/site-packages/pydantic/plugin/_schema_validator.py:49 -> 418292
- validate_core_schema:/lib/python3.12/site-packages/pydantic/_internal/_core_utils.py:568 -> 133966
- complete_model_class:/lib/python3.12/site-packages/pydantic/_internal/_model_construction.py:566 -> 61177
- __init__:/.pyenv/versions/3.12.1/lib/python3.12/typing.py:880 -> 19349
- _get_code_from_file:<frozen runpy>:259 -> 7938
== v1 ==
📏 Total allocations:
32328
📦 Total memory allocated:
100.179MB
📊 Histogram of allocation size:
min: 1.000B
--------------------------------------------
< 3.000B : 4616 ▇▇▇▇▇▇▇▇▇▇
< 14.000B : 0
< 55.000B : 100 ▇
< 209.000B : 9 ▇
< 795.000B : 12645 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 2.954KB : 4671 ▇▇▇▇▇▇▇▇▇▇
< 11.234KB : 8207 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 42.726KB : 2027 ▇▇▇▇▇
< 162.487KB: 32 ▇
<=617.938KB: 21 ▇
--------------------------------------------
max: 617.938KB
📂 Allocator type distribution:
MALLOC: 20840
CALLOC: 9776
REALLOC: 1712
🥇 Top 5 largest allocating locations (by size):
- __init__:/.pyenv/versions/3.12.1/lib/python3.12/typing.py:880 -> 57.338MB
- _get_code_from_file:<frozen runpy>:259 -> 29.017MB
- update_model_forward_refs:/lib/python3.12/site-packages/pydantic/v1/typing.py:546 -> 3.508MB
- Field:/lib/python3.12/site-packages/pydantic/v1/fields.py:301 -> 1.215MB
- Field:/lib/python3.12/site-packages/pydantic/v1/fields.py:324 -> 1.215MB
🥇 Top 5 largest allocating locations (by number of allocations):
- __init__:/.pyenv/versions/3.12.1/lib/python3.12/typing.py:880 -> 14195
- _get_code_from_file:<frozen runpy>:259 -> 7925
- Field:/lib/python3.12/site-packages/pydantic/v1/fields.py:301 -> 1659
- Field:/lib/python3.12/site-packages/pydantic/v1/fields.py:324 -> 1659
- __new__:<frozen abc>:106 -> 991
Maybe a good example for optimization:
# generate V1 models:
datamodel-codegen --url https://github.com/kubernetes/kubernetes/raw/master/api/openapi-spec/swagger.json > k8s_v1.py
# generate V2 models:
datamodel-codegen --url https://github.com/kubernetes/kubernetes/raw/master/api/openapi-spec/swagger.json --output-model-type=pydantic_v2.BaseModel > k8s_v2.py
I did the following in 2 separate processes:
> import k8s_v1
> import k8s_v2
(also note the CPU time diff)
Are there any more big wins here? The memory usage and CPU usage is still preventing us from upgrading our app from Pydantic v1 -> v2.
I just want to chip in and confirm that memory usage increased when I upgraded from Pydantic 1.x to 2.7.x. I forgot to note down my exact test numbers, but my Google App Engine instance went from something like 350 MB to 420 MB of memory used on first page load. I still decided to go for the upgrade, but it is a bit worrying.
It is especially worrying since reloading the page several times pushes memory usage up as well. That happened with both 1.x and 2.7.x, so I can't confirm whether it is a memory leak in pydantic 2.7 or whether the memory is eventually garbage collected. Or I might have a memory leak in other parts of my code. Whatever the source, it gets worse as the baseline memory use increases.
I did some basic profiling over huge RAM allocation by pydantic 2 here: https://github.com/pylakey/aiotdlib/issues/135
@truenicoco
I did some basic profiling over huge RAM allocation by pydantic 2 here: pylakey/aiotdlib#135
I don't think they are looking at this issue anymore (no new replies to any of our feedback since March). Maybe it's worth creating a new ticket, because it is still very much a big issue for us too.
datamodel-codegen --url https://github.com/kubernetes/kubernetes/raw/master/api/openapi-spec/swagger.json --output-model-type=pydantic_v2.BaseModel > k8s_v2.py
This is a significant issue I've noticed. I have seen services with fairly large API schema documents (20-30k line OpenAPI files) running on AWS Lambda that use generated pydantic models, generated from datamodel-codegen. The upgrade from Pydantic v1 to v2 increased the lambda cold-start time from 2-3 seconds to 20-30 seconds, just like your CPU time shows when benchmarking the kubernetes API models. The memory leak improvements have been great, but this is still a major problem that really affects ephemeral compute models like Lambda, where new execution environments are spun up on the fly and things aren't cached, so imports have to be reloaded.
I profiled the kubernetes `import k8s_v2` statement.
These functions seem to be the major bottlenecks in terms of time:
https://github.com/pydantic/pydantic/blob/d654a0766c2f3c6fe0a12718f32aa3bf4d3ecc86/pydantic/_internal/_model_construction.py#L661
https://github.com/pydantic/pydantic/blob/d654a0766c2f3c6fe0a12718f32aa3bf4d3ecc86/pydantic/_internal/_model_construction.py#L681
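For anyone who wants to reproduce this kind of profile, one way to do it with only the standard library is sketched below. It is not necessarily the tool used for the profile above: `profile_import` is a hypothetical helper, and `colorsys` is a small stdlib module used as a stand-in for `k8s_v2`.

```python
import cProfile
import importlib
import io
import pstats
import sys


def profile_import(module_name: str, top: int = 10) -> str:
    """Profile a fresh import and return the `top` entries by cumulative time."""
    sys.modules.pop(module_name, None)  # make sure the import actually runs
    profiler = cProfile.Profile()
    profiler.enable()
    importlib.import_module(module_name)
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return out.getvalue()


# "colorsys" is a stand-in; swap in "k8s_v2" to profile the generated models.
report = profile_import("colorsys")
print(report)
```

Sorting by cumulative time puts the functions whose whole call tree is slowest at the top, which is how hot spots like the two linked above tend to surface.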
Working on this now, hoping to roll out a fix in v2.9 at the end of the month!
Hey folks - I've addressed this via https://github.com/pydantic/pydantic/pull/10113 as well. Let me know what sort of improvements you get with local testing!
Amazing work @sydney-runkle, thank you so much.