pydantic v2 memory allocation

Open AntonisAgap opened this issue 1 year ago • 13 comments

Initial Checks

  • [X] I confirm that I'm using Pydantic V2

Description

Hello all! First, I want to thank you for your contribution to Python by publishing this library! We recently migrated to Pydantic V2 and are experiencing memory issues. I'm not really familiar with memory profiling in Python, so excuse me if I ask some ignorant questions.

I tried to pinpoint exactly what causes the memory issues and I think it might be coming from Pydantic. I created a very simple example where I just create a BaseModel, and I get these results using memray.

pydantic 1.10.11:

🥇 Top 5 largest allocating locations (by size):
        - _call_with_frames_removed:<frozen importlib._bootstrap>:241 -> 2.008MB
        - namedtuple:/Users/antonis.agapiou/.pyenv/versions/3.11.1/lib/python3.11/collections/__init__.py:485 -> 1.001MB
        - __init__:/Users/antonis.agapiou/.pyenv/versions/3.11.1/lib/python3.11/typing.py:827 -> 451.841KB
        - get_data:<frozen importlib._bootstrap_external>:1131 -> 245.574KB
        - _compile_bytecode:<frozen importlib._bootstrap_external>:729 -> 103.437KB

pydantic 2.5.3:

🥇 Top 5 largest allocating locations (by size):
        - validate_core_schema:/Users/antonis.agapiou/Library/Caches/pypoetry/virtualenvs/argos-FVnhZ34m-py3.11/lib/python3.11/site-packages/pydantic/_internal/_core_utils.py:585 -> 17.156MB
        - __init__:/Users/antonis.agapiou/.pyenv/versions/3.11.1/lib/python3.11/typing.py:827 -> 8.077MB
        - _create_fn:/Users/antonis.agapiou/.pyenv/versions/3.11.1/lib/python3.11/dataclasses.py:433 -> 7.064MB
        - get_data:<frozen importlib._bootstrap_external>:1131 -> 3.072MB
        - _compile_bytecode:<frozen importlib._bootstrap_external>:729 -> 2.551MB

Is there a reason why the newer version allocates 17.156MB?

Example Code

from pydantic import BaseModel

class Model(BaseModel):
    name: str
    surname: str
    age: int

m = Model(
    name="Antonis",
    surname="Agapiou",
    age=27
)
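
For readers who want a quick first look without installing memray, here is a minimal sketch that uses the standard library's tracemalloc instead. This is a swapped-in technique, not what was used for the numbers in this report: tracemalloc only sees allocations made through Python's own allocator, so memory allocated natively (for example by Rust code in pydantic-core) will most likely not show up. The helper name peak_python_allocations and the build_records workload are hypothetical stand-ins.

```python
import tracemalloc


def peak_python_allocations(func, *args, **kwargs):
    """Call func once and return (result, peak_bytes) as seen by tracemalloc."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current_bytes, peak_bytes).
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def build_records(n):
    # Hypothetical stand-in workload; replace it with the code under test,
    # e.g. a function that builds the Model instances from the example.
    return [{"name": "Antonis", "surname": "Agapiou", "age": i} for i in range(n)]


if __name__ == "__main__":
    records, peak = peak_python_allocations(build_records, 1000)
    print(f"{len(records)} records built, peak {peak / 1024:.1f} KiB")
```

Comparing the peak for the same workload under two installed pydantic versions gives a rough relative number only; a native-aware profiler such as memray remains the better tool for the full picture.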

Python, Pydantic & OS Version

pydantic-core version: 2.14.6
          pydantic-core build: profile=release pgo=true
                 install path: /Users/antonis.agapiou/Library/Caches/pypoetry/virtualenvs/argos-FVnhZ34m-py3.11/lib/python3.11/site-packages/pydantic
               python version: 3.11.1 (main, Jan  9 2023, 18:35:12) [Clang 14.0.0 (clang-1400.0.29.202)]
                     platform: macOS-14.2.1-arm64-arm-64bit
             related packages: typing_extensions-4.9.0 pyright-1.1.348 mypy-0.991 fastapi-0.109.0 pydantic-settings-2.1.0

AntonisAgap avatar Jan 27 '24 14:01 AntonisAgap

Memory usage will probably be significantly reduced by https://github.com/pydantic/pydantic-core/pull/1085.

More detail to come.

samuelcolvin avatar Jan 29 '24 14:01 samuelcolvin

@samuelcolvin thanks for the info! Please, is there any ETA for release? We notice the same issue while upgrading our codebase 🙏

matejkloska avatar Jan 29 '24 15:01 matejkloska

@matejkloska the problem is that we need a beta release of PyO3 to build from so that pydantic-core can be packaged by linux distros that build it from source/cargo. @davidhewitt is working hard to get that over the line but since PyO3 is maintained by volunteers, it's hard to put a hard deadline on it. We had originally hoped it would be released in December, then Jan, but I think we're pretty confident of a release in February.

We think the 17MB of memory allocated in _core_utils.py is probably related to the first invocation of Rust code (where Rust has its own heap), rather than any significant usage in that location.

Is the 17MB of memory usage a problem for you, or are you just looking at the relative usage? Either way, maybe you could provide some more details of your application and we might be able to suggest another fix. That said, I think it's fair to say that Pydantic V2 does have a bigger memory footprint than V1; hopefully we can change that in future.

samuelcolvin avatar Jan 29 '24 16:01 samuelcolvin

Hey @samuelcolvin, thanks for answering so quickly. Memory issues are no longer that big of a deal, as we refactored some of the code (for example, reducing the number of calls to .model_json_schema()). It still seems that Pydantic V2 has a higher memory footprint than V1, as you said. I did some experimenting with a small subset of our code just to confirm that. This is the code I was experimenting with:

import asyncio
import typing

import pydantic


class GenericDataItem(pydantic.BaseModel):
    @classmethod
    def numeric_fields(cls) -> frozenset[str]:
        return cls._build_numeric_fields()

    @classmethod
    def _build_numeric_fields(cls) -> frozenset[str]:
        schema = cls.model_json_schema()["properties"]
        numeric_fields = set()
        for field, attributes in schema.items():
            if "type" in attributes and attributes["type"] in ("number", "integer"):
                numeric_fields.add(field)
            elif "anyOf" in attributes:
                for subfield in attributes["anyOf"]:
                    if "type" in subfield and subfield["type"] in ("number", "integer"):
                        numeric_fields.add(field)
        return frozenset(numeric_fields)


class TestModel1(GenericDataItem):
    numeric_field_1: int | None = None

    @pydantic.field_validator("numeric_field_1", mode="before")
    @classmethod
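    def numeric_field_1_to_str(cls, numeric_field_1: typing.Any) -> typing.Any:
        # Convert string input to int; pass any other value through unchanged
        # (falling off the end would return None and silently drop int input).
        if isinstance(numeric_field_1, str):
            return int(numeric_field_1)
        return numeric_field_1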


class TestModel2(TestModel1):
    str_field_1: str | None = None


async def _load_test_model_1s() -> typing.AsyncGenerator[TestModel1, None]:
    for i in range(1000):
        yield TestModel1(numeric_field_1=str(i))


async def _load_test_model_2(
    test_model_1s: typing.AsyncGenerator[TestModel1, None],
) -> typing.AsyncGenerator[TestModel2, None]:
    async for test_model_1 in test_model_1s:
        yield TestModel2(str_field_1="string", **test_model_1.model_dump())


async def main() -> None:
    test_model_1s = _load_test_model_1s()
    test_model_2s = _load_test_model_2(test_model_1s)
    async for test_model_2 in test_model_2s:
        _ = test_model_2.numeric_fields()


if __name__ == "__main__":
    asyncio.run(main())
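
The repro above rebuilds the JSON schema on every numeric_fields() call. A minimal sketch of the kind of refactor mentioned earlier (fewer .model_json_schema() calls) is to cache the result per class. To keep it runnable without pydantic, SchemaStub below is a hypothetical stand-in that only mimics model_json_schema(); the caching pattern itself is plain functools from the standard library. One assumption to keep in mind: the cached value will not change if the model's schema changes later.

```python
import functools


class SchemaStub:
    """Hypothetical stand-in for a pydantic model (only mimics model_json_schema)."""

    schema_calls = 0  # counts how often the "expensive" schema is rebuilt

    @classmethod
    def model_json_schema(cls):
        SchemaStub.schema_calls += 1
        return {
            "properties": {
                "numeric_field_1": {"anyOf": [{"type": "integer"}, {"type": "null"}]},
                "str_field_1": {"anyOf": [{"type": "string"}, {"type": "null"}]},
            }
        }

    @classmethod
    @functools.cache  # keyed by cls, so every subclass gets its own cache entry
    def numeric_fields(cls):
        numeric = set()
        for field, attributes in cls.model_json_schema()["properties"].items():
            # Look at the field itself and at every "anyOf" alternative.
            candidates = [attributes, *attributes.get("anyOf", [])]
            if any(c.get("type") in ("number", "integer") for c in candidates):
                numeric.add(field)
        return frozenset(numeric)


for _ in range(1000):
    fields = SchemaStub.numeric_fields()

print(sorted(fields), "schema built", SchemaStub.schema_calls, "time(s)")
```

With the cache in place the schema is generated once per class instead of once per instance, which is why the refactor reduces allocations regardless of the pydantic version.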

The two versions I compared are 2.5.3 and 1.10.11 with the following changes to the code to be compatible with each version:

  • .model_json_schema() -> schema()
  • .model_dump() -> dict()
  • pydantic.field_validator(mode='before') -> pydantic.validator(pre=True)
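
One way to keep a single benchmark script for both versions is a small shim that picks whichever method name exists. This is only a sketch: dump_model, json_schema_of and the two stub classes are hypothetical names used for illustration, and the stubs stand in for real models so the snippet runs without pydantic installed.

```python
def dump_model(model):
    """Return the model's data as a dict under either naming scheme."""
    if hasattr(model, "model_dump"):  # V2-style name
        return model.model_dump()
    return model.dict()  # V1-style name


def json_schema_of(model_cls):
    """Return the JSON schema under either naming scheme."""
    if hasattr(model_cls, "model_json_schema"):  # V2-style name
        return model_cls.model_json_schema()
    return model_cls.schema()  # V1-style name


class V1LikeStub:
    # Hypothetical stand-in exposing only the V1 method names.
    def dict(self):
        return {"numeric_field_1": 1}

    @classmethod
    def schema(cls):
        return {"properties": {"numeric_field_1": {"type": "integer"}}}


class V2LikeStub:
    # Hypothetical stand-in exposing only the V2 method names.
    def model_dump(self):
        return {"numeric_field_1": 1}

    @classmethod
    def model_json_schema(cls):
        return {"properties": {"numeric_field_1": {"type": "integer"}}}


print(dump_model(V1LikeStub()), dump_model(V2LikeStub()))
```

The validator decorator cannot be shimmed this way because it is applied at class-definition time, so that part still needs a per-version edit.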

These are my results using memray:

memray stats

V1

📏 Total allocations:
        1673

📦 Total memory allocated:
        3.361MB

📊 Histogram of allocation size:
        min: 1.000B
        -------------------------------------------
        < 4.000B   :  101 ▇▇▇
        < 15.000B  :    0 
        < 63.000B  :    7 ▇
        < 255.000B :   91 ▇▇▇
        < 1.000KB  : 1088 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 3.999KB  :  272 ▇▇▇▇▇▇▇
        < 15.999KB :  108 ▇▇▇
        < 63.999KB :    3 ▇
        < 255.999KB:    2 ▇
        <=1.000MB  :    1 ▇
        -------------------------------------------
        max: 1.000MB

📂 Allocator type distribution:
         MALLOC: 955
         CALLOC: 551
         REALLOC: 166
         MMAP: 1

🥇 Top 5 largest allocating locations (by size):
        - _compile_bytecode:<frozen importlib._bootstrap_external>:729 -> 1.101MB
        - _call_with_frames_removed:<frozen importlib._bootstrap>:241 -> 1.007MB
        - __init__:/Users/antonis.agapiou/.pyenv/versions/3.11.1/lib/python3.11/typing.py:827 -> 451.841KB
        - get_data:<frozen importlib._bootstrap_external>:1131 -> 245.574KB
        - _get_code_from_file:<frozen runpy>:259 -> 214.746KB

🥇 Top 5 largest allocating locations (by number of allocations):
        - _call_with_frames_removed:<frozen importlib._bootstrap>:241 -> 835
        - __init__:/Users/antonis.agapiou/.pyenv/versions/3.11.1/lib/python3.11/typing.py:827 -> 185
        - _get_code_from_file:<frozen runpy>:259 -> 119
        - _compile_bytecode:<frozen importlib._bootstrap_external>:729 -> 103
        - _parse_sub:/Users/antonis.agapiou/.pyenv/versions/3.11.1/lib/python3.11/re/_parser.py:455 -> 63

V2

📏 Total allocations:
        50094

📦 Total memory allocated:
        798.913MB

📊 Histogram of allocation size:
        min: 1.000B
        --------------------------------------------
        < 4.000B   :  6168 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 15.000B  :  9015 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 63.000B  :  3728 ▇▇▇▇▇▇▇▇▇
        < 255.000B :  9175 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 1.000KB  : 10504 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 3.999KB  :  6127 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 15.999KB :  2261 ▇▇▇▇▇▇
        < 63.999KB :    59 ▇
        < 255.999KB:    38 ▇
        <=1.000MB  :  3019 ▇▇▇▇▇▇▇▇
        --------------------------------------------
        max: 1.000MB

📂 Allocator type distribution:
         MALLOC: 43856
         REALLOC: 3172
         CALLOC: 3061
         MMAP: 5

🥇 Top 5 largest allocating locations (by size):
        - __init__:/Users/antonis.agapiou/Library/Caches/pypoetry/virtualenvs/argos-FVnhZ34m-py3.11/lib/python3.11/site-packages/pydantic/main.py:164 -> 500.435MB
        - model_dump:/Users/antonis.agapiou/Library/Caches/pypoetry/virtualenvs/argos-FVnhZ34m-py3.11/lib/python3.11/site-packages/pydantic/main.py:308 -> 250.000MB
        - validate_core_schema:/Users/antonis.agapiou/Library/Caches/pypoetry/virtualenvs/argos-FVnhZ34m-py3.11/lib/python3.11/site-packages/pydantic/_internal/_core_utils.py:585 -> 18.683MB
        - __init__:/Users/antonis.agapiou/.pyenv/versions/3.11.1/lib/python3.11/typing.py:827 -> 7.077MB
        - _create_fn:/Users/antonis.agapiou/.pyenv/versions/3.11.1/lib/python3.11/dataclasses.py:433 -> 6.920MB

memray flamegraph

(they're uploaded as .txt files, you can convert them to .html to see them)

V1

memray-flamegraph-profile_test.py.86248.txt

V2

memray-flamegraph-profile_test.py.86621.txt

As you can see from the flamegraphs, the memory footprint is a lot higher in V2:

V1

image

V2

image

  • V1 peak memory usage: 1.5 MiB
  • V2 peak memory usage: 14.4 MiB
  • V1 total number of allocations: 2853
  • V2 total number of allocations: 121961

Again thanks for your quick response and for all the hard work creating this library. Hopefully with the release of PyO3 most of these issues will be solved :D !

AntonisAgap avatar Jan 31 '24 15:01 AntonisAgap

Thanks @AntonisAgap for the nice repro.

It already looks like 2.6.0 might be slightly improved; this is what I see running with the pydantic-core main branch:

📏 Total allocations:
        50663

📦 Total memory allocated:
        447.702MB

📊 Histogram of allocation size:
        min: 1.000B
        -------------------------------------------
        < 4.000B   : 7154 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 15.000B  : 7977 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 63.000B  : 3906 ▇▇▇▇▇▇▇▇▇▇▇
        < 255.000B : 7754 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 1.000KB  : 9561 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 3.999KB  : 8146 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 15.999KB : 2176 ▇▇▇▇▇▇
        < 63.999KB :  925 ▇▇▇
        < 255.999KB: 3049 ▇▇▇▇▇▇▇▇
        <=1.000MB  :   15 ▇
        -------------------------------------------
        max: 1.000MB

📂 Allocator type distribution:
         MALLOC: 41741
         CALLOC: 5753
         REALLOC: 3163
         MMAP: 6

🥇 Top 5 largest allocating locations (by size):
        - __init__:/home/david/dev/pydantic/pydantic/pydantic/main.py:171 -> 250.435MB
        - model_dump:/home/david/dev/pydantic/pydantic/pydantic/main.py:314 -> 125.000MB
        - __init__:/home/david/.pyenv/versions/3.12.0/lib/python3.12/typing.py:873 -> 19.140MB
        - validate_core_schema:/home/david/dev/pydantic/pydantic/pydantic/_internal/_core_utils.py:575 -> 18.231MB
        - _create_fn:/home/david/.pyenv/versions/3.12.0/lib/python3.12/dataclasses.py:473 -> 14.865MB

🥇 Top 5 largest allocating locations (by number of allocations):
        - validate_core_schema:/home/david/dev/pydantic/pydantic/pydantic/_internal/_core_utils.py:575 -> 18739
        - sub:/home/david/.pyenv/versions/3.12.0/lib/python3.12/re/__init__.py:186 -> 6333
        - _create_fn:/home/david/.pyenv/versions/3.12.0/lib/python3.12/dataclasses.py:473 -> 6244
        - __init__:/home/david/dev/pydantic/pydantic/pydantic/main.py:171 -> 6000
        - __init__:/home/david/.pyenv/versions/3.12.0/lib/python3.12/typing.py:873 -> 4730

And with the new PyO3, i.e. running the branch in https://github.com/pydantic/pydantic-core/pull/1085:

📏 Total allocations:
        57628

📦 Total memory allocated:
        107.216MB

📊 Histogram of allocation size:
        min: 1.000B
        --------------------------------------------
        < 4.000B   :  7816 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 15.000B  :  7997 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 63.000B  :  3932 ▇▇▇▇▇▇▇
        < 255.000B :  7765 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 1.000KB  : 14656 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 3.999KB  : 10202 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 15.999KB :  4002 ▇▇▇▇▇▇▇
        < 63.999KB :  1144 ▇▇
        < 255.999KB:    98 ▇
        <=1.000MB  :    16 ▇
        --------------------------------------------
        max: 1.000MB

📂 Allocator type distribution:
         MALLOC: 41910
         CALLOC: 11090
         REALLOC: 4621
         MMAP: 7

🥇 Top 5 largest allocating locations (by size):
        - _call_with_frames_removed:<frozen importlib._bootstrap>:488 -> 35.697MB
        - __init__:/home/david/.pyenv/versions/3.12.0/lib/python3.12/typing.py:873 -> 19.140MB
        - validate_core_schema:/home/david/dev/pydantic/pydantic/pydantic/_internal/_core_utils.py:575 -> 18.608MB
        - _create_fn:/home/david/.pyenv/versions/3.12.0/lib/python3.12/dataclasses.py:473 -> 13.865MB
        - get_data:<frozen importlib._bootstrap_external>:1186 -> 5.113MB

🥇 Top 5 largest allocating locations (by number of allocations):
        - validate_core_schema:/home/david/dev/pydantic/pydantic/pydantic/_internal/_core_utils.py:575 -> 18749
        - _call_with_frames_removed:<frozen importlib._bootstrap>:488 -> 10456
        - sub:/home/david/.pyenv/versions/3.12.0/lib/python3.12/re/__init__.py:186 -> 6333
        - _create_fn:/home/david/.pyenv/versions/3.12.0/lib/python3.12/dataclasses.py:473 -> 6243
        - __init__:/home/david/.pyenv/versions/3.12.0/lib/python3.12/typing.py:873 -> 4730

The number of allocations hasn't gone down significantly, which might be interesting to investigate sometime, but the big win to come is in MBs allocated; 448 -> 107, so that's cutting the memory usage by something like a factor of 4.
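
As a small worked check of the "factor of 4" estimate, the sketch below redoes the arithmetic with the two "Total memory allocated" figures quoted above. The variable and function names are hypothetical; only the two numbers come from the memray summaries.

```python
# "Total memory allocated" values (MB) copied from the two memray summaries above.
main_branch_mb = 447.702  # pydantic-core main branch
pyo3_branch_mb = 107.216  # branch from pydantic-core PR 1085


def reduction_factor(before_mb, after_mb):
    """How many times less memory is allocated after the change."""
    return before_mb / after_mb


factor = reduction_factor(main_branch_mb, pyo3_branch_mb)
saved_percent = 100 * (1 - pyo3_branch_mb / main_branch_mb)

print(f"reduction factor: {factor:.2f}x, saved: {saved_percent:.0f}%")
```

The ratio comes out a little above 4, which matches the rough estimate; note that this compares total bytes allocated over the run, not peak resident memory.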

The even better news is that once we've cut that big chunk away then I'm sure it'll be easier for us to start working through to identify and eliminate other sources of allocations.

So in summary, it looks like Pydantic 2.7 might be significantly better here and who knows where we can get to for 2.8 and beyond 🚀

davidhewitt avatar Feb 01 '24 19:02 davidhewitt

This is amazing 🤩, I'm really looking forward to 2.7.

samuelcolvin avatar Feb 01 '24 19:02 samuelcolvin

That's awesome! Looking forward to 2.7 as well! Thanks for the info and the hard work 😄!

AntonisAgap avatar Feb 02 '24 06:02 AntonisAgap

We're definitely looking forward to this as well!

michaelgmiller1 avatar Feb 14 '24 19:02 michaelgmiller1

Now that the PyO3 work is done, retesting on pydantic main I get the following:

📏 Total allocations:
        52488

📦 Total memory allocated:
        61.289MB

📊 Histogram of allocation size:
        min: 1.000B
        --------------------------------------------
        < 4.000B   :  6174 ▇▇▇▇▇▇▇▇▇▇▇▇
        < 15.000B  :  8024 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 63.000B  :  3559 ▇▇▇▇▇▇▇
        < 255.000B : 10153 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 1.000KB  : 13524 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 3.999KB  :  8595 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 15.999KB :  2300 ▇▇▇▇▇
        < 63.999KB :   108 ▇
        < 255.999KB:    37 ▇
        <=1.000MB  :    14 ▇
        --------------------------------------------
        max: 1.000MB

📂 Allocator type distribution:
         MALLOC: 45734
         CALLOC: 3447
         REALLOC: 3300
         MMAP: 7

🥇 Top 5 largest allocating locations (by size):
        - validate_core_schema:/Users/david/Dev/pydantic/pydantic/pydantic/_internal/_core_utils.py:568 -> 18.260MB
        - __init__:/opt/homebrew/Cellar/python@3.11/3.11.8/Frameworks/Python.framework/Versions/3.11/lib/python3.11/typing.py:864 -> 8.184MB
        - _create_fn:/opt/homebrew/Cellar/python@3.11/3.11.8/Frameworks/Python.framework/Versions/3.11/lib/python3.11/dataclasses.py:433 -> 6.737MB
        - _get_schema:/Users/david/Dev/pydantic/pydantic/pydantic/type_adapter.py:78 -> 6.226MB
        - get_data:<frozen importlib._bootstrap_external>:1131 -> 4.836MB

🥇 Top 5 largest allocating locations (by number of allocations):
        - validate_core_schema:/Users/david/Dev/pydantic/pydantic/pydantic/_internal/_core_utils.py:568 -> 22812
        - sub:/opt/homebrew/Cellar/python@3.11/3.11.8/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/__init__.py:185 -> 6326
        - _create_fn:/opt/homebrew/Cellar/python@3.11/3.11.8/Frameworks/Python.framework/Versions/3.11/lib/python3.11/dataclasses.py:433 -> 4124
        - __init__:/Users/david/Dev/pydantic/pydantic/pydantic/main.py:175 -> 4000
        - __init__:/opt/homebrew/Cellar/python@3.11/3.11.8/Frameworks/Python.framework/Versions/3.11/lib/python3.11/typing.py:864 -> 3016

I'm testing on macOS instead of Linux this time, so the comparison may not be directly applicable. But it does look like we've made significant progress in reducing allocations, and it'll be interesting to see if we can get further on this 🚀

davidhewitt avatar Mar 26 '24 22:03 davidhewitt

That is great news, awesome work!

AntonisAgap avatar Mar 27 '24 08:03 AntonisAgap

@davidhewitt I installed 2.7.0b1 to run some memray tests that just import a ~5000-line generated file of complex models.

Compared to 2.6 it's better, but compared to v1 it's really slow to import and it uses much more memory at runtime. We actually still see the same runtime memory usage as with 2.6.

Import times:

  • v1: 0.34s
  • 2.6.3: 1.13s
  • 2.7.0b1: 1.06s

Memory RES usage reported by the OS (after subtracting the Python baseline before import):

  • v1: 15.1MB
  • 2.6.3: 98.6MB
  • 2.7.0b1: 98.6MB
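
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of one way to take such measurements: import the module in a fresh interpreter and report the elapsed time plus the peak resident set size before and after. It is not the exact method used for the numbers above. The function name measure_import is hypothetical, the resource module is only available on Unix-like systems, and ru_maxrss is reported in different units depending on the OS (kilobytes on Linux, bytes on macOS).

```python
import json
import subprocess
import sys

# Code executed in a fresh interpreter so that nothing is already imported.
_CHILD_CODE = """
import importlib, json, sys, time
try:
    import resource
except ImportError:  # e.g. on Windows
    resource = None

def max_rss():
    if resource is None:
        return None
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = max_rss()
start = time.perf_counter()
importlib.import_module(sys.argv[1])
elapsed = time.perf_counter() - start
print(json.dumps({"seconds": elapsed, "maxrss_before": before, "maxrss_after": max_rss()}))
"""


def measure_import(module_name):
    """Import module_name in a child process and return timing and RSS data."""
    completed = subprocess.run(
        [sys.executable, "-c", _CHILD_CODE, module_name],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)


if __name__ == "__main__":
    # "decimal" is just a stdlib placeholder; pass the generated models module instead.
    print(measure_import("decimal"))
```

Running it once per installed pydantic version (in separate virtual environments) keeps the comparison fair, since each measurement starts from a clean process.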

I think the problem is related to the extensive use of submodels, because just commenting out the top model (6 lines) reduces the memory use by 10MB.

Does this mean each model is allocated separately, even if it is part of another model? For example, if we have a specific model that is used in 10 other models, and those are each used in 10 others, would it be allocated 100+1 times?

== 2.6.3 v2 ==

📏 Total allocations:
	684570

📦 Total memory allocated:
	987.497MB

📊 Histogram of allocation size:
	min: 1.000B
	---------------------------------------------
	< 4.000B   :  19581 ▇▇▇
	< 15.000B  :  87275 ▇▇▇▇▇▇▇▇▇▇▇▇
	< 63.000B  : 189091 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
	< 255.000B : 174132 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
	< 1.000KB  : 133436 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
	< 3.999KB  :  61575 ▇▇▇▇▇▇▇▇▇
	< 15.999KB :  15152 ▇▇▇
	< 63.999KB :   2722 ▇
	< 255.999KB:    949 ▇
	<=1.000MB  :    657 ▇
	---------------------------------------------
	max: 1.000MB

📂 Allocator type distribution:
	 MALLOC: 567877
	 REALLOC: 83122
	 CALLOC: 25569
	 MMAP: 7322
	 POSIX_MEMALIGN: 680

🥇 Top 5 largest allocating locations (by size):
	- validate_core_schema:/lib/python3.12/site-packages/pydantic/_internal/_core_utils.py:575 -> 270.933MB
	- create_schema_validator:/lib/python3.12/site-packages/pydantic/plugin/_schema_validator.py:49 -> 230.421MB
	- complete_model_class:/lib/python3.12/site-packages/pydantic/_internal/_model_construction.py:544 -> 176.144MB
	- __init__:/.pyenv/versions/3.12.1/lib/python3.12/typing.py:880 -> 79.741MB
	- _walk:/lib/python3.12/site-packages/pydantic/_internal/_core_utils.py:207 -> 45.688MB

🥇 Top 5 largest allocating locations (by number of allocations):
	- create_schema_validator:/lib/python3.12/site-packages/pydantic/plugin/_schema_validator.py:49 -> 417872
	- validate_core_schema:/lib/python3.12/site-packages/pydantic/_internal/_core_utils.py:575 -> 137497
	- complete_model_class:/lib/python3.12/site-packages/pydantic/_internal/_model_construction.py:544 -> 61899
	- __init__:/.pyenv/versions/3.12.1/lib/python3.12/typing.py:880 -> 19239
	- _get_code_from_file:<frozen runpy>:259 -> 7938

== 2.7.0b1 v2 ==

📏 Total allocations:
	682251

📦 Total memory allocated:
	464.631MB

📊 Histogram of allocation size:
	min: 1.000B
	---------------------------------------------
	< 4.000B   :  19957 ▇▇▇
	< 15.000B  :  87660 ▇▇▇▇▇▇▇▇▇▇▇▇
	< 63.000B  : 188837 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
	< 255.000B : 173695 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
	< 1.000KB  : 134558 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
	< 3.999KB  :  61348 ▇▇▇▇▇▇▇▇▇
	< 15.999KB :  13335 ▇▇
	< 63.999KB :   2733 ▇
	< 255.999KB:     61 ▇
	<=1.000MB  :     67 ▇
	---------------------------------------------
	max: 1.000MB

📂 Allocator type distribution:
	 MALLOC: 568150
	 REALLOC: 82313
	 CALLOC: 25625
	 MMAP: 5483
	 POSIX_MEMALIGN: 680

🥇 Top 5 largest allocating locations (by size):
	- validate_core_schema:/lib/python3.12/site-packages/pydantic/_internal/_core_utils.py:568 -> 110.987MB
	- __init__:/.pyenv/versions/3.12.1/lib/python3.12/typing.py:880 -> 78.195MB
	- create_schema_validator:/lib/python3.12/site-packages/pydantic/plugin/_schema_validator.py:49 -> 68.596MB
	- _get_code_from_file:<frozen runpy>:259 -> 29.143MB
	- _handle_other_schemas:/lib/python3.12/site-packages/pydantic/_internal/_core_utils.py:209 -> 26.000MB

🥇 Top 5 largest allocating locations (by number of allocations):
	- create_schema_validator:/lib/python3.12/site-packages/pydantic/plugin/_schema_validator.py:49 -> 418292
	- validate_core_schema:/lib/python3.12/site-packages/pydantic/_internal/_core_utils.py:568 -> 133966
	- complete_model_class:/lib/python3.12/site-packages/pydantic/_internal/_model_construction.py:566 -> 61177
	- __init__:/.pyenv/versions/3.12.1/lib/python3.12/typing.py:880 -> 19349
	- _get_code_from_file:<frozen runpy>:259 -> 7938

== v1 ==

📏 Total allocations:
	32328

📦 Total memory allocated:
	100.179MB

📊 Histogram of allocation size:
	min: 1.000B
	--------------------------------------------
	< 3.000B   :  4616 ▇▇▇▇▇▇▇▇▇▇
	< 14.000B  :     0 
	< 55.000B  :   100 ▇
	< 209.000B :     9 ▇
	< 795.000B : 12645 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
	< 2.954KB  :  4671 ▇▇▇▇▇▇▇▇▇▇
	< 11.234KB :  8207 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
	< 42.726KB :  2027 ▇▇▇▇▇
	< 162.487KB:    32 ▇
	<=617.938KB:    21 ▇
	--------------------------------------------
	max: 617.938KB

📂 Allocator type distribution:
	 MALLOC: 20840
	 CALLOC: 9776
	 REALLOC: 1712

🥇 Top 5 largest allocating locations (by size):
	- __init__:/.pyenv/versions/3.12.1/lib/python3.12/typing.py:880 -> 57.338MB
	- _get_code_from_file:<frozen runpy>:259 -> 29.017MB
	- update_model_forward_refs:/lib/python3.12/site-packages/pydantic/v1/typing.py:546 -> 3.508MB
	- Field:/lib/python3.12/site-packages/pydantic/v1/fields.py:301 -> 1.215MB
	- Field:/lib/python3.12/site-packages/pydantic/v1/fields.py:324 -> 1.215MB

🥇 Top 5 largest allocating locations (by number of allocations):
	- __init__:/.pyenv/versions/3.12.1/lib/python3.12/typing.py:880 -> 14195
	- _get_code_from_file:<frozen runpy>:259 -> 7925
	- Field:/lib/python3.12/site-packages/pydantic/v1/fields.py:301 -> 1659
	- Field:/lib/python3.12/site-packages/pydantic/v1/fields.py:324 -> 1659
	- __new__:<frozen abc>:106 -> 991

atti92 avatar Apr 05 '24 08:04 atti92

Maybe a good example for optimization:

# generate V1 models:
datamodel-codegen --url https://github.com/kubernetes/kubernetes/raw/master/api/openapi-spec/swagger.json  > k8s_v1.py

# generate V2 models:
datamodel-codegen --url https://github.com/kubernetes/kubernetes/raw/master/api/openapi-spec/swagger.json  --output-model-type=pydantic_v2.BaseModel > k8s_v2.py

I did the following in 2 separate processes:

> import k8s_v1
> import k8s_v2

image

(also note the CPU time diff)

atti92 avatar Apr 05 '24 11:04 atti92

Are there any more big wins here? The memory usage and CPU usage are still preventing us from upgrading our app from Pydantic v1 -> v2.

michaelgmiller1 avatar May 14 '24 00:05 michaelgmiller1

I just want to chip in and confirm that memory usage increased when I upgraded from Pydantic 1.x to 2.7.x. I forgot to note down my exact test numbers, but my Google App Engine instance went from something like 350MB -> 420MB of memory used on first page load. I still decided to go for the upgrade, but it is a bit worrying.

It is especially worrying since reloading the page several times pushes memory usage up as well. This happened on both 1.x and 2.7.x, so I can't confirm whether it is a memory leak in pydantic 2.7, memory that simply hasn't been garbage-collected yet, or a memory leak in other parts of my code. Whatever the source, it gets worse as the baseline memory use increases.
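
One way to tell "not collected yet" apart from a genuine leak is to force a garbage collection and then look at what is still retained after many repeated calls. The sketch below does that with the standard library's tracemalloc; it only sees Python-level allocations, so native memory would need a tool like memray. retained_growth, leaky_handler and clean_handler are hypothetical names standing in for a real request handler.

```python
import gc
import tracemalloc


def retained_growth(func, repeats=100):
    """Bytes of Python-level memory still held after calling func repeatedly."""
    tracemalloc.start()
    try:
        func()  # warm-up call, so one-time caches are not counted as growth
        gc.collect()
        baseline, _peak = tracemalloc.get_traced_memory()
        for _ in range(repeats):
            func()
        gc.collect()  # rule out garbage that simply has not been collected yet
        current, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current - baseline


_leaked = []


def leaky_handler():
    # Simulated leak: every call keeps another 10 kB alive in a global list.
    _leaked.append(bytearray(10_000))


def clean_handler():
    # Allocates the same amount, but nothing survives the call.
    bytearray(10_000)


if __name__ == "__main__":
    print("leaky:", retained_growth(leaky_handler), "bytes")
    print("clean:", retained_growth(clean_handler), "bytes")
```

If the retained growth keeps scaling with the number of repeats, something is holding references; if it stays flat, the increase seen on reload is more likely allocator behaviour or caches warming up.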

Andrioden avatar Jun 03 '24 11:06 Andrioden

I did some basic profiling over huge RAM allocation by pydantic 2 here: https://github.com/pylakey/aiotdlib/issues/135

truenicoco avatar Jul 13 '24 10:07 truenicoco

@truenicoco

I did some basic profiling over huge RAM allocation by pydantic 2 here: pylakey/aiotdlib#135

I don't think they are looking at this issue anymore (no new replies to any of our feedback since March). Maybe it's worth creating a new ticket, because it is still very much a big issue for us too.

atti92 avatar Jul 13 '24 10:07 atti92

datamodel-codegen --url https://github.com/kubernetes/kubernetes/raw/master/api/openapi-spec/swagger.json  --output-model-type=pydantic_v2.BaseModel > k8s_v2.py

This is a significant issue I've noticed. I have seen services with fairly large API schema documents (20-30k line OpenAPI files) running on AWS Lambda that use pydantic models generated by datamodel-codegen. The upgrade from Pydantic v1 to v2 increased the Lambda cold-start time from 2-3 seconds to 20-30 seconds, just like your CPU time shows when benchmarking the Kubernetes API models. The memory leak improvements have been great, but this is still a major problem that really affects ephemeral compute models like Lambda, where new execution environments are spun up on the fly and things aren't cached, so imports have to be reloaded.

I profiled the import k8s_v2 statement for the Kubernetes models; the result is below:

image

These functions seem to be the major bottlenecks in terms of time:

https://github.com/pydantic/pydantic/blob/d654a0766c2f3c6fe0a12718f32aa3bf4d3ecc86/pydantic/_internal/_model_construction.py#L661

https://github.com/pydantic/pydantic/blob/d654a0766c2f3c6fe0a12718f32aa3bf4d3ecc86/pydantic/_internal/_model_construction.py#L681
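
For anyone who wants to repeat this kind of import profiling on their own generated models, here is a minimal sketch using cProfile and pstats from the standard library. It is not necessarily the tool used for the screenshot above; profile_import is a hypothetical helper name, and the module passed in must not have been imported already, otherwise the cached module is returned almost instantly.

```python
import cProfile
import importlib
import io
import pstats


def profile_import(module_name, top=15):
    """Profile importing module_name and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        importlib.import_module(module_name)
    finally:
        profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)  # show only the top entries
    return buffer.getvalue()


if __name__ == "__main__":
    # "colorsys" is only a stdlib placeholder; use e.g. "k8s_v2" for the generated models.
    print(profile_import("colorsys"))
```

Sorting by cumulative time puts the functions that dominate the whole import at the top, which is how hot spots in model construction would stand out.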

myke2424 avatar Jul 14 '24 19:07 myke2424

Working on this now, hoping to roll out a fix in v2.9 at the end of the month!

sydney-runkle avatar Aug 02 '24 01:08 sydney-runkle

Hey folks - I've addressed this via https://github.com/pydantic/pydantic/pull/10113 as well. Let me know what sort of improvements you get with local testing!

sydney-runkle avatar Aug 13 '24 17:08 sydney-runkle

Amazing work @sydney-runkle, thank you so much.

samuelcolvin avatar Aug 13 '24 18:08 samuelcolvin