
Possible memory leak in `AsyncCompletions.parse()`

Open anteverse opened this issue 9 months ago • 2 comments

Confirm this is an issue with the Python library and not an underlying OpenAI API issue

  • [x] This is an issue with the Python library

Describe the bug

There might be a memory leak when using the `.parse()` method on AsyncCompletions with Pydantic models created via create_model. When submitting several calls, memory usage keeps rising. I haven't found any plateau yet, which could mean the parsers built on top of these models are never garbage collected.

To Reproduce

  1. Have a function that creates a Pydantic model with create_model
  2. Make several calls where the response_format param always receives a new model from the function above
  3. Monitor the memory (see the sketch below)
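
For step 3, the standard-library tracemalloc module is enough to watch the trend without memory_profiler. Here is a minimal sketch (watch_memory and do_call are illustrative names, not part of the report or the library):

import tracemalloc

async def watch_memory(n_iterations, do_call):
    # Print traced allocations after each call; a steadily growing "current"
    # value with no plateau is the symptom described above.
    tracemalloc.start()
    for i in range(n_iterations):
        await do_call()
        current, peak = tracemalloc.get_traced_memory()
        print(f"iteration {i}: current={current / 1024:.1f} KiB, peak={peak / 1024:.1f} KiB")
    tracemalloc.stop()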

We do have a workaround, though. In the snippets, the leaking scenario is called leaking and the safe one non_leaking.
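
The workaround pre-converts the model into a plain response_format dict with type_to_response_format_param (an internal helper from openai.lib._parsing) and hands that to the regular create() method, so parse() never sees the dynamically created class. A minimal sketch of the conversion step:

from openai.lib._parsing import type_to_response_format_param

# Convert the freshly created Pydantic model into a plain dict once, up front;
# this dict is what non_leaking_call below passes as response_format.
response_format_param = type_to_response_format_param(create_new_model())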

Please let me know if you need more info. Thanks a lot.

Code snippets

import asyncio
import gc
from typing import List

from memory_profiler import profile
from openai import AsyncOpenAI
from openai.lib._parsing import type_to_response_format_param
from pydantic import Field, create_model


StepModel = create_model(
    "Step",
    explanation=(str, Field()),
    output=(str, Field()),
)


def create_new_model():
    """This sounds useless as it is. In our business case, I'm generating a model that slightly different at each call, hence the use of create_model. This illustrates of a model that seems to always be the same keeps on adding up in the memory."""
    return create_model(
        "MathResponse",
        steps=(List[StepModel], Field()),
        final_answer=(str, Field()),
    )


@profile()
async def leaking_call(client, new_model):
    await client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[
            {"role": "system", "content": "You are a helpful math tutor."},
            {"role": "user", "content": "solve 8x + 31 = 2"},
        ],
        response_format=new_model,
    )


@profile()
async def non_leaking_call(client, new_model):
    await client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        messages=[
            {"role": "system", "content": "You are a helpful math tutor."},
            {"role": "user", "content": "solve 8x + 31 = 2"},
        ],
        response_format=type_to_response_format_param(new_model),
    )


async def main():
    client = AsyncOpenAI()

    for _ in range(200):
        # You can switch to `non_leaking_call` and see that the memory is correctly freed
        await leaking_call(client, create_new_model())

        # We wanted to thoroughly check the memory usage, hence memory profiler + gc
        gc.collect()
        print(len(gc.get_objects()))


if __name__ == "__main__":
    asyncio.run(main())
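
To check whether the dynamically created model classes themselves are what accumulates, one can count them among the objects tracked by the collector after each gc.collect(). This is a heuristic sketch (matching on __name__ is an assumption, not an exact identity check):

import gc

def count_live_models(name="MathResponse"):
    # Count live classes whose __name__ matches the dynamically created model.
    # A count that grows by one per iteration, even right after gc.collect(),
    # suggests the classes (and any parsers built on them) are never collected.
    return sum(
        1
        for obj in gc.get_objects()
        if isinstance(obj, type) and obj.__name__ == name
    )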

OS

macOS

Python version

Python 3.11.9

Library version

openai v1.64.0

anteverse commented on Feb 26, 2025

Thanks for the report, what version of Pydantic are you using?

RobertCraigie commented on Feb 26, 2025

Pydantic 2.10.6. Also tested with 2.9.2 earlier

anteverse commented on Feb 26, 2025