openai-python
Memory leak
Confirm this is an issue with the Python library and not an underlying OpenAI API
- [X] This is an issue with the Python library
Describe the bug
I am using the AsyncAzureOpenAI class to instantiate a client and making a streaming call to client.chat.completions.create. Even after calling close() on both the client and the response within a try-finally block, I still encounter a memory leak that eventually crashes the server. I tried the solution outlined in https://github.com/openai/openai-python/issues/1181, which upgrades the pydantic package to 2.6.3, but it hasn't resolved my issue. Using the gc module, I observed that memory usage increases after each call to this service. Our service centrally manages AzureOpenAI accounts, so a client is instantiated for every incoming request. Given the concurrent nature of the service, I'm wondering whether client.with_options can support concurrent usage. Do you have any good solutions for this memory leak?
To Reproduce
Several async calls in a row, for example to embeddings.
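For illustration, a minimal sketch of that pattern, assuming a fresh client per call as described above (the endpoint, key, and model are placeholders, not the actual service configuration):

```python
import asyncio

import openai


async def one_call() -> None:
    # A fresh client is created for every call, mirroring the pattern above.
    client = openai.AsyncAzureOpenAI(
        api_version="2024-02-01",  # placeholder values
        api_key="...",
        azure_endpoint="https://example.openai.azure.com",
    )
    try:
        await client.embeddings.create(
            model="text-embedding-ada-002",
            input="hello",
        )
    finally:
        await client.close()


async def main() -> None:
    # Repeated calls in a row; memory usage reportedly grows after each one.
    for _ in range(1000):
        await one_call()


asyncio.run(main())
```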
Code snippets
```python
import json
from concurrent.futures import ThreadPoolExecutor

import httpx
import openai
import tornado.web

# (Names such as config, api_version, api_key, azure_endpoint, prompt, and
# json_data come from elided parts of the service.)


class LlmStreamApiHandler(tornado.web.RequestHandler):
    executor = ThreadPoolExecutor(200)

    def __init__(self, *args, **kwargs):
        super(LlmStreamApiHandler, self).__init__(*args, **kwargs)
        self.set_header('Content-Type', 'text/event-stream')
        self.set_header('Access-Control-Allow-Origin', "*")
        self.set_header("Access-Control-Allow-Headers", "*")
        self.set_header("Access-Control-Allow-Methods", "*")

    def on_finish(self):
        return super().on_finish()

    async def post(self):
        try:
            result = await self.process(...)
        except Exception as e:
            ...
        self.write(json.dumps(result) + "\n")
        await self.flush()

    async def process(self, ...):
        # A new client is instantiated for every incoming request.
        client = openai.AsyncAzureOpenAI(
            api_version=api_version,
            api_key=api_key,
            azure_endpoint=azure_endpoint,
            http_client=httpx.AsyncClient(
                proxies=config.api_proxy,
            ),
            max_retries=0,
        )
        response_text = False
        try:
            response_text = await client.chat.completions.create(**prompt)
            async for chunk in response_text:
                chunk = chunk.model_dump()
                # Skip empty keep-alive chunks that carry no data.
                if chunk['choices'] == [] and chunk['id'] == "" and chunk['model'] == "" and chunk['object'] == "":
                    continue
                chunk_message = chunk['choices'][0]['delta']
                current_text = chunk_message.get('content', '')
                if bool(chunk_message) and current_text:
                    ...
                elif chunk['choices'][0]["finish_reason"] == "stop":
                    break
                elif current_text == '' and chunk_message.get('role', '') == "assistant":
                    ...
                elif chunk['choices'][0]["finish_reason"] == "content_filter":
                    ...
                else:
                    continue
                self.write(json.dumps(json_data) + "\n")
                await self.flush()
        except Exception as e:
            ...
            raise ...
        finally:
            # Both the stream and the client are closed, yet memory still grows.
            if response_text:
                await response_text.close()
            await client.close()
        return ...
```
OS
CentOS
Python version
Python 3.8
Library version
openai v1.12.0
It's possible to reuse the same client for many requests; I think a single one for the lifetime of the process can work fine. At least that's what I've been doing in production so far (using Python 3.11).
So you could maybe move your client creation to the init? I don't know Tornado though; I'm using Starlette.
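As a sketch of that suggestion, assuming all requests share one set of credentials (endpoint and key are placeholders; note that Tornado constructs a new RequestHandler per request, so a process-wide client would live at module or Application scope rather than in the handler's __init__):

```python
import httpx
import openai
import tornado.web

# Created once at import time and reused by every request, so the
# underlying httpx connection pool is shared for the process lifetime.
shared_client = openai.AsyncAzureOpenAI(
    api_version="2024-02-01",  # placeholder values
    api_key="...",
    azure_endpoint="https://example.openai.azure.com",
    http_client=httpx.AsyncClient(),
    max_retries=0,
)


class LlmStreamApiHandler(tornado.web.RequestHandler):
    async def process(self, prompt: dict):
        # No per-request client construction and no per-request close().
        response = await shared_client.chat.completions.create(**prompt)
        async for chunk in response:
            ...  # stream handling as in the original snippet
```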
To be clear, creating and closing clients should not leak either, and bugs in that area have been fixed before. This is just a note.
@antont May I kindly ask: in your service, is every call made with the same set of initialization parameters for the client? In my service, different request sources specify their own api_version, api_key, and azure_endpoint parameters, so I initialize a new, corresponding client object for each request.
I've also noticed the client.with_options() method, which can change these parameters dynamically. What I'm uncertain about is: if a single client is used and called concurrently, could client.with_options() erroneously override the client's parameters?
@a383615194
May I kindly ask, in your service, is every call made using the same set of initialization parameters for the client?
Spot on, that is the case for us.
I've also noticed the client.with_options() method that can dynamically change these parameters. However, what I'm uncertain about is, if only a single client is used and concurrently called, would client.with_options() lead to an erroneous override of the client parameters?
Yep, that's why that method exists. AFAIK it works correctly, so you can reuse the same client while using different API keys, for example, for different requests. I haven't used it myself, though; I've only seen it mentioned in previous similar issues here. Some people are wary of it; at least one person here explained how he creates new client objects just to be sure. I would probably read its implementation to check whether it seems clear and trustworthy, and then use it. I guess there are tests for it too, though strange bug cases can be hard to cover if one were to happen with it.
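For reference, a sketch of that pattern, assuming with_options() accepts the same credential overrides as the constructor (it is documented as returning a copy of the client with the given options, not mutating the original; the tenant values are placeholders):

```python
import openai

# One long-lived client with default credentials (placeholders).
base_client = openai.AsyncAzureOpenAI(
    api_version="2024-02-01",
    api_key="default-key",
    azure_endpoint="https://example.openai.azure.com",
)


async def handle_request(tenant_api_key: str, prompt: dict):
    # with_options() returns a copy of the client with the overridden
    # option; base_client itself is not modified, so concurrent requests
    # using different keys should not interfere with each other.
    tenant_client = base_client.with_options(api_key=tenant_api_key)
    return await tenant_client.chat.completions.create(**prompt)
```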
Can you share a repository that demonstrates a minimal reproduction? (The code you shared is a helpful starting point, but something we can download and run and see the error would be very helpful).
I'd also +1 @antont's suggestion to reuse the client.
@a383615194 have you fixed this issue?
I reused the client and the issue is gone. The version I used is 1.23.6.