openai-python
Memory leak
Confirm this is an issue with the Python library and not an underlying OpenAI API
- [X] This is an issue with the Python library
Describe the bug
I am using the AsyncAzureOpenAI class to instantiate a client and making a streaming call to client.chat.completions.create. Even after calling close() on both the client and the response within a try-finally block, I still encounter a memory leak that eventually crashes the server. I tried the solution outlined in https://github.com/openai/openai-python/issues/1181, which upgrades the pydantic package to 2.6.3, but it hasn't resolved my issue. Using the gc module, I observed that memory usage increases after each call to this service. Our service centrally manages AzureOpenAI accounts, so a client is instantiated for every incoming request. Given the concurrent nature of the service, I'm wondering whether client.with_options can support concurrent usage. Do you have any good solutions for this memory leak?
To Reproduce
Several async calls in a row, for example to embeddings.
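For illustration, a minimal sketch of that pattern, assuming a fresh client per call as described above (the endpoint, key, and model are placeholders, not the actual service configuration):

```python
import asyncio

import openai


async def one_call() -> None:
    # A fresh client is created for every call, mirroring the pattern above.
    client = openai.AsyncAzureOpenAI(
        api_version="2024-02-01",  # placeholder values
        api_key="...",
        azure_endpoint="https://example.openai.azure.com",
    )
    try:
        await client.embeddings.create(
            model="text-embedding-ada-002",
            input="hello",
        )
    finally:
        await client.close()


async def main() -> None:
    # Repeated calls in a row; memory usage reportedly grows after each one.
    for _ in range(1000):
        await one_call()


asyncio.run(main())
```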
Code snippets
```python
import json
from concurrent.futures import ThreadPoolExecutor

import httpx
import openai
import tornado.web

# (Names such as config, api_version, api_key, azure_endpoint, prompt, and
# json_data come from elided parts of the service.)


class LlmStreamApiHandler(tornado.web.RequestHandler):
    executor = ThreadPoolExecutor(200)

    def __init__(self, *args, **kwargs):
        super(LlmStreamApiHandler, self).__init__(*args, **kwargs)
        self.set_header('Content-Type', 'text/event-stream')
        self.set_header('Access-Control-Allow-Origin', "*")
        self.set_header("Access-Control-Allow-Headers", "*")
        self.set_header("Access-Control-Allow-Methods", "*")

    def on_finish(self):
        return super().on_finish()

    async def post(self):
        try:
            result = await self.process(...)
        except Exception as e:
            ...
        self.write(json.dumps(result) + "\n")
        await self.flush()

    async def process(self, ...):
        # A new client is instantiated for every incoming request.
        client = openai.AsyncAzureOpenAI(
            api_version=api_version,
            api_key=api_key,
            azure_endpoint=azure_endpoint,
            http_client=httpx.AsyncClient(
                proxies=config.api_proxy,
            ),
            max_retries=0,
        )
        response_text = False
        try:
            response_text = await client.chat.completions.create(**prompt)
            async for chunk in response_text:
                chunk = chunk.model_dump()
                # Skip empty keep-alive chunks that carry no data.
                if chunk['choices'] == [] and chunk['id'] == "" and chunk['model'] == "" and chunk['object'] == "":
                    continue
                chunk_message = chunk['choices'][0]['delta']
                current_text = chunk_message.get('content', '')
                if bool(chunk_message) and current_text:
                    ...
                elif chunk['choices'][0]["finish_reason"] == "stop":
                    break
                elif current_text == '' and chunk_message.get('role', '') == "assistant":
                    ...
                elif chunk['choices'][0]["finish_reason"] == "content_filter":
                    ...
                else:
                    continue
                self.write(json.dumps(json_data) + "\n")
                await self.flush()
        except Exception as e:
            ...
            raise ...
        finally:
            # Both the stream and the client are closed, yet memory still grows.
            if response_text:
                await response_text.close()
            await client.close()
        return ...
```
OS
CentOS
Python version
Python 3.8
Library version
openai v1.12.0
It's possible to reuse the same client for many requests; I think a single one for the lifetime of the process can work fine. At least that's what I've been doing in production so far (using Python 3.11).
So you could maybe move your client creation to the init? I don't know Tornado though; I'm using Starlette.
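As a sketch of that suggestion, assuming all requests share one set of credentials (endpoint and key are placeholders; note that Tornado constructs a new RequestHandler per request, so a process-wide client would live at module or Application scope rather than in the handler's __init__):

```python
import httpx
import openai
import tornado.web

# Created once at import time and reused by every request, so the
# underlying httpx connection pool is shared for the process lifetime.
shared_client = openai.AsyncAzureOpenAI(
    api_version="2024-02-01",  # placeholder values
    api_key="...",
    azure_endpoint="https://example.openai.azure.com",
    http_client=httpx.AsyncClient(),
    max_retries=0,
)


class LlmStreamApiHandler(tornado.web.RequestHandler):
    async def process(self, prompt: dict):
        # No per-request client construction and no per-request close().
        response = await shared_client.chat.completions.create(**prompt)
        async for chunk in response:
            ...  # stream handling as in the original snippet
```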
To be clear, creating and closing clients should not leak either, and bugs in that area have been fixed before. This is just a note.
@antont May I kindly ask: in your service, is every call made with the same set of initialization parameters for the client? In my service, different request sources specify their own api_version, api_key, and azure_endpoint parameters, so I initialize a new, corresponding client object for each request.
I've also noticed the client.with_options() method, which can change these parameters dynamically. What I'm uncertain about is: if a single client is used and called concurrently, could client.with_options() erroneously override the client's parameters?
@a383615194
May I kindly ask, in your service, is every call made using the same set of initialization parameters for the client?
Spot on, that is the case for us.
I've also noticed the client.with_options() method that can dynamically change these parameters. However, what I'm uncertain about is, if only a single client is used and concurrently called, would client.with_options() lead to an erroneous override of the client parameters?
Yep, that's why that method exists. AFAIK it works correctly, so you can reuse the same client while using different API keys, for example, for different requests. I haven't used it myself, though; I've only seen it mentioned in previous similar issues here. Some people are wary of it; at least one person here explained how he creates new client objects just to be sure. I would probably read its implementation to check whether it seems clear and trustworthy, and then use it. I guess there are tests for it too, though strange bug cases can be hard to cover if one were to happen with it.
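For reference, a sketch of that pattern, assuming with_options() accepts the same credential overrides as the constructor (it is documented as returning a copy of the client with the given options, not mutating the original; the tenant values are placeholders):

```python
import openai

# One long-lived client with default credentials (placeholders).
base_client = openai.AsyncAzureOpenAI(
    api_version="2024-02-01",
    api_key="default-key",
    azure_endpoint="https://example.openai.azure.com",
)


async def handle_request(tenant_api_key: str, prompt: dict):
    # with_options() returns a copy of the client with the overridden
    # option; base_client itself is not modified, so concurrent requests
    # using different keys should not interfere with each other.
    tenant_client = base_client.with_options(api_key=tenant_api_key)
    return await tenant_client.chat.completions.create(**prompt)
```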
Can you share a repository that demonstrates a minimal reproduction? (The code you shared is a helpful starting point, but something we can download and run and see the error would be very helpful).
I'd also +1 @antont's suggestion to reuse the client.
@a383615194 have you fixed this issue?
I reused the client and the issue is gone. The version I used is 1.23.6.