openai-python
Memory leak
Confirm this is an issue with the Python library and not an underlying OpenAI API
- [X] This is an issue with the Python library
Describe the bug
I am using the AsyncAzureOpenAI class to instantiate a client and calling client.chat.completions.create with streaming enabled. Even after calling close() on both the client and the response inside a try/finally block, I still see a memory leak that eventually crashes the server. I tried the fix outlined in https://github.com/openai/openai-python/issues/1181 (upgrading pydantic to 2.6.3), but it did not resolve my issue. Using the gc module I can see that memory usage grows after every call to this service. Our service centrally manages AzureOpenAI accounts, so a new client is instantiated for every incoming request. Given the concurrent nature of the service, I am wondering whether client.with_options can be used safely from concurrent requests instead. Do you have any good solutions for this memory leak?
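For context, the per-request-client pattern described above can be contrasted with a shared-client pattern, where one client is created at startup and per-request settings go through with_options (which, in openai-python v1.x, returns a lightweight copy that shares the underlying connection pool). The sketch below uses a stdlib-only stand-in; FakeAsyncClient is a hypothetical placeholder for AsyncAzureOpenAI, not the real API.

```python
import asyncio

# Hypothetical stand-in for openai.AsyncAzureOpenAI; the point is the
# client lifecycle, not the API surface.
class FakeAsyncClient:
    instances = 0  # counts how many clients were ever constructed

    def __init__(self):
        FakeAsyncClient.instances += 1
        self.closed = False

    def with_options(self, **overrides):
        # In openai-python, with_options returns a copy that reuses the
        # same underlying httpx client, so no new connection pool is made.
        return self

    async def close(self):
        self.closed = True

# Create ONE client at startup and reuse it for every request, instead of
# instantiating (and having to close) a fresh client per incoming request.
shared_client = FakeAsyncClient()

async def handle_request(api_key: str):
    # Per-request settings are applied via with_options, not a new client.
    client = shared_client.with_options(api_key=api_key)
    return client

async def main():
    # 100 concurrent "requests" still touch only the single shared client.
    await asyncio.gather(*(handle_request(f"key-{i}") for i in range(100)))
    return FakeAsyncClient.instances

instances = asyncio.run(main())
print(instances)  # → 1
```

With this shape, nothing needs closing on the request path at all; the single client (and its connection pool) is closed once at shutdown.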
To Reproduce
Make several calls in a row, for example to the embeddings endpoint, wrapped in async code.
Code snippets
import json
from concurrent.futures import ThreadPoolExecutor

import httpx
import openai
import tornado.web


class LlmStreamApiHandler(tornado.web.RequestHandler):
    executor = ThreadPoolExecutor(200)

    def __init__(self, *args, **kwargs):
        super(LlmStreamApiHandler, self).__init__(*args, **kwargs)
        self.set_header('Content-Type', 'text/event-stream')
        self.set_header('Access-Control-Allow-Origin', "*")
        self.set_header("Access-Control-Allow-Headers", "*")
        self.set_header("Access-Control-Allow-Methods", "*")

    def on_finish(self):
        return super().on_finish()

    async def post(self):
        try:
            result = await self.process(...)
        except Exception as e:
            ...
        self.write(json.dumps(result) + "\n")
        await self.flush()

    async def process(self, ...):
        # A new client (with its own httpx connection pool) is created
        # for every incoming request
        client = openai.AsyncAzureOpenAI(
            api_version=api_version,
            api_key=api_key,
            azure_endpoint=azure_endpoint,
            http_client=httpx.AsyncClient(
                proxies=config.api_proxy,
            ),
            max_retries=0,
        )
        response_text = False
        try:
            response_text = await client.chat.completions.create(**prompt)
            async for chunk in response_text:
                chunk = chunk.model_dump()
                # Skip empty keep-alive chunks
                if chunk['choices'] == [] and chunk['id'] == "" and chunk['model'] == "" and chunk['object'] == "":
                    continue
                chunk_message = chunk['choices'][0]['delta']
                current_text = chunk_message.get('content', '')
                if bool(chunk_message) and current_text:
                    ...
                elif chunk['choices'][0]["finish_reason"] == "stop":
                    break
                elif current_text == '' and chunk_message.get('role', '') == "assistant":
                    ...
                elif chunk['choices'][0]["finish_reason"] == "content_filter":
                    ...
                else:
                    continue
                self.write(json.dumps(json_data) + "\n")
                await self.flush()
        except Exception as e:
            ...
            raise ...
        finally:
            # Close both the stream and the client; memory still grows
            if response_text:
                await response_text.close()
            await client.close()
        return ...
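As a sketch of an alternative to the manual try/finally bookkeeping: both httpx.AsyncClient and the openai async clients support "async with" in recent versions (a hedged assumption about your installed versions), which guarantees close() runs even when streaming raises mid-loop. The stdlib-only stand-in below illustrates the pattern; Resource is a hypothetical placeholder for those clients.

```python
import asyncio

# Hypothetical stand-in for httpx.AsyncClient / openai.AsyncAzureOpenAI,
# both of which implement the async context manager protocol.
class Resource:
    def __init__(self, log):
        self.log = log

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc):
        # Runs on normal exit AND when the body raises, so nothing leaks.
        self.log.append("closed")

async def process():
    log = []
    # Entered left to right, exited right to left: the inner client is
    # closed before the transport it depends on.
    async with Resource(log) as http_client, Resource(log) as client:
        pass  # ... create the stream and iterate over chunks here ...
    return log

log = asyncio.run(process())
print(log)  # → ['closed', 'closed']
```

This does not by itself explain the growth you measured, but it removes the manual close() paths as a variable when bisecting the leak.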
OS
CentOS
Python version
Python 3.8
Library version
openai v1.12.0