httpx client has very poor performance for concurrent requests compared to aiohttp
Confirm this is an issue with the Python library and not an underlying OpenAI API
- [X] This is an issue with the Python library
Describe the bug
The API client uses httpx, which has very poor performance when making concurrent requests compared to aiohttp. There is an open issue for httpx here.
This is forcing us to swap out the OpenAI SDK for our own implementation, which is a pain.
I suspect it is the root cause of the difference between node.js and Python demonstrated here
I'm not massively familiar with the development of this SDK, or whether there is a key reason for picking httpx over aiohttp. From my reading, it was switched over for v1 in order to create consistency between the sync and async clients, but I'm not sure how vital that is. However, for our high-concurrency async use cases this renders the SDK useless.
To Reproduce
To reproduce, run chat completion requests in parallel with 20+ concurrent requests, benchmarking the openai API client against an implementation using aiohttp. Example code can be found in the linked issue in httpx.
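For illustration, a minimal harness along these lines (a sketch, not the reporter's actual code — the real benchmark lives in the linked httpx issue; the model name and concurrency are illustrative, OPENAI_API_KEY is assumed to be set, and the aiohttp side of the comparison would be an equivalent fan-out over a raw ClientSession):

```python
# Sketch of the httpx/openai side of the benchmark; the aiohttp comparison would
# run the same fan-out against a raw aiohttp.ClientSession.
import asyncio
import time

import openai

CONCURRENCY = 20


async def one_request(client: openai.AsyncOpenAI) -> None:
    await client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Say hello."}],
    )


async def main() -> None:
    client = openai.AsyncOpenAI()
    start = time.perf_counter()
    await asyncio.gather(*(one_request(client) for _ in range(CONCURRENCY)))
    print(f"{CONCURRENCY} requests in {time.perf_counter() - start:.2f}s")


asyncio.run(main())
```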
Code snippets
No response
OS
Linux/macOS
Python version
v3.12
Library version
1.12.0
Interesting, I was not aware there was such a performance discrepancy between aiohttp and httpx.
From skimming the linked issue it thankfully seems like there's a lot of inflight work that would bring httpx up to par performance-wise.
> I'm not massively familiar with the development of this SDK, or whether there is a key reason for picking httpx over aiohttp. From my reading, it was switched over for v1 in order to create consistency between the sync and async clients, but I'm not sure how vital that is.

Yes, consistency here is very important; using different clients would make everything much more complicated and confusing for little gain, especially as this performance discrepancy can be fixed.

> However, for our high-concurrency async use cases this renders the SDK useless.
Sorry about this, hopefully the httpx PRs can be merged soon.
In the meantime it might be less work for you to use a patched version of httpx with the performance fixes included in the linked issue.
I'm going to close this as we have no plans to move away from httpx. In the future we may offer a more extensible custom http client API which would allow you to use any http library as long as you implement the interface. However this isn't likely to happen anytime soon unfortunately.
I'll see if we can help land the httpx / httpcore PRs faster.
Understood, thanks for the quick reply!
Hi everyone,
I'm working on a chatbot and ran some load testing and profiling with a high number of concurrent users. It appears that the performance of httpx in asynchronous mode, which is used by the OpenAI Python SDK, isn't meeting expectations.
Are there any updates or planned improvements on this? Or would it be worth benchmarking my application with aiohttp to see if it performs better under high concurrency?
Thanks in advance for your insights!
^ Likewise, to +1 this: our aiohttp implementation is substantially faster, sometimes twice as fast. I go from about 100 requests per second to 200+ on my local machine, so I'm not sure what causes the discrepancy.
We've moved from httpx to aiohttp and it solved a lot of our concurrency issues. It means we don't use the openai client, which I was initially concerned about because of the theoretical connection-sharing benefit of the client, but that pales in comparison. I wish I'd swapped sooner.
this is the aiohttp class we are using to maximize throughput while this issue is open:
```python
import asyncio
import json
import logging
from typing import Self, cast

from aiohttp import (
    ClientSession,
    ClientTimeout,
    ServerDisconnectedError,
    ServerTimeoutError,
    TCPConnector,
)

# Assumption: the response type annotated below is the SDK's ChatCompletion,
# aliased to the name used in the original post.
from openai.types.chat import ChatCompletion as Completion


# Placeholder definitions for the custom exceptions used below (presumably
# defined elsewhere in the original codebase).
class RateLimitError(Exception):
    """Raised on HTTP 429 so callers can back off and retry."""


class TemporaryFailure(Exception):
    """Raised on transient failures (5xx, timeouts, disconnects)."""


class OpenAISession:
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.client = OpenAISession._make_client(api_key)

    @staticmethod
    def _make_client(api_key: str) -> ClientSession:
        # A large connection pool so many requests can be in flight at once.
        connector = TCPConnector(limit=500)
        timeout = ClientTimeout(total=600, connect=5)
        client = ClientSession(connector=connector, timeout=timeout)
        client.headers["Authorization"] = f"Bearer {api_key}"
        client.headers["Content-Type"] = "application/json"
        return client

    async def chat_completion(
        self, model: str, messages: list[dict], temperature: float
    ) -> Completion:
        # Recreate the session if it was closed (e.g. after leaving the context manager).
        if self.client.closed:
            self.client = OpenAISession._make_client(self.api_key)
        try:
            payload = {
                "model": model,
                "messages": messages,
                "temperature": temperature,
            }
            logging.info(f"OpenAI API Request Payload: {json.dumps(payload, indent=2)}")
            async with self.client.post(
                "https://api.openai.com/v1/chat/completions",
                json=payload,
                timeout=ClientTimeout(total=60),
            ) as resp:
                if resp.status == 429:
                    raise RateLimitError()
                elif resp.status in (500, 503):
                    raise TemporaryFailure(resp.reason)
                elif resp.status == 400:
                    error_body = await resp.text()
                    logging.error(f"OpenAI API 400 Error: {error_body}")
                    logging.error(
                        f"Request payload that caused error: {json.dumps(payload, indent=2)}"
                    )
                resp.raise_for_status()
                return cast(Completion, await resp.json())
        except (asyncio.TimeoutError, ServerTimeoutError, ServerDisconnectedError) as e:
            # https://github.com/aio-libs/aiohttp/issues/8133
            raise TemporaryFailure(str(e)) from e

    async def __aenter__(self) -> Self:
        await self.client.__aenter__()
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
        return await self.client.__aexit__(exc_type, exc_val, exc_tb)
```
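For context, a minimal usage sketch for the class above (the API key, model name, and fan-out are purely illustrative):

```python
# Hypothetical usage of the OpenAISession class above: fan out many chat
# completions concurrently over one shared aiohttp session.
import asyncio


async def main() -> None:
    async with OpenAISession(api_key="sk-...") as session:
        tasks = [
            session.chat_completion(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": f"Say hello #{i}"}],
                temperature=0.7,
            )
            for i in range(50)
        ]
        completions = await asyncio.gather(*tasks)
    print(f"received {len(completions)} completions")


asyncio.run(main())
```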
In my case, after profiling, I identified a performance overhead in the AsyncConnectionPool.handle_async_request method. The profiles suggest it is mainly due to inefficient handling of idle or expired connections. The time complexity could be reduced by using a more suitable data structure rather than repeatedly iterating over the connections.
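For illustration only (this is not httpcore's actual code), the difference between re-scanning a flat connection list on every acquire and keeping idle connections indexed by origin looks roughly like this:

```python
# Conceptual sketch: a flat list forces an O(n) walk per request, while
# per-origin queues make acquiring an idle connection O(1) amortized.
import time
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class Conn:
    origin: str
    expires_at: float = field(default_factory=lambda: time.monotonic() + 30.0)

    def expired(self) -> bool:
        return time.monotonic() > self.expires_at


class NaivePool:
    """Scans every pooled connection on each acquire."""

    def __init__(self) -> None:
        self.idle: list[Conn] = []

    def acquire(self, origin: str) -> Conn | None:
        for conn in list(self.idle):      # O(total idle connections) every time
            if conn.expired():
                self.idle.remove(conn)    # removal is another O(n) step
            elif conn.origin == origin:
                self.idle.remove(conn)
                return conn
        return None


class IndexedPool:
    """Keeps idle connections grouped by origin."""

    def __init__(self) -> None:
        self.idle: dict[str, deque[Conn]] = defaultdict(deque)

    def acquire(self, origin: str) -> Conn | None:
        queue = self.idle[origin]
        while queue:
            conn = queue.popleft()
            if not conn.expired():        # expired entries are simply dropped
                return conn
        return None
```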
I came across these pull requests addressing similar issues, but they haven't been merged yet: Issue #3215 comment.
I tested the optimizations by installing the patches from the all-optimizations branch by @MarkusSintonen:
pip install --upgrade git+https://github.com/MarkusSintonen/httpcore.git@all-optimizations
With these changes, the performance overhead in AsyncConnectionPool.handle_async_request was significantly reduced.
It would be very helpful if the OpenAI SDK supports a drop-in replacement for http_client, allowing us to swap httpx with aiohttp easily.
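For reference, the SDK already accepts a custom httpx client through the http_client constructor argument, so pool limits can be tuned today even though the transport itself is still httpx; a small sketch (the limit values are illustrative):

```python
# Sketch: pass a tuned httpx-based client via the existing http_client hook.
# Swapping the underlying transport for aiohttp is what later comments address.
import httpx
import openai

http_client = openai.DefaultAsyncHttpxClient(
    limits=httpx.Limits(max_connections=200, max_keepalive_connections=50),
)
client = openai.AsyncOpenAI(http_client=http_client)
```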
If performance is still a concern here, please try aiohttp 3.11.x+ as we have smoothed out some more of the concurrency delays and cancellation races in this version.
https://docs.aiohttp.org/en/stable/changes.html
@RobertCraigie: I see you have closed this, marking it as not planned here.
Shouldn't this be a priority item for the OpenAI SDK team, to make the SDK usable at scale? Doesn't this discourage users from using the SDK at scale?
Same here... :(
I am a developer of an AI character service that provides interactive conversations and uses large language models (LLMs) to generate the characters' responses. The service had been using openai package v0.27.4 to make requests to the LLMs.
After upgrading the openai package to v1.59.4, I noticed a decrease in the server's throughput and an increase in latency on the LLM hosting servers. I suspect that httpx is the cause, as discussed in the conversations above, and I would like to explore potential solutions.
I look forward to responses from contributors.
Sorry about this – we're tracking primarily at https://github.com/encode/httpx/issues/3215 but I agree it makes sense to leave this open, as one way or another I do agree this needs to get solved for users of this SDK.
Stopgaps you can try while we work with Tom to improve httpcore itself:
- Use https://github.com/lizeyan/httpx-AIOHttpTransport/ as a custom transport with httpx
- Use https://github.com/MtkN1/httpcore-speedups instead of httpcore, e.g. `uv add httpcore --index httpcore-speedups=https://mtkn1.github.io/httpcore-speedups/simple/`
Update: For anyone running into async performance issues with this library, I recommend using https://github.com/karpetrosyan/httpx-aiohttp, which is published by an httpx maintainer, like so:
```python
import asyncio

import openai
from aiohttp import ClientSession
from httpx_aiohttp import AiohttpTransport


async def main() -> None:
    async with AiohttpTransport(client=ClientSession()) as aiohttp_transport:
        httpx_client = openai.DefaultAsyncHttpxClient(transport=aiohttp_transport)
        client = openai.AsyncOpenAI(http_client=httpx_client)

        rsp = await client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "say hello!"}],
        )
        print(rsp)


asyncio.run(main())
```
Deployed a solution with httpx_aiohttp, but a lot of warnings are raised such as:
Unclosed connection client_connection: Connection<ConnectionKey(...)>
Not sure what exactly is happening, but it's probably because of issues in httpx_aiohttp.
(update) Filed: https://github.com/karpetrosyan/httpx-aiohttp/issues/4
We also run into problems with the httpx client introduced in v1 when we share the client across multiple threads (sync). The pod freezes because a connection is not released, or something similar. We've had this issue for months, so I'm curious whether anyone else has it with the sync client.
On top of this, the relatively sluggish release cadence of httpx is currently blocking a fix for a critical security issue in h11 via httpcore.
@loleg I don't want this thread to get off-topic; would you mind messaging me and cc'ing [email protected] with more details?
EDIT: it looks like you're talking about https://github.com/encode/httpcore/releases/tag/1.0.9 which was released a week ago and can be used with httpx already (httpx specifies httpcore==0.*).
Any further discussion of this should take place in a separate GitHub issue or over email.
Update: with many thanks for support from @rattrayalex and team, and apologies for diverting from the performance discussion, I've mostly ruled out that the issue is with this package or with httpx. In my case an upgrade to poetry 2.1.2 seems to have resolved a dependency conflict. Up and running with:
openai==1.76.2 h11==0.16.0 httpcore==1.0.9 httpx==0.28.1
Is this issue solved now in the latest version?
This suggestion was great; we see a 97% lower median latency in load tests.
We now have builtin support for aiohttp usage, please try it out and let us know if you run into any issues!
https://github.com/openai/openai-python#with-aiohttp
(There are still some more things to come such as more detailed docs + making it easier to instantiate your own ClientSession())
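For anyone landing here, a sketch based on the linked README section, assuming the aiohttp extra is installed (`pip install openai[aiohttp]`) and using the DefaultAioHttpClient helper it documents:

```python
# Sketch based on the linked README; assumes OPENAI_API_KEY is set in the
# environment and the openai[aiohttp] extra is installed.
import asyncio

from openai import AsyncOpenAI, DefaultAioHttpClient


async def main() -> None:
    async with AsyncOpenAI(http_client=DefaultAioHttpClient()) as client:
        chat_completion = await client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Say this is a test"}],
        )
        print(chat_completion)


asyncio.run(main())
```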
I have just released a new high-perf http library which is fully Rust-reqwest based. Batteries are included, including unit testing support. Go check it out: https://github.com/MarkusSintonen/pyreqwest :)