
httpx client has very poor performance for concurrent requests compared to aiohttp

Open willthayes opened this issue 1 year ago • 15 comments

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • [X] This is an issue with the Python library

Describe the bug

The API client uses httpx, which has very poor performance when making concurrent requests compared to aiohttp. Open issue for httpx here: https://github.com/encode/httpx/issues/3215

This is forcing us to swap out the OpenAI SDK for our own implementation, which is a pain.

I suspect it is the root cause of the difference between node.js and Python demonstrated here

I'm not massively familiar with the development of this SDK, and whether there is a key reason for picking httpx over aiohttp. From my reading it was switched over for V1 in order to create consistency between sync and async clients, but I'm not sure how vital it is to achieve this. However for our high concurrency async use cases this renders the SDK useless.

To Reproduce

To reproduce, run chat completion requests in parallel with 20+ concurrent requests, benchmarking the openai API client against an implementation using aiohttp. Example code can be found in the linked issue in httpx.
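For illustration, a benchmark along these lines might look like the sketch below (this is not the code from the linked httpx issue; the model name, prompt, and request count are placeholders):

import asyncio
import os
import time

import aiohttp
import openai

N = 20  # number of concurrent requests
MODEL = "gpt-4o-mini"  # placeholder model
MESSAGES = [{"role": "user", "content": "say hello"}]

async def bench_openai_sdk() -> float:
    # Time N concurrent chat completions through the SDK (httpx under the hood).
    client = openai.AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
    start = time.perf_counter()
    await asyncio.gather(
        *(client.chat.completions.create(model=MODEL, messages=MESSAGES) for _ in range(N))
    )
    return time.perf_counter() - start

async def bench_aiohttp() -> float:
    # Time the same N requests issued directly with aiohttp.
    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
    async with aiohttp.ClientSession(headers=headers) as session:
        async def one() -> None:
            async with session.post(
                "https://api.openai.com/v1/chat/completions",
                json={"model": MODEL, "messages": MESSAGES},
            ) as resp:
                resp.raise_for_status()
                await resp.json()

        start = time.perf_counter()
        await asyncio.gather(*(one() for _ in range(N)))
        return time.perf_counter() - start

async def main() -> None:
    print("openai SDK (httpx):", await bench_openai_sdk())
    print("raw aiohttp:", await bench_aiohttp())

asyncio.run(main())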

Code snippets

No response

OS

Linux/macOS

Python version

v3.12

Library version

1.12.0

willthayes avatar Aug 05 '24 10:08 willthayes

Interesting, I was not aware there was such a performance discrepancy between aiohttp and httpx.

From skimming the linked issue, it thankfully seems like there's a lot of in-flight work that would bring httpx up to par performance-wise.

I'm not massively familiar with the development of this SDK, and whether there is a key reason for picking httpx over aiohttp. From my reading it was switched over for V1 in order to create consistency between sync and async clients, but I'm not sure how vital it is to achieve this.

Yes, consistency here is very important; using different clients would make everything much more complicated and confusing for little gain, especially as this performance discrepancy can be fixed.

However for our high concurrency async use cases this renders the SDK useless.

Sorry about this, hopefully the httpx PRs can be merged soon.

In the meantime it might be less work for you to use a patched version of httpx with the performance fixes included in the linked issue.


I'm going to close this as we have no plans to move away from httpx. In the future we may offer a more extensible custom http client API which would allow you to use any http library as long as you implement the interface. However this isn't likely to happen anytime soon unfortunately.

I'll see if we can help land the httpx / httpcore PRs faster.

RobertCraigie avatar Aug 05 '24 14:08 RobertCraigie

Understood, thanks for the quick reply!

willthayes avatar Aug 05 '24 14:08 willthayes

Hi everyone,

I'm working on a chatbot and ran some load testing and profiling under high concurrent users. It appears that the performance of httpx in asynchronous mode, which is used by the OpenAI Python SDK, isn't meeting expectations.

Are there any updates or planned improvements on this? Or would it be worth benchmarking my application with aiohttp to see if it performs better under high concurrency?

Thanks in advance for your insights!

^ Likewise, to +1 this: our aiohttp implementation is substantially faster, sometimes twice as fast. I go from about 100 requests per second to 200+ on my local machine, so I'm not sure what causes the discrepancy.

opalrose-510 avatar Oct 31 '24 00:10 opalrose-510

We've moved from httpx to aiohttp and it solved a lot of our concurrency issues. It means we no longer use the openai client, which I was initially concerned about because of the theoretical connection-sharing benefit of the client, but that pales in comparison. I wish I'd swapped sooner.

Tom-Standen avatar Nov 02 '24 15:11 Tom-Standen

this is the aiohttp class we are using to maximize throughput while this issue is open:

# Imports needed to run this class. Completion, RateLimitError and
# TemporaryFailure are the author's own response type and exception
# classes and are not shown in the snippet.
import asyncio
import json
import logging
from typing import Self, cast

from aiohttp import (
    ClientSession,
    ClientTimeout,
    ServerDisconnectedError,
    ServerTimeoutError,
    TCPConnector,
)


class OpenAISession:
    def __init__(
        self,
        api_key: str,
    ):
        self.api_key = api_key
        self.client = OpenAISession._make_client(api_key)

    @staticmethod
    def _make_client(api_key: str) -> ClientSession:
        connector = TCPConnector(limit=500)
        timeout = ClientTimeout(total=600, connect=5)
        client = ClientSession(connector=connector, timeout=timeout)
        client.headers["Authorization"] = f"Bearer {api_key}"
        client.headers["Content-Type"] = "application/json"
        return client

    async def chat_completion(
        self, model: str, messages: list[dict], temperature: float
    ) -> Completion:
        if self.client.closed:
            self.client = OpenAISession._make_client(self.api_key)
        try:
            payload = {
                "model": model,
                "messages": messages,
                "temperature": temperature,
            }
            logging.info(f"OpenAI API Request Payload: {json.dumps(payload, indent=2)}")
            
            async with self.client.post(
                "https://api.openai.com/v1/chat/completions",
                json=payload,
                timeout=ClientTimeout(total=60),
            ) as resp:
                if resp.status == 429:
                    raise RateLimitError()
                elif resp.status in (500, 503):
                    raise TemporaryFailure(resp.reason)
                elif resp.status == 400:
                    error_body = await resp.text()
                    logging.error(f"OpenAI API 400 Error: {error_body}")
                    logging.error(f"Request payload that caused error: {json.dumps(payload, indent=2)}")
                resp.raise_for_status()
                return cast(Completion, await resp.json())
        except (asyncio.TimeoutError, ServerTimeoutError, ServerDisconnectedError) as e:
            # https://github.com/aio-libs/aiohttp/issues/8133
            raise TemporaryFailure(str(e)) from e

    async def __aenter__(self) -> Self:
        await self.client.__aenter__()
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
        return await self.client.__aexit__(exc_type, exc_val, exc_tb)
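
A usage sketch for the class above might look like this (the API key, model name, and prompt are placeholders; chat_completion returns the parsed JSON response):

async def demo() -> None:
    async with OpenAISession(api_key="sk-...") as session:
        completion = await session.chat_completion(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "say hello"}],
            temperature=0.0,
        )
        print(completion["choices"][0]["message"]["content"])

asyncio.run(demo())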

iliazintchenko avatar Nov 02 '24 16:11 iliazintchenko

In my case, profiling identified a performance overhead in the AsyncConnectionPool.handle_async_request method, mainly due to inefficient handling of idle or expired connections. The time complexity could be reduced by using a more suitable data structure rather than repeatedly iterating over all connections (see the sketch below).

[Screenshot of profiling results, 2024-10-28, omitted]
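
To illustrate the general idea (an illustrative sketch only, not httpcore's actual code): keep idle connections in a min-heap ordered by expiry time, so expired connections can be evicted by popping from the front rather than scanning every connection on each request.

import heapq
import itertools
import time

class IdleConnectionTracker:
    # Illustrative sketch: idle connections ordered by expiry time.
    # A real pool would also need to remove entries when a connection is
    # reused or closed (e.g. via lazy deletion).

    def __init__(self) -> None:
        self._heap: list[tuple[float, int, object]] = []
        self._counter = itertools.count()  # tie-breaker so connection objects are never compared

    def add(self, connection: object, keepalive_seconds: float) -> None:
        expires_at = time.monotonic() + keepalive_seconds
        heapq.heappush(self._heap, (expires_at, next(self._counter), connection))

    def pop_expired(self) -> list[object]:
        # Only expired entries are touched, instead of iterating the whole pool.
        now = time.monotonic()
        expired: list[object] = []
        while self._heap and self._heap[0][0] <= now:
            expired.append(heapq.heappop(self._heap)[2])
        return expired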

I came across pull requests addressing similar issues, but they haven't been merged yet; see this comment on issue #3215.

I tested the optimizations by installing the patches from the all-optimizations branch by @MarkusSintonen:

pip install --upgrade git+https://github.com/MarkusSintonen/httpcore.git@all-optimizations

With these changes, the performance overhead in AsyncConnectionPool.handle_async_request was significantly reduced.

It would be very helpful if the OpenAI SDK supported a drop-in replacement for http_client, allowing us to swap httpx out for aiohttp easily.

If performance is still a concern here, please try aiohttp 3.11.x+ as we have smoothed out some more of the concurrency delays and cancellation races in this version.

https://docs.aiohttp.org/en/stable/changes.html

bdraco avatar Nov 14 '24 23:11 bdraco

@RobertCraigie: I see you have closed this, marking it as not planned.

Shouldn't this be a priority item for the OpenAI SDK team, to make the SDK usable for use cases that need to scale? Doesn't this discourage users from using this SDK at scale?

rachitchauhan43 avatar Jan 03 '25 18:01 rachitchauhan43

Same here... :(

I am a developer of an AI character service that provides interactive conversations, using large language models (LLMs) to generate the characters' responses. The service had been using the openai package v0.27.4 to make requests to the LLMs.

After upgrading the openai package to v1.59.4, I noticed a decrease in the server's throughput and an increase in latency to the LLM hosting servers. I suspect that httpx is the cause, as discussed in the conversations above, and I would like to explore potential solutions.

I look forward to responses from contributors.

kamillle avatar Jan 08 '25 06:01 kamillle

Sorry about this – we're tracking primarily at https://github.com/encode/httpx/issues/3215 but I agree it makes sense to leave this open, as one way or another I do agree this needs to get solved for users of this SDK.

rattrayalex avatar Jan 13 '25 18:01 rattrayalex

Stopgaps you can try while we work with Tom to improve httpcore itself:

  1. Use https://github.com/lizeyan/httpx-AIOHttpTransport/ as a custom transport with httpx
  2. Use https://github.com/MtkN1/httpcore-speedups instead of httpcore, with eg uv add httpcore --index httpcore-speedups=https://mtkn1.github.io/httpcore-speedups/simple/

rattrayalex avatar Mar 07 '25 15:03 rattrayalex

Update: For anyone running into async performance issues with this library, I recommend using https://github.com/karpetrosyan/httpx-aiohttp, which is published by an httpx maintainer, like so:

import asyncio
import openai
from aiohttp import ClientSession
from httpx_aiohttp import AiohttpTransport

async def main() -> None:
    async with AiohttpTransport(client=ClientSession()) as aiohttp_transport:
        httpx_client = openai.DefaultAsyncHttpxClient(transport=aiohttp_transport)
        client = openai.AsyncOpenAI(http_client=httpx_client)

        rsp = await client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "say hello!"}],
        )
        print(rsp)

asyncio.run(main())

rattrayalex avatar Mar 09 '25 19:03 rattrayalex

Deployed a solution with httpx_aiohttp, but a lot of warnings are raised such as:

Unclosed connection client_connection: Connection<ConnectionKey(...)>

Not sure what exactly is happening, but it is probably because of issues in httpx_aiohttp.

(Update) Filed an issue: https://github.com/karpetrosyan/httpx-aiohttp/issues/4

gfx avatar Mar 25 '25 04:03 gfx

We also encounter problems with the httpx client introduced in v1 when we share the client across multiple threads (sync). The pod freezes, seemingly because a connection is not released or something similar. We have had this issue for months, so I'm curious whether anyone else has hit it with the sync client.

thsunkid avatar Mar 25 '25 22:03 thsunkid

On top of this, the relatively sluggish release cadence of httpx is currently blocking the fix for a critical security issue in h11 via httpcore.

loleg avatar Apr 29 '25 13:04 loleg

@loleg I don't want this thread to get off-topic; would you mind messaging me and cc'ing [email protected] with more details?

EDIT: it looks like you're talking about https://github.com/encode/httpcore/releases/tag/1.0.9 which was released a week ago and can be used with httpx already (httpx specifies httpcore==1.*).

Any further discussion of this should take place in a separate GitHub issue or over email.

rattrayalex avatar Apr 30 '25 11:04 rattrayalex

Update: with many thanks for support from @rattrayalex and team, and apologies for diverting from the performance discussion, I've mostly ruled out that the issue is with this package or with httpx. In my case an upgrade to poetry 2.1.2 seems to have resolved a dependency conflict. Up and running with:

openai==1.76.2 h11==0.16.0 httpcore==1.0.9 httpx==0.28.1

loleg avatar May 03 '25 07:05 loleg

Is this issue solved now in the latest version?

markwitt1 avatar May 19 '25 07:05 markwitt1

This suggestion was great; we see 97% lower median latency in load tests.

ishaan-jaff avatar May 23 '25 17:05 ishaan-jaff

We now have built-in support for aiohttp usage; please try it out and let us know if you run into any issues!

https://github.com/openai/openai-python#with-aiohttp

(There are still some more things to come, such as more detailed docs and making it easier to instantiate your own ClientSession.)
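
For reference, the built-in support is enabled by installing the aiohttp extra and passing DefaultAioHttpClient as the http_client, roughly like the following sketch (adapted from the linked README section, which is authoritative for the exact import and options; model and prompt are placeholders):

# Requires: pip install "openai[aiohttp]"
import asyncio

from openai import AsyncOpenAI, DefaultAioHttpClient

async def main() -> None:
    async with AsyncOpenAI(http_client=DefaultAioHttpClient()) as client:
        completion = await client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "say hello!"}],
        )
        print(completion)

asyncio.run(main())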

RobertCraigie avatar Jun 20 '25 19:06 RobertCraigie

I have just released a new high-performance HTTP library which is fully based on Rust's reqwest. Batteries are included, such as unit testing support. Go check it out: https://github.com/MarkusSintonen/pyreqwest :)

MarkusSintonen avatar Oct 12 '25 19:10 MarkusSintonen