Does using a connection pool with Python's Async CosmosClient improve performance?
I'm writing a FastAPI app that uses the async CosmosClient. I'm expecting a lot of requests. Would using some sort of connection pool improve performance? Or does the nature of async mean that there's no advantage to using a pool, since only one task would be running and using the CosmosClient at any one time anyway?
Every article about connection pools compares them to setting up and tearing down a new connection on every request. I'm interested in how the performance of a single, global CosmosClient that gets reused for all requests compares to a pool of CosmosClient instances in an async environment.
If using a connection pool would be beneficial, is there any documentation for what that might look like in Python?
Related question - is using the async CosmosClient from multiple concurrent tasks safe? Or is there a chance that one task might get a response meant for another task, or make a request at the same time as another task?
Thanks for your question @tal-zvon! Generally speaking, if you're using the CosmosClient within a context manager, the connection pool should be managed internally in the default transport:
```python
from azure.cosmos.aio import CosmosClient

async with CosmosClient(...) as client:
    await client.do_something(...)
```
If you need multiple clients and want the connection pool to be shared between them, this can be achieved by creating your own transport and sharing it between the clients, as in this sample: https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/core/azure-core/samples/example_shared_transport_async.py#L51-L71
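For illustration, here is a minimal sketch of what that shared-transport setup might look like, adapted from the linked sample. It assumes CosmosClient forwards the `transport` keyword to the underlying azure-core pipeline, and the endpoint/key values are placeholders:

```python
import asyncio

import aiohttp
from azure.core.pipeline.transport import AioHttpTransport
from azure.cosmos.aio import CosmosClient

URL = "https://<account>.documents.azure.com:443/"  # placeholder endpoint
KEY = "<account-key>"                                # placeholder key

async def main():
    # One aiohttp session (and therefore one TCP connection pool) shared by
    # every client below. session_owner=False tells the transport not to
    # close the session when an individual client is closed.
    session = aiohttp.ClientSession()
    shared_transport = AioHttpTransport(session=session, session_owner=False)

    async with CosmosClient(URL, credential=KEY, transport=shared_transport) as client_a, \
               CosmosClient(URL, credential=KEY, transport=shared_transport) as client_b:
        ...  # both clients reuse the same pooled connections

    # We own the session, so we close it ourselves once all clients are done.
    await session.close()

asyncio.run(main())
```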
Regarding your question about concurrency - depending on your usage, yes, the clients are intended to be thread (sync) and coroutine (async) safe. I say "depending on" because we currently have an open bug around concurrency safety when using the response_hook parameter to access response headers. There's already a PR open for this, and we expect to have it resolved soon.
Hopefully this helps - let me know if anything needs further details or clarification :)
Thank you for the quick response @annatisch.
To make sure I understand you correctly, when you say:
> the connection pool should be managed internally in the default transport
This is what I'm understanding: the CosmosClient uses the HTTPS protocol to communicate with the CosmosDB server, and the connection pool is a pool of TCP connections that can be reused. When you create an instance of CosmosClient and send the initial request for a resource (like a container or document), it establishes multiple TCP connections to the CosmosDB server, and on every request, uses one of those TCP connections. If one TCP connection fails for any reason, the CosmosClient will automatically send requests through a different TCP connection (so requests don't need to wait) while concurrently re-establishing the failed connection. All this happens internally, and transparently to anyone using the CosmosClient. Does that sound right? Or am I way off?
If that sounds right, then when you say:
> If you're needing multiple clients
This is what I'm trying to figure out - in what situation might I need multiple CosmosClient instances, as opposed to reusing the same CosmosClient instance for all requests? Are there any advantages to creating a pool of CosmosClient instances? Or, since it's already doing connection pooling internally, should I just use a single CosmosClient instance for all my requests?
Thanks @tal-zvon! To clarify, the connection pool itself is not implemented in the Azure SDK; we leverage third-party HTTP libraries as the transport underpinning the SDK, so the exact behaviour of the connection pool may vary a bit depending on the transport used. The default transport for async operations is aiohttp, and the CosmosClient maintains an aiohttp ClientSession. This session is instantiated with minimal configuration. If you wish to further refine the behaviour of the session, or customize it in any way, you can create your own aiohttp.ClientSession and pass it into the client transport, as can be seen in the sample I linked to above.
Reusing the same CosmosClient should offer no performance advantage or disadvantage over sharing the same transport between multiple clients - so the decision would largely come down to the design of the application/stylistic choice. Another consideration here will be your choice of authentication mechanism: for example, if you're using AAD (via DefaultAzureCredential), that credential will also maintain a connection. You can find more discussion on the topic in this issue: https://github.com/Azure/azure-sdk-for-python/issues/28665
In terms of creating a pool of CosmosClient instances - assuming they're reusing the connection pool - the only advantage that comes to mind right now would be as a workaround for the response_hook concurrency bug I mentioned earlier, if you would be impacted by it before we can get the fix out.
I hope this helps!
@annatisch Thank you! That was very helpful!
@annatisch I'm noticing the keepalive_timeout in TCPConnector is 15 seconds. When a request is sent within 15 seconds of the previous one, performance is good (a few ms). However, if a new request is sent after the timeout, that first request can take 2 seconds. How can I increase this timeout? Thanks!
(In connector.py the timeout is set to 15 seconds)
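One possible way to raise that window - a sketch only, assuming the client accepts a custom transport built from a user-supplied aiohttp.ClientSession as in the shared-transport sample above, with 60 seconds as a purely illustrative value:

```python
import aiohttp
from azure.core.pipeline.transport import AioHttpTransport
from azure.cosmos.aio import CosmosClient

async def main(url, key):
    # Keep idle connections alive for 60 seconds instead of aiohttp's
    # 15-second TCPConnector default (60 is just an illustrative value).
    connector = aiohttp.TCPConnector(keepalive_timeout=60)
    session = aiohttp.ClientSession(connector=connector)
    transport = AioHttpTransport(session=session, session_owner=False)

    async with CosmosClient(url, credential=key, transport=transport) as client:
        ...  # idle connections now stay open for up to 60 seconds

    await session.close()  # we own the session, so close it ourselves
```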