HeadObject calls are slower than the regular boto when done synchronously
Describe the bug
When performing API calls to S3 through aiobotocore one at a time, they are extremely slow compared to boto itself. In our project we use s3fs, which is a nice wrapper around aiobotocore, but the problem is that even a single API call with aiobotocore costs 2-3x more time than the same call with boto. I know there is some overhead from wrapping it and making it async, but I am concerned this is a bug, since a single regular operation shouldn't take that much time.
Here is a demo snippet to test it:
```python
import time
import asyncio

import aiobotocore
from boto3.session import Session


def get_kwargs(f_no):
    return {
        'Bucket': 'some-bucket',
        'Key': f'something/mini/file_{f_no}'
    }


def sync_boto(times):
    session = Session()
    s3 = session.client("s3")
    start = time.perf_counter()
    for f_no in range(times):
        head_object = s3.head_object(**get_kwargs(f_no))
    end = time.perf_counter()
    return end - start


async def async_boto(times, one_by_one=False):
    session = aiobotocore.AioSession()
    async with session.create_client("s3") as s3:
        start = time.perf_counter()
        coros = [s3.head_object(**get_kwargs(f_no)) for f_no in range(times)]
        if one_by_one:
            for coro in coros:
                await coro
        else:
            await asyncio.gather(*coros)
        end = time.perf_counter()
    return end - start


print('10 sync head_object calls with boto: ', sync_boto(10))
print('10 sync head_object calls with aiobotocore: ', asyncio.run(async_boto(10, one_by_one=True)))
print('10 async head_object calls with aiobotocore: ', asyncio.run(async_boto(10, one_by_one=False)))
```
And here are the results I get (my connection is not that great, so allow some room for variance):
```
10 sync head_object calls with boto: 5.419796205000239
10 sync head_object calls with aiobotocore: 36.99419660199965
10 async head_object calls with aiobotocore: 3.782599554000626
```
I know that running them concurrently is quite fast, but the use case for running them synchronously still exists. Any ideas why there is such a big difference in timing?
Checklist
- [x] I have reproduced in an environment where `pip check` passes without errors
- [x] I have provided `pip freeze` results
- [x] I have provided sample code or detailed way to reproduce
- [x] I have tried the same code in botocore to ensure this is an aiobotocore specific issue
- [ ] I have tried similar code in aiohttp to ensure this is an aiobotocore specific issue
- [ ] I have checked the latest and older versions of aiobotocore/aiohttp/python to see if this is a regression / injection
pip freeze results
aiobotocore==1.2.1
Environment:
- Python Version: 3.8
- OS name and version: Ubuntu 20.04
That is indeed an interesting experiment. The performance will depend mostly on aiohttp, as that is what makes the actual calls. What version of aiohttp are you using? Your pip freeze results are incomplete. I'd try the same experiment but with requests vs aiohttp.
aiohttp==3.6.2 (aiohttp is installed with [speedups]).
Mind trying the latest version of aiohttp? They fixed a blocking issue. Also, mind running a profiler like pyvmmonitor? It should provide insight into where the slowness is. Otherwise I can run it when I get a chance.
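For reference, a minimal sketch of how such a profile could be captured with yappi (the profiler the results below were eventually gathered with); `async_boto` is the function from the snippet above:

```python
import asyncio
import yappi

# Wall-clock time matters here, since the suspected slowness is in network I/O.
yappi.set_clock_type("wall")
yappi.start()
asyncio.run(async_boto(10, one_by_one=True))
yappi.stop()

# Show the most expensive calls first.
stats = yappi.get_func_stats()
stats.sort("ttot", "desc")
stats.print_all()
```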
There is definitely an improvement (I ran each operation 3 times), and the best-to-best comparison shows it is reduced from 36 seconds to 30 seconds. Though that is still too slow (600%) compared to the sync one.
```
$ python t.py
10 sync head_object calls with boto: 5.458850416000132
10 sync head_object calls with aioboto: 30.85212171800049
10 async head_object calls with aioboto: 3.713436236999769

(.venv38) (Python 3.8.5+) [ 2:41ÖS ] [ isidentical@desktop:~ ]
$ pip freeze | grep aiohttp
aiohttp==3.7.4.post0
```
Here are the results (only calling async_boto(10, one_by_one=True)), gathered via yappi: https://gist.github.com/isidentical/316ea24961fe7bfead5a66eb5b3a8596
Seems like it's not re-using the connection; note the 31 calls to _UnixSelectorEventLoop.create_connection. I want to look into this when I get some time. If you're free, is there any change in behavior if you explicitly await each call one after the other instead of building a list of coros first? Another theory: if you do a get_object instead of head_object and explicitly read the body, that will guarantee the connection gets re-used. If that works, it means the head_object op isn't marking the connection as re-usable in aiohttp. It would also be interesting to compare requests and aiohttp with sync calls to head ops.
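A sketch of what that get_object variant could look like (re-using `get_kwargs` and the imports from the snippet above, and explicitly draining the body so aiohttp can return the connection to the pool):

```python
async def async_boto_get(times):
    session = aiobotocore.AioSession()
    async with session.create_client("s3") as s3:
        start = time.perf_counter()
        for f_no in range(times):
            response = await s3.get_object(**get_kwargs(f_no))
            # Reading the body to the end should let aiohttp mark the
            # connection as re-usable instead of closing it.
            await response['Body'].read()
        end = time.perf_counter()
    return end - start
```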
Huh, get_object seems to be roughly the same speed as the boto implementation.
```
10 sync get_object calls with boto: 11.472657151000021
10 sync get_object calls with aioboto: 12.949970024999857
10 async get_object calls with aioboto: 1.9521049269997093
```
Hey @thehesiod, any hints on how I could make the head_object calls re-use connections just like get_object? I searched the code base but couldn't find any specific place where get_object is treated differently from head_object. Thanks.
ok let me debug this now, have a sec :]
Ok, so I think the problem is that when doing a head_object without the region of the bucket, botocore first tries to do the op region-less, e.g. https://bucket-name.s3.amazonaws.com/key_name, which causes a 400; in aiohttp this causes the connection to be closed. Later, on a retry, it does some more calls and adds the region, like https://bucket-name.s3.us-west-2.amazonaws.com. I think requests doesn't require the connection to be closed on a 400.
After setting the region and updating your test code to instead be:
```python
# Assumes the imports from the original snippet (time, asyncio, aiobotocore,
# boto3.session.Session as Boto3Session, botocore.session.Session as BotoSession)
# and a get_kwargs() that now takes no argument and returns a fixed Bucket/Key.

def sync_boto3(times):
    session = Boto3Session()
    s3 = session.client("s3")
    start = time.perf_counter()
    for f_no in range(times):
        s3.head_object(**get_kwargs())
    end = time.perf_counter()
    return end - start


def sync_botocore(times):
    session = BotoSession()
    s3 = session.create_client("s3")
    start = time.perf_counter()
    for f_no in range(times):
        s3.head_object(**get_kwargs())
    end = time.perf_counter()
    return end - start


async def async_boto(times, one_by_one=False):
    session = aiobotocore.AioSession()
    async with session.create_client("s3") as s3:
        start = time.perf_counter()
        if one_by_one:
            for _ in range(times):
                r = await s3.head_object(**get_kwargs())
        else:
            await asyncio.gather(*[s3.head_object(**get_kwargs()) for _ in range(times)])
        end = time.perf_counter()
    return end - start
```
I get the following numbers:
```
10 sync head_object calls with boto3: 0.724141243
10 sync head_object calls with botocore: 0.8051326689999998
10 sync head_object calls with aiobotocore: 0.9191009279999998
10 async head_object calls with aiobotocore: 0.27092336600000033
```
which shows minimal overhead for aiobotocore.
If I remove the correct region from my ~/.aws/config file for the profile I'm using I get:
```
10 sync head_object calls with boto3: 1.416235907
10 sync head_object calls with botocore: 1.2566816360000002
10 sync head_object calls with aiobotocore: 7.900813063999999
10 async head_object calls with aiobotocore: 1.3000357299999994
```
which shows botocore gets much slower as well, with aiobotocore having an increased penalty since it discards this connection here: https://github.com/aio-libs/aiohttp/blob/3.7/aiohttp/client_proto.py#L229
Could you validate this yourself? I think it's a bug that botocore is 400'ing; if it doesn't know the region, it should first make the API call to find the region and then do the API call needed. But in general it's better for you to tell it the region to avoid the overhead.
Note, btw, that for quick ops aiobotocore won't help too much because it's creating a connection for each request.
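A minimal sketch of passing the region explicitly when creating the client (the region and key below are just placeholders):

```python
session = aiobotocore.AioSession()
# With the bucket's region supplied up front, botocore signs against the
# regional endpoint directly and the 400-then-retry dance never happens.
async with session.create_client("s3", region_name="us-west-2") as s3:
    head = await s3.head_object(Bucket="some-bucket", Key="something/mini/file_0")
```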
> Could you validate this yourself? I think it's a bug that botocore is 400'ing; if it doesn't know the region, it should first make the API call to find the region and then do the API call needed. But in general it's better for you to tell it the region to avoid the overhead.
I see, and the explanation makes sense :/ Do you know whether we can hack/patch something to avoid this, since it is a bit troubling for our use case? I'll also check botocore, though I guess it is a bit of a long shot.
You won't know the region of the bucket ahead of time? You can do the lookup yourself to solve this.
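For example, one lightweight way to do that lookup yourself might be GetBucketLocation (a sketch, assuming an existing async S3 client; note the API reports an empty LocationConstraint for buckets in us-east-1):

```python
async def resolve_bucket_region(s3, bucket_name):
    # GetBucketLocation returns None/empty for buckets living in us-east-1.
    response = await s3.get_bucket_location(Bucket=bucket_name)
    return response.get('LocationConstraint') or 'us-east-1'
```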
The code is written in a way that lets you work with buckets from multiple regions. It would have been possible to implement an inline cache in the class, look up every new bucket the first time it is seen, and resolve its region, but for each bucket I would have to recreate a session, which might not be the best behavior (since I can't pass region=... to head_object directly, unlike Bucket= etc.).
Can you try using SigV4? That requires the region to be used, so botocore may do the right thing: see https://github.com/boto/botocore/issues/2109#issuecomment-663267168
note the caveats though
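A sketch of forcing SigV4 on a single client (using AioConfig here, under the assumption that signature_version is passed through the client config as in botocore):

```python
from aiobotocore.config import AioConfig

session = aiobotocore.AioSession()
# SigV4 requests are always signed against a specific region, which is why
# this may push botocore into resolving the region properly.
async with session.create_client("s3", config=AioConfig(signature_version="s3v4")) as s3:
    head = await s3.head_object(Bucket="some-bucket", Key="something/mini/file_0")
```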
We have experimented with switching to s3v4 as the default signature version (instead of letting boto decide) and received a lot of complaints about it creating the signed URL in a wrong way (using the wrong region, I assume the default one), so I don't prefer that. Now thinking about it, we might experiment with multiple clients (not sessions) with different region_names.
Something like this:
```python
clients = {}

async def make_api_call(method, **kwargs):
    if kwargs.get('Bucket'):
        bucket = kwargs['Bucket']
        if bucket not in clients:
            # make a head_bucket call to retrieve the location constraint,
            # then create an async client with the x-amz-bucket-region
            clients[bucket] = ...
        client = clients[bucket]
    else:
        client = default_client  # a client without a region set
    return await getattr(client, method)(**kwargs)
```
Though I still need to actually implement a prototype and benchmark it. It would've been much easier if I could simply reduce the cost of connection creation when something goes bad (e.g. a 400, as in your example), since even though botocore also gets slower without a region (in your example it goes from 0.8 seconds to 1.2 seconds), it makes things way worse for aiobotocore (0.9 to 7.9).
Remember clients are context objects because they hold onto connection pools, so patching at this low a level is not recommended.
I just logged this: https://github.com/boto/botocore/issues/2393
Here's what I get if I set the region_name in each create_client:
```
10 sync head_object calls with boto3: 0.8160533059999999
10 sync head_object calls with botocore: 0.21341984199999997
10 sync head_object calls with aiobotocore: 0.8003279669999999
10 async head_object calls with aiobotocore: 0.252606906
```
I'm not sure about the difference between botocore and aiobotocore; I'll have to investigate. I did validate that new connections were not getting created, so there must be some inefficiency with async vs sync connections.
FYI, I found their error-code retry logic very slow: instead of aggregating the status-code checking, they navigate various classes to check them one at a time, so it ends up calling 20 or more functions for each non-200 response.
I recommend something like this:
```python
from contextlib import AsyncExitStack
from typing import Dict, Any

import aiobotocore


class S3ClientRegionCache:
    def __init__(self, session: aiobotocore.AioSession):
        self._session = session
        self._exit_stack = AsyncExitStack()
        self._client = None
        self._cache: Dict[str, Any] = dict()

    async def get_bucket_client(self, bucket_name: str):
        client = self._cache.get(bucket_name)
        if client:
            return client

        if not self._client:
            # Region-less client used only to discover each bucket's region.
            self._client = await self._exit_stack.enter_async_context(
                self._session.create_client("s3"))

        response = await self._client.head_bucket(Bucket=bucket_name)
        region = response['ResponseMetadata']['HTTPHeaders']['x-amz-bucket-region']
        client = self._cache[bucket_name] = await self._exit_stack.enter_async_context(
            self._session.create_client("s3", region_name=region))
        return client

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        await self._exit_stack.__aexit__(exc_type, exc_val, exc_tb)
```
And you can use it like this:
```python
session = aiobotocore.AioSession()

async with S3ClientRegionCache(session) as cache:
    s3 = await cache.get_bucket_client(kwargs['Bucket'])
```
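The per-bucket client is then used as usual, e.g. (still inside the `async with`, with `kwargs` being the usual Bucket/Key dict):

```python
    head = await s3.head_object(**kwargs)
```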
Going to close for now, as clients/sessions should be long-lived. Please re-open if that does not help.