aiobotocore icon indicating copy to clipboard operation
aiobotocore copied to clipboard

Async code in aiobotocore calls synchronous functions of boto3

Open zferentz opened this issue 2 years ago • 15 comments

Describe the bug While running some performance test (simple loop, code attached below) we believe that we found a problem where creating a session/resource in aioboto3 is calling an async function in aiobotocore which is calling a blocking function in boto3.

Code The following code demonstrates the problem - the more sessions/resources we create - the more time it takes which shows that it's not real async.

import asyncio
import datetime
import aioboto3

async def create_session_and_resource():
    session = aioboto3.Session()
    async with session.resource('dynamodb', region_name='us-east-1',
                                endpoint_url='http://fakehost:8123'
                                ) as dynamo_resource:
        pass

async def main(N):
    tasks = [create_session_and_resource() for _ in range(N)]
    t1 = datetime.datetime.now()
    await asyncio.gather(*tasks)
    t2 = datetime.datetime.now()
    print(f"N={N} total duration= {(t2 - t1).total_seconds()}")


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main(2))
    loop.run_until_complete(main(10))
    loop.run_until_complete(main(50))

pip results

aioboto3                         8.0.5
aiobotocore                    1.0.4
boto3                              1.12.32
botocore                         1.15.32

Environment:

  • Python Version: 3.6.15 (similar behavior with Python3.8)
  • OS name and version: Ubuntu 20.04.1

Additional info/context We opened an issue with aioboto3 (https://github.com/terrycain/aioboto3/issues/254) but the assignee was right to ask us to bring it up here.

We believe that the block is happening because aiobotocore calls synchronous functions of boto3.

As an example, think of the coroutine named _create_client (implemented in aiobotocoro/session) which calls self._get_internal_component('endpoint_resolver') .

def _get_internal_component(self, name):
        # While this method may be called by botocore classes outside of the
        # Session, this method should **never** be used by a class that lives
        # outside of botocore.
        return self._internal_components.get_component(name)

which eventually hits the get_component method in botocore.session which executes the deffered (sync) function as can be seen here:

class ComponentLocator(object):
    """Service locator for session components."""
    def __init__(self):
        self._components = {}
        self._deferred = {}

    def get_component(self, name):
        if name in self._deferred:
            factory = self._deferred[name]
            self._components[name] = factory()
            # Only delete the component from the deferred dict after
            # successfully creating the object from the factory as well as
            # injecting the instantiated value into the _components dict.
            del self._deferred[name]
        try:
            return self._components[name]
        except KeyError:
            raise ValueError("Unknown component: %s" % name)

We wonder if it's possible to assure the asynchronicity of both the Session and the resource creation to eliminate the blocking. Our backend code creates many sessions/resources and we're trying to optimize its performance.

Thank you in advance !!

zferentz avatar May 16 '22 01:05 zferentz

Our backend code creates many sessions/resources

They do recommend keeping a single persistent session for long-running processes like servers. We did this a month ago and saw slightly better performance. Just my 2 cents

dacevedo12 avatar May 16 '22 02:05 dacevedo12

I'm not following, you want to make the factory() method async? it does very little work, doesn't make sense to make async. And yes as @dacevedo12 mentions, you should not create many sessions/clients, they should be long lived.

thehesiod avatar May 16 '22 17:05 thehesiod

@dacevedo12 - thanks for the feedback !!! i totally agree with you and we're working to update our system to use single Session object, I have a feeling that the problem is not with the Session but rather with creating the resource so i'm not sure that having long-lived session will solve the problem.

@thehesiod - that exactly what i initially thought, however it bothers me a bit that a nonbocking (async) function actually calls a blocking function and it blocks the entire ioloop. I'm just not sure if this is indeed the intended/designed behavior . I also admit that i tried to fix it but the code is complex , it's even hard for me to understand what's going on...

zferentz avatar May 16 '22 19:05 zferentz

@zferentz that's totally normal, you're not going to have all your code async, only that which makes blocking calls, or eventually makes blocking calls.

thehesiod avatar May 16 '22 23:05 thehesiod

@thehesiod that's my point - i think that the creating the session (or maybe it's the resource) makes a blocking call to boto3 which leads to a blocking I/O call (which shouldn't happen in an async call). I agree that using a single session/resource for the entire service will probably workaround the problem .

Either way, appreciate your effort and feedback !!

zferentz avatar May 17 '22 02:05 zferentz

@zferentz I don't see any blocking io calls, can you point that out?

thehesiod avatar May 17 '22 15:05 thehesiod

For example, i see that the call async with session.resource(...) calls self._loader.load_service_model(...) which calls botocore's list_available_services which eventually calls os.listdir twice and then checks if file exists using os.path.isfile .

The top level call is async however all the other calls are non async and since we call os.path - we are basically calling a blocking function that access the filesystem which is a blocking call.

Now, since the list_available_services is enumerating all the supported services, we end up with hundred calls to os.listdir and os.path.isfile .
Of course filesystem cache can help here but it's still an async call that calls multiple I/O blocking calls.

zferentz avatar May 17 '22 18:05 zferentz

@zferentz ya but there are no python built-in async functions for file io, we'd have to swap to using aiofiles or manually via threading which is non ideal. There really needs to be built-in python async file APIs for that to work as intended, otherwise you're spinning up a bunch of heavy-weight, GIL'd python threads. Further most people should be on cached SSDs which should make this rather trivial. If you're blocked on listdir, should inspect why the directory its listing has so many files.

thehesiod avatar May 18 '22 16:05 thehesiod

@thehesiod Thank you for the immediate feedback. First, I'm happy that i was not completely wrong and we do have an async code that calls blocking I/O functions. I wasnt 100% sure and i am happy that you see it as well.

Now, indeed Python doesnt provide a builtin mechanism for async filesystem operations but i wonder if we can optimize it by caching this data (as we can assume that the list of libraries/modules wont change throughout the lifetime of the application).

Either way,i do realize that there's no easy solution here.

zferentz avatar May 18 '22 20:05 zferentz

@zferentz it is cached, at the session/client level :), and people really shouldn't be re-creating sessions/clients often as there's no need. In terms of async code calling blocking io yes and no. file IO really should not be a problem on modern systems unless your filesystem is on a network, in which case probably caching should be controlled at the driver level, or, the code is doing something dumb (like I'd imagine that listfiles probably should be a glob with a filepattern instead, and if there are a bagillion files in the dir something is probably badly designed for that folder). Also caching is out of scope for aiobotocore as it's just async'ifying botocore. Caching belongs in the botocore project. Also globals=bad in general. Are you bringing this up due to a real world problem or theoretical? Without a real world scenario I can't give a good recommendation.

thehesiod avatar May 19 '22 00:05 thehesiod

@thehesiod Thanks again for your quick reply !! I do agree with everything you've mentioned but i think that caching as a simple dict class-member will speed up the performance for many use cases.

For example, in my example - one of our older libraries is creating many resources/sessions and for each one the loader more than 200 blocking system calls to the filesystem (which is indeed local, standard python ). The net result (on my system) is that the entire ioloop is blocked and i can see how other async operations (standard aiothttp calls) are waiting for the session-creation.

I guess if no one else raised this topic then it means that everyone indeed using single session for the entire app (which i hope to do in the future) or just people are not aware of this issue.

Either way, if you believe that there's no easy way to fix it - feel free to close this ticket . thanks a lot.

zferentz avatar May 19 '22 14:05 zferentz

@zferentz tbh I'm glad you reported this as we were also seeing some performance shenanigans but weren't really sure whether event loop skews due to blocking calls played a part in the problem.

For instance, I sometimes see aiobotocore+aiohttp requests to dynamodb taking seconds (for context, that's outrageously slow for non-scan dynamodb operations), whereas cloudwatch always reports times under 100ms. We're currently keeping a global long-lived session but still instancing the resource on every operation

dacevedo12 avatar May 19 '22 14:05 dacevedo12

FYI to track slow sync calls you can use something like this: https://gist.github.com/thehesiod/0d2beed27c800e4f7286be6c3cd17a09 That's what we have in production to ensure sync calls aren't blocking the main thread for too long

thehesiod avatar May 19 '22 15:05 thehesiod

@zferentz I think it would be great to speed up botocore, actually I proposed some changes to how they parse timestamp years ago that sped up our scenario 20% and they have yet to merge my change, so unfortunately getting them to change things can be glacial. However I suggest the following, given list_available_services is doing a file listing, and one wouldn't reasonably expect the service list to change during the lifetime of the app I think it would be one of the few instances where it would be worthwhile to cache in the service list in a global. I would make a monkeypatch to replace that method with one that caches globally, then write a small python test that shows the speed difference by running both versions for like 100 times or so. The open a botocore bug to propose your solution. This is really a botocore issue and not an aiobotocore issue IMO. Let me know if you disagree.

thehesiod avatar May 19 '22 15:05 thehesiod

and in either case can use your monkeypatch to speed up your use case :). Actually if this really does take a significant amount of time I'd like to use this monkeypatch at our company as well, or perhaps we could ship an optional set of speedups in aiobotocore. Please do report your test and findings here!

thehesiod avatar May 19 '22 15:05 thehesiod

ok going to close as this isn't really an issue with aiobotocore as 1) the performance issue is really an underlying botocore issue and 2) unfortunately we don't have batteries included async file support yet :(

feel free to re-open if I missed something.

thehesiod avatar Jul 07 '23 05:07 thehesiod