Identical data is loaded into every session wasting memory

Open sparrowt opened this issue 2 years ago • 3 comments

Describe the bug

Each botocore Session creates its own instance of a Loader within which JSON content loaded from botocore/data/ is cached by @instance_cache e.g. on methods like load_service_model & load_data_with_path.

This caching applies to many things including loading endpoints.json into an EndpointResolver which happens in every session and results in approx 6 MB of memory allocation (to load details of all the HTTP endpoints for every region/partition for all 300+ AWS services).

The JSON files shipped with botocore presumably do not change on disk at runtime. Nevertheless if you create several sessions within a process - e.g. in a multi-threaded app because sessions are not thread safe - this exact same data is loaded into memory multiple times and cached separately in each Session's Loader and in its EndpointResolver.

It therefore seems like a bug (of the wasteful memory usage variety) that the immutable JSON cache is per-session rather than per-process. In a multi-threaded app in a resource-constrained environment, every 6 MB really adds up.
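To illustrate the principle without depending on botocore internals, here is a minimal sketch of the difference between a per-instance cache and a per-process cache. PerInstanceLoader, SharedLoader and RAW_JSON are made-up names for this example and are not botocore classes; the shared variant has no locking, so treat it as an illustration only.

```python
import json

# Hypothetical stand-in for a data file shipped with a library.
RAW_JSON = json.dumps({"partitions": [{"region": f"r{i}"} for i in range(1000)]})


class PerInstanceLoader:
    """Mimics a per-session cache: each instance parses and keeps its own copy."""

    def __init__(self):
        self._cache = {}

    def load(self, name):
        if name not in self._cache:
            self._cache[name] = json.loads(RAW_JSON)
        return self._cache[name]


class SharedLoader:
    """Mimics a per-process cache: every instance reads one class-level dict."""

    _cache = {}

    def load(self, name):
        if name not in SharedLoader._cache:
            SharedLoader._cache[name] = json.loads(RAW_JSON)
        return SharedLoader._cache[name]


a, b = PerInstanceLoader(), PerInstanceLoader()
print(a.load("endpoints") is b.load("endpoints"))  # two separately parsed copies

c, d = SharedLoader(), SharedLoader()
print(c.load("endpoints") is d.load("endpoints"))  # one object shared by both
```

The parsed content is equal in both cases; only the number of copies held in memory differs.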

Expected Behavior

When creating a 2nd (and any subsequent) Session, the data which has already been loaded from endpoints.json should be re-used, quickly and without unnecessary extra memory allocation.

Current Behavior

Instead, each new session actually loads the whole thing in again resulting in another ~6 MB of memory usage each time, storing it in a new EndpointResolver (and Loader) with the new Session. (The same issue exists for other JSON data such as service definitions but I'm just focussing on the most common & most impactful example that I observed.)

Reproduction Steps

import boto3.session

# Note: in real usage the below would be in separate threads (otherwise we could just re-use the Session)
# but the threading code is omitted from this example for brevity and because it does not affect the repro

# In one thread
session = boto3.session.Session(region_name='us-east-1')  # +6mb
client = session.client('s3')  # (+5mb as it happens)
# do stuff with client

# In another thread
session = boto3.session.Session(region_name='us-east-1')  # +6mb again = REPRO: this shouldn't need to load endpoints.json again
client = session.client('someotherservice')
# do stuff with client

Possible Solution

One solution would be to make the Loader process-wide with suitable locking on state as necessary. I imagine the small extra overhead is more than paid for by the memory savings if many sessions/clients are created.

A more radical alternative would be for the pre-processing step that generates the botocore/data/ to spit out, instead of each JSON file, a python module (.py file) containing a dict with the same data. Then Loader doesn't have to load JSON, it just lazily imports the python files it needs and python's importlib gives you the process-wide sharing and thread safety for free. I imagine this would be a much more difficult change having seen the existence of things like CUSTOMER_DATA_PATH (~/.aws/models/) so it may not be feasible - but I've included it if nothing else for hypothetical comparison and to illustrate the principle of the problem.
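A rough sketch of the import-based idea, using only the standard library. The module name endpoints_data_demo and its contents are invented for the example; this shows only that repeated imports return the same module object, not how botocore's build step would generate such files.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a throwaway module holding a dict, standing in for generated data.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "endpoints_data_demo.py").write_text(
    "DATA = {'partitions': ['aws', 'aws-cn']}\n"
)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()


def load_endpoints():
    # import_module executes the file on first use; later calls return the
    # module object already stored in sys.modules.
    return importlib.import_module("endpoints_data_demo").DATA


first = load_endpoints()
second = load_endpoints()
print(first is second)
```

Because the dict lives on the module, every caller in the process sees the same object, which also means callers must not mutate it.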

Additional Information/Context

https://github.com/boto/boto3/issues/1670 is closely related - this ticket is an attempt at a detailed description of why each session increases memory usage so much and how this might be avoided.

Any

SDK version used

botocore==1.33.1 boto3==1.33.1

Environment details (OS name and version, etc.)

Windows 10 Ubuntu WSL / same happens on Amazon Linux 2

sparrowt avatar Nov 28 '23 16:11 sparrowt

Hi @sparrowt thanks for reaching out. Have you tried sharing a single loader instance across several sessions? For example:

from botocore.loaders import Loader

loader = Loader()
sessions = some_func_that_makes_multiple_sessions()
for session in sessions:
    session.register_component('data_loader', loader)

Another option is using a single session to create multiple clients which get passed to the other threads:

session = boto3.session.Session(region_name='us-east-1')
client1 = session.client('s3')
client2 = session.client('someotherservice')

# In one thread
client1.do_something()

# In another thread
client2.do_something()

The endpoints.json file itself is relatively small and only a small fraction of what’s causing the memory usage. The suggestions you described could involve extensive refactoring and I can't guarantee that those changes would be considered. I think it would help to have memory profile reports here that highlight the current memory usage you're seeing and how it compares with the approaches provided above.

tim-finnigan avatar Dec 08 '23 18:12 tim-finnigan

Thanks so much for getting back to me @tim-finnigan. I have not tried that, I assumed Loader was not thread safe (otherwise why would each session need its own?) - before I do could you clarify a couple of things:

  1. is Loader thread safe?
  2. if so, is there any reason not to make this the default behaviour? (i.e. all sessions using the same 'data_loader' component, I guess unless they specify a non-default 'data_path')

To respond to some of your other points:

Another option is using a single session to create multiple clients which get passed to the other threads

Sadly this is not really an option in my case: the app in question is a multi-threaded web server and it is not possible to predict in advance which boto3 clients any given thread might need, so because Session is not thread safe, each thread has to create its own session in order to create the client(s) it needs. I am already caching that session using threading.local() so that subsequent client creations in the same thread don't need to make another session.
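For readers unfamiliar with the pattern, the per-thread caching described above looks roughly like this. make_session is a hypothetical placeholder that returns a plain object so the sketch runs without boto3; in the real app it would construct a boto3 Session.

```python
import threading

_thread_state = threading.local()


def make_session():
    # Hypothetical placeholder for an expensive, non-thread-safe object
    # (in the scenario above, a boto3 Session).
    return object()


def get_session():
    """Return this thread's session, creating it on first use."""
    session = getattr(_thread_state, "session", None)
    if session is None:
        session = make_session()
        _thread_state.session = session
    return session


def worker(results, index):
    # Two calls in the same thread should yield the same object.
    results[index] = (get_session(), get_session())


results = {}
threads = [threading.Thread(target=worker, args=(results, i)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread pays the creation cost once, which is exactly why the per-session memory cost is multiplied by the number of threads.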

The endpoints.json file itself is relatively small and only a small fraction of what’s causing the memory usage.

It is 781 KB (only surpassed by a handful of the service definitions), but loading it into memory in Python results in nearly 6 MB of memory allocation according to analysis using the Austin profiler. For the memory allocation profile trace below I ran session = boto3.session.Session(region_name='us-east-1') and then client = session.client('s3'). Within create_default_resolver, where it loads endpoints.json, there is 5.86 MB of memory allocation, which is quite a large fraction of the total memory allocated. The other major parts (to the right in the trace) are smaller and S3-specific: _load_service_model (4.71 MB) and _load_service_endpoints_ruleset (2.05 MB).

[image: memory allocation profile trace]
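The gap between on-disk size and in-memory size can be checked with the standard library's tracemalloc. This is a sketch on a synthetic document whose shape and size are made up; it does not read endpoints.json, so the numbers it prints are not comparable to the figures quoted above.

```python
import json
import tracemalloc


def build_sample_document():
    # Synthetic nested data standing in for a large JSON data file.
    return json.dumps(
        {
            f"service-{i}": {
                "endpoints": {
                    f"region-{j}": {"hostname": f"s{i}.r{j}.example.com"}
                    for j in range(30)
                }
            }
            for i in range(300)
        }
    )


def measure_parse(raw):
    """Return (size of the raw text in bytes, bytes allocated by parsing it)."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    parsed = json.loads(raw)
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del parsed
    return len(raw.encode("utf-8")), after - before


raw_bytes, parsed_bytes = measure_parse(build_sample_document())
print(f"raw: {raw_bytes} bytes, parsed: {parsed_bytes} bytes")
```

Every dict and str in the parsed result carries per-object overhead, which is why the parsed form is typically several times larger than the text it came from.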

sparrowt avatar Dec 11 '23 15:12 sparrowt

Have you tried sharing a single loader instance across several sessions? For example: [snipped]

I found that each call of boto3.session.Session() (not using the default session) eats 200ms of wall clock time, which led me to this issue after I noticed all the stat/JSON parsing in my profiling. So it is not a RAM problem per se for me, but a really expensive setup time.

The example was not exactly clear to me but did point me in the right direction on what I should be trying, so as a note for others, this got my ~200ms down to ~20ms per call to create an S3 Resource:

import threading

import boto3.session
import botocore.loaders
import botocore.session

# preload and reuse the model to shave ~200ms each time we create a session
# https://github.com/boto/boto3/issues/1670
# https://github.com/boto/botocore/issues/3078
_loader = botocore.loaders.Loader()
# iterate contents of botocore/data/s3
for type_name in frozenset(['endpoint-rule-set-1', 'paginators-1', 'service-2', 'waiters-2']):
    _loader.load_service_model(service_name='s3', type_name=type_name)
# session *instantiation* is not safe either
_boto_session_lock = threading.Lock()
def _session(region_name=None):
    # every new botocore session reuses the preloaded loader
    session = botocore.session.get_session()
    session.register_component('data_loader', _loader)
    with _boto_session_lock:
        return boto3.session.Session(region_name=region_name, botocore_session=session)

Then in your threads later you can use the following for a significantly faster setup time:

session = _session()
#session.events.register(...)
resource = session.resource('s3', config=config, endpoint_url=AWS_ENDPOINT_URL)

jimdigriz avatar Jan 27 '24 12:01 jimdigriz

Hi @tim-finnigan,

I'm also facing this issue in one of my Web projects with 30 worker threads. Having one session per thread (each one with its own loader instance) leads to huge memory consumption for the whole application.

As far as I can see, the Loader object is thread-safe (it mainly contains read-only properties and does not have changing state). As a consequence, building a new loader with the same search paths for each session leads to loading the exact same results for each session, with the penalty of duplicating its memory footprint.

Also, we can see that Loader properties are cached, so we clearly don't expect loaded files to change during program execution, nor loader results to update on file changes. For this reason, I think that having a Loader singleton for a given set of constructor parameters would be perfectly fine, and every session would share the same loaded models.

All this can be easily achieved by adding a @functools.cache decorator to the create_loader function, which is what sessions use to create their Loader (see Session._register_data_loader).
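As a standalone illustration of what the decorator does, here is a sketch with a stand-in factory. FakeLoader and create_loader_demo are hypothetical names; the real create_loader signature is not reproduced here. functools.cache needs Python 3.9 or later.

```python
import functools


class FakeLoader:
    """Hypothetical stand-in for an object that caches parsed data files."""

    def __init__(self, search_paths):
        self.search_paths = search_paths


@functools.cache
def create_loader_demo(search_paths=None):
    # Arguments must be hashable, so paths are passed as a tuple or a string.
    return FakeLoader(search_paths)


default_a = create_loader_demo()
default_b = create_loader_demo()
custom = create_loader_demo(("/opt/models",))

print(default_a is default_b)  # same arguments give back the same instance
print(default_a is custom)     # different arguments give a different instance
```

Keying the cache on the arguments is what keeps sessions with a non-default data path from sharing a loader with everyone else.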

Here is a little test:

Without caching

import resource
import time
from boto3 import Session


def test():
    sessions = []
    connections = []
    print("Init:", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    
    for i in range(30):
        s = time.monotonic()
        session = Session(aws_access_key_id="****", aws_secret_access_key="****")
        e = time.monotonic()
        sessions.append(session)
        print(f"Session {i} created ({e - s}):", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
        s = time.monotonic()
        conn = session.resource("s3", use_ssl=True, endpoint_url="https://****", verify=True)
        e = time.monotonic()
        connections.append(conn)
        print(f"    Connection created ({e - s}):", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    print("End:", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)

Running test() leads to the following output:

Init: 30812
Session 0 created (0.007366713999999774): 31136
    Connection created (0.20709648000001835): 45172
Session 1 created (0.0077458139999180275): 45172
    Connection created (0.05676860499988834): 59428
Session 2 created (0.005723010000110662): 59692
    Connection created (0.05713270499973078): 70252
Session 3 created (0.006202011999903334): 70516
    Connection created (0.06279741499974989): 81340
Session 4 created (0.005386410000028263): 81340
    Connection created (0.05483890000004976): 92164
Session 5 created (0.005307609000283264): 92164
    Connection created (0.05163619499990091): 102988
Session 6 created (0.08651785899974129): 102988
    Connection created (0.06467631899977277): 113812
Session 7 created (0.005154008999852522): 114076
    Connection created (0.06136341299998094): 124636
Session 8 created (0.005280109000068478): 124900
    Connection created (0.054148298999734834): 135460
Session 9 created (0.005269808999855741): 135724
    Connection created (0.04595588400025008): 146548
Session 10 created (0.011273919999894133): 146548
    Connection created (0.048401388000002044): 157372
Session 11 created (0.00553420999995069): 157372
    Connection created (0.06633752100015045): 168196
Session 12 created (0.005896810999729496): 168196
    Connection created (0.06273591500030307): 179020
Session 13 created (0.006891513000027771): 179020
    Connection created (0.06945342699964385): 189844
Session 14 created (0.016072529000211944): 189844
    Connection created (0.05730550400039647): 200668
Session 15 created (0.0051255090002086945): 200932
    Connection created (0.06524571999989348): 211492
Session 16 created (0.0061956109998391184): 211756
    Connection created (0.054940600999998424): 222316
Session 17 created (0.00536960999988878): 222580
    Connection created (0.04944039000019984): 233140
Session 18 created (0.011485020999771223): 233404
    Connection created (0.0556549019997874): 244228
Session 19 created (0.005821011000080034): 244228
    Connection created (0.07185963199981416): 255052
Session 20 created (0.006414611999844055): 255052
    Connection created (0.3373863169999822): 265876
Session 21 created (0.010485518999757915): 265876
    Connection created (0.06670962199996211): 276700
Session 22 created (0.00575131000005058): 276700
    Connection created (0.06228251399988949): 287524
Session 23 created (0.005410010000105103): 287524
    Connection created (0.0657683200001884): 298348
Session 24 created (0.005721811000057642): 298612
    Connection created (0.060366809999777615): 309172
Session 25 created (0.005479909999849042): 309436
    Connection created (0.05271927500007223): 319996
Session 26 created (0.005093302999739535): 320260
    Connection created (0.061812045000351645): 330820
Session 27 created (0.005705203999696096): 331084
    Connection created (0.06998764999980267): 341908
Session 28 created (0.007875706000049831): 341908
    Connection created (0.060424943000271014): 352732
Session 29 created (0.0057177039998350665): 352732
    Connection created (0.04947113599973818): 363556
End: 363556

We can see that each new Session/Connection takes 50 to 200 ms to initiate the connection. This is essentially time spent by the Loader loading JSON models; the second and subsequent ones are faster thanks to file caching. We also see that memory increases from 30 812 KB at the start of the test to 363 556 KB at the end (11 092 KB per Session/Connection).

After decorating the create_loader function with @functools.cache

The result of test() execution is

Init: 30980
Session 0 created (0.008385603000078845): 30980
    Connection created (0.2360037920002469): 45164
Session 1 created (0.005130701999860321): 45164
    Connection created (0.003696001000207616): 45692
Session 2 created (0.004956201999902987): 45692
    Connection created (0.004107302000193158): 46220
Session 3 created (0.005574302000241005): 46220
    Connection created (0.0032814010000947746): 46484
Session 4 created (0.004767402000197762): 46748
    Connection created (0.003917602000001352): 47012
Session 5 created (0.0049445019999438955): 47012
    Connection created (0.0032219009999607806): 47540
Session 6 created (0.005483102000198414): 47540
    Connection created (0.0061228019999362004): 48068
Session 7 created (0.005617702000108693): 48068
    Connection created (0.004165200999977969): 48332
Session 8 created (0.006076702999962436): 48596
    Connection created (0.004040900999825681): 48860
Session 9 created (0.005701903000044695): 49124
    Connection created (0.032456312000249454): 49388
Session 10 created (0.008747503999984474): 49652
    Connection created (0.004532501000085176): 49916
Session 11 created (0.006458903000293503): 49916
    Connection created (0.004456302000107826): 50444
Session 12 created (0.005392002000007778): 50444
    Connection created (0.003272000999913871): 50972
Session 13 created (0.005266902000130358): 50972
    Connection created (0.0037011019999226846): 51500
Session 14 created (0.004815901000256417): 51500
    Connection created (0.0032329019995813724): 52028
Session 15 created (0.0050092020001102355): 52028
    Connection created (0.00357710100024633): 52556
Session 16 created (0.004979501999969216): 52556
    Connection created (0.0030680010004289215): 53084
Session 17 created (0.0049238020001212135): 53084
    Connection created (0.0035945019999417127): 53612
Session 18 created (0.005164102000435378): 53612
    Connection created (0.003311600999950315): 53876
Session 19 created (0.005232101999808947): 54140
    Connection created (0.0031690010000602342): 54404
Session 20 created (0.005990101999941544): 54668
    Connection created (0.003229601999919396): 54932
Session 21 created (0.0054135020000103395): 55196
    Connection created (0.003699100999710936): 55460
Session 22 created (0.005547002000184875): 55724
    Connection created (0.003502602000025945): 55988
Session 23 created (0.00545880199979365): 56252
    Connection created (0.003561700999853201): 56516
Session 24 created (0.0048902019998422475): 56780
    Connection created (0.003871401999731461): 57044
Session 25 created (0.005137202000241814): 57044
    Connection created (0.003650901000128215): 57572
Session 26 created (0.005528502000288427): 57572
    Connection created (0.005458303000068554): 58100
Session 27 created (0.005486201999701734): 58100
    Connection created (0.0036526009998851805): 58628
Session 28 created (0.005462101999910374): 58628
    Connection created (0.003915201999916462): 59156
Session 29 created (0.004943802000070718): 59156
    Connection created (0.0032232009998551803): 59684
End: 59684

We can see that creating the second and subsequent Sessions/Connections is not only faster (4 to 6 ms instead of ~50 ms), but the final memory consumption is now 59 684 KB (957 KB per Session/Connection). That is a saving of 91% of the Session/Connection memory consumption.
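For anyone who wants to re-derive those figures, the arithmetic is sketched below using the Init and End values from the two sequential runs above. Note that ru_maxrss is reported in kilobytes on Linux, which is the unit assumed here.

```python
SESSIONS = 30

# ru_maxrss values (KB) copied from the two sequential runs above.
uncached_start, uncached_end = 30812, 363556
cached_start, cached_end = 30980, 59684

uncached_per_session = (uncached_end - uncached_start) / SESSIONS
cached_per_session = (cached_end - cached_start) / SESSIONS
saving = 1 - cached_per_session / uncached_per_session

print(f"{uncached_per_session:.0f} KB vs {cached_per_session:.0f} KB per session")
print(f"saving: {saving:.0%}")
```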

The following test demonstrates usage in a multi-thread context:

import random
import resource
import time
from concurrent.futures import ThreadPoolExecutor
from boto3 import Session


def _in_thread(i):
    time.sleep(random.random() * 10)  # simulate the fact that first Web request for each thread won't arrive at the same time
    s = time.monotonic()
    session = Session(aws_access_key_id="****", aws_secret_access_key="****")
    e = time.monotonic()
    print(f"Session {i} created ({e - s}):", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    s = time.monotonic()
    conn = session.resource("s3", use_ssl=True, endpoint_url="https://****", verify=True)
    e = time.monotonic()
    print(f"    Connection {i} created ({e - s}):", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    time.sleep(5)
    

def test_threads():
    with ThreadPoolExecutor(max_workers=30) as tp:
        print("Init:", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
        
        for i in range(30):
            tp.submit(_in_thread, i)
    
    print("End:", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)

Without cache

Init: 31112
Session 3 created (0.00922908099983033): 39620
Session 20 created (0.006568328999492223): 39620
    Connection 20 created (0.3105509950000851): 65500
    Connection 3 created (0.344620763000421): 66152
Session 21 created (0.011317221999888716): 66152
Session 1 created (0.006422125999961281): 74596
    Connection 21 created (0.163153601999511): 93864
    Connection 1 created (0.13669648300037807): 95448
Session 16 created (0.006645530999776383): 95448
    Connection 16 created (0.0640196559998003): 109964
Session 17 created (0.007205042000350659): 109964
    Connection 17 created (0.0562731049994909): 124480
Session 28 created (0.020986112000173307): 124480
    Connection 28 created (0.1400560480005879): 138996
Session 10 created (0.007301143000404409): 138996
    Connection 10 created (0.07487037000009877): 153776
Session 27 created (0.008741670999370399): 153776
    Connection 27 created (0.06051018699963606): 168292
Session 19 created (0.005984718000036082): 168292
    Connection 19 created (0.0524720299999899): 182808
Session 7 created (0.008431165999354562): 182808
    Connection 7 created (0.06080749300053867): 197324
Session 14 created (0.01610251599959156): 197324
    Connection 14 created (0.07909605199984071): 211840
Session 5 created (0.007650550000107614): 211840
    Connection 5 created (0.06343064499924367): 226356
Session 22 created (0.023409358999742835): 226356
    Connection 22 created (0.06664130799981649): 240872
Session 4 created (0.006029619000400999): 240872
    Connection 4 created (0.07310243400024774): 255388
Session 26 created (0.01154692599993723): 255872
    Connection 26 created (0.07243352099976619): 270108
Session 25 created (0.006844534000265412): 270108
    Connection 25 created (0.055366586999298306): 284868
Session 11 created (0.007529647999945155): 284868
    Connection 11 created (0.06344944500051497): 299344
Session 18 created (0.005848013999639079): 299344
    Connection 18 created (0.08036417699986487): 313860
Session 9 created (0.008084058999884292): 314112
    Connection 9 created (0.06677820000004431): 328564
Session 2 created (0.006900932999997167): 328564
    Connection 2 created (0.05366153300019505): 343040
Session 23 created (0.01589870600037102): 343280
Session 13 created (0.008930171999963932): 343500
    Connection 23 created (0.5349255040000571): 343500
    Connection 13 created (0.18205730699992273): 343500
Session 29 created (0.010847108999769262): 343500
    Connection 29 created (0.057455307000054745): 344056
Session 12 created (0.00756914500016137): 344056
    Connection 12 created (0.06418953599950328): 346908
Session 8 created (0.004972495999936655): 346908
    Connection 8 created (0.06857022100030008): 351656
Session 0 created (0.008681567999701656): 351656
    Connection 0 created (0.06100977499954752): 358780
Session 24 created (0.005650809000144363): 358780
Session 15 created (0.005491906000315794): 367224
    Connection 24 created (0.11594003299978795): 381932
    Connection 15 created (0.1103509260001374): 388004
Session 6 created (0.006968434000555135): 388004
    Connection 6 created (0.06429383799968491): 402520
End: 402844

371 732 KB memory increase for 30 threads (12 391 KB per thread)

With cache

Init: 31088
Session 25 created (0.02544531300009112): 39592
Session 8 created (0.00867557499987015): 39592
Session 23 created (0.005164503999367298): 64660
Session 7 created (0.00504450199969142): 64660
    Connection 25 created (0.5942038679995676): 64660
    Connection 23 created (0.34247499799948855): 64856
    Connection 7 created (0.3401085499999681): 65120
    Connection 8 created (0.5922448290002649): 65836
Session 17 created (0.008868378999977722): 65836
    Connection 17 created (0.005207504999816592): 66360
Session 26 created (0.008478669999931299): 66360
    Connection 26 created (0.0056940149997899425): 66884
Session 10 created (0.007501451000280213): 66884
    Connection 10 created (0.0050908019993585185): 67408
Session 4 created (0.025724717999764835): 67408
    Connection 4 created (0.009540091999951983): 67932
Session 9 created (0.02167163600006461): 67932
    Connection 9 created (0.04880168300041987): 68456
Session 29 created (0.007917160000033618): 68456
    Connection 29 created (0.00423368500014476): 68980
Session 12 created (0.025886121000439744): 68980
    Connection 12 created (0.011322327999550907): 69504
Session 19 created (0.008446269999694778): 69504
    Connection 19 created (0.006079622000470408): 70028
Session 5 created (0.04109294099998806): 70028
Session 24 created (0.03339178400074161): 70028
    Connection 5 created (0.010130708000360755): 70812
    Connection 24 created (0.00694824200036237): 71076
Session 28 created (0.015083008999681624): 71076
    Connection 28 created (0.010458413999913319): 71600
Session 14 created (0.022487161000753986): 71600
    Connection 14 created (0.01104862600004708): 72124
Session 6 created (0.008993284000098356): 72124
    Connection 6 created (0.008636976999696344): 72648
Session 21 created (0.02140433799922903): 73044
    Connection 21 created (0.008903882000595331): 73044
Session 15 created (0.012625958000171522): 73044
    Connection 15 created (0.005347808999431436): 73044
Session 16 created (0.010092006000377296): 73044
    Connection 16 created (0.007994564000000537): 73044
Session 1 created (0.008494573999996646): 73044
    Connection 1 created (0.005168704999960028): 73044
Session 0 created (0.014461996000136423): 73044
    Connection 0 created (0.008165866999661375): 73044
Session 3 created (0.01992350799991982): 73044
    Connection 3 created (0.008438672000011138): 73044
Session 27 created (0.008350270999471832): 73044
    Connection 27 created (0.005324409000422747): 73044
Session 2 created (0.008396872000048461): 73044
    Connection 2 created (0.0055756140000085): 73044
Session 20 created (0.007140045999221911): 73044
    Connection 20 created (0.0044659919994956): 73044
Session 11 created (0.011294531000203278): 73044
    Connection 11 created (0.006661236000581994): 73044
Session 13 created (0.022734365000360413): 73044
    Connection 13 created (0.0069842430002609035): 73044
Session 18 created (0.007411251999656088): 73044
    Connection 18 created (0.005829318999531097): 73044
Session 22 created (0.014411694999580504): 73044
    Connection 22 created (0.006717636999383103): 73164
End: 73952

42 864 KB memory increase (1429 KB per thread). This is an 88% memory saving compared to the non-cached version.

Note: we can see that the first 4 threads took longer than the others to create their connection, and that the memory saving is not as large as in the sequential test. This is because @functools.cache does not use a lock when creating and caching objects. For this reason, when two threads call create_loader at the exact same time, this can lead to the creation of 2 distinct loaders before the last one is cached and then returned to subsequent threads. This is something that could be improved by using a cache function with a lock that ensures we only ever create one single Loader, but I'm not sure it's worth it.
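If the lock did turn out to be worth it, a cache along these lines would do it. This is only a sketch: get_shared_loader, slow_factory and the "default" key are names invented for the example, and the factory returns a plain object rather than a real Loader.

```python
import threading
import time

_loader_lock = threading.Lock()
_loaders = {}


def get_shared_loader(key, factory):
    """Return the cached object for key, building it at most once."""
    with _loader_lock:
        if key not in _loaders:
            _loaders[key] = factory()
        return _loaders[key]


calls = []


def slow_factory():
    # Record each invocation and simulate an expensive construction.
    calls.append(1)
    time.sleep(0.05)
    return object()


def worker(out, index):
    out[index] = get_shared_loader("default", slow_factory)


out = {}
threads = [threading.Thread(target=worker, args=(out, i)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(calls), len({id(v) for v in out.values()}))
```

The trade-off is that the lock is held while the factory runs, so concurrent first callers wait for each other instead of building duplicates.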

Would you consider this feature request and accept a PR?

antoinehumbert avatar Feb 27 '25 07:02 antoinehumbert

Stumbled across this issue while investigating a memory leak and wondering why there are so many really big strings holding the same information loaded by botocore sessions.

paulgueltekin avatar Mar 02 '25 01:03 paulgueltekin

I've posted a reply to the mentioned tracking issue (https://github.com/boto/boto3/issues/1670#issuecomment-2960687749). I'm going to close this issue as a duplicate, and I encourage you to follow-up on this in the tracking issue.

RyanFitzSimmonsAK avatar Jun 10 '25 22:06 RyanFitzSimmonsAK

This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.

github-actions[bot] avatar Jun 10 '25 22:06 github-actions[bot]

Thank you - I have replied there as requested, albeit I'm unconvinced.

sparrowt avatar Jun 23 '25 09:06 sparrowt