Identical data is loaded into every session, wasting memory
Describe the bug
Each botocore Session creates its own Loader instance, within which JSON content loaded from botocore/data/ is cached by @instance_cache, e.g. on methods like load_service_model and load_data_with_path.
This caching applies to many things, including loading endpoints.json into an EndpointResolver, which happens in every session and results in approx 6 MB of memory allocation (to hold details of the HTTP endpoints for every region/partition of all 300+ AWS services).
The JSON files shipped with botocore presumably do not change on disk at runtime. Nevertheless, if you create several sessions within a process - e.g. in a multi-threaded app, because sessions are not thread safe - this exact same data is loaded into memory multiple times and cached separately in each Session's Loader and its EndpointResolver.
It therefore seems like a bug (of the wasteful-memory-usage variety) that this cache of immutable JSON data is per-session rather than per-process. In a multi-threaded app in a resource-constrained environment, every 6 MB really adds up.
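A quick way to see the duplication (this peeks at boto3's private _session attribute to reach the botocore component registry, so treat it as illustrative only):

import boto3.session

s1 = boto3.session.Session()
s2 = boto3.session.Session()
# each boto3 Session wraps a botocore session, which owns its own 'data_loader' component
l1 = s1._session.get_component('data_loader')
l2 = s2._session.get_component('data_loader')
print(l1 is l2)  # False: two separate Loaders, two separate caches of identical JSON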
Expected Behavior
When creating a 2nd (and any subsequent) Session, the data which has already been loaded from endpoints.json should be re-used, quickly and without unnecessary extra memory allocation.
Current Behavior
Instead, each new session loads the whole thing in again, resulting in another ~6 MB of memory usage each time, stored in a new EndpointResolver (and Loader) belonging to the new Session. (The same issue exists for other JSON data such as service definitions, but I'm focusing on the most common and most impactful example that I observed.)
Reproduction Steps
import boto3.session
# Note: in real usage the below would be in separate threads (otherwise we could just re-use the Session)
# but the threading code is omitted from this example for brevity and because it does not affect the repro
# In one thread
session = boto3.session.Session(region_name='us-east-1') # +6mb
client = session.client('s3') # (+5mb as it happens)
# do stuff with client
# In another thread
session = boto3.session.Session(region_name='us-east-1') # +6mb again = REPRO: this shouldn't need to load endpoints.json again
client = session.client('someotherservice')
# do stuff with client
Possible Solution
One solution would be to make the Loader process-wide with suitable locking on state as necessary. I imagine the small extra overhead is more than paid for by the memory savings if many sessions/clients are created.
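For illustration, a minimal sketch of what I mean (the helper names here are mine, not botocore's):

import threading

import botocore.loaders
import botocore.session

_LOADER_LOCK = threading.Lock()
_SHARED_LOADER = None

def get_shared_loader():
    # lazily create a single process-wide Loader, guarded by a lock
    global _SHARED_LOADER
    with _LOADER_LOCK:
        if _SHARED_LOADER is None:
            _SHARED_LOADER = botocore.loaders.Loader()
        return _SHARED_LOADER

def make_session():
    # every session shares the same Loader and therefore the same cached JSON
    session = botocore.session.get_session()
    session.register_component('data_loader', get_shared_loader())
    return session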
A more radical alternative would be for the pre-processing step that generates botocore/data/ to emit, instead of each JSON file, a Python module (.py file) containing a dict with the same data. Then Loader wouldn't have to parse JSON at all: it would just lazily import the Python modules it needs, and Python's importlib gives you process-wide sharing and thread safety for free. Having seen the existence of things like CUSTOMER_DATA_PATH (~/.aws/models/), I imagine this would be a much more difficult change and it may not be feasible - but I've included it, if nothing else, for hypothetical comparison and to illustrate the principle of the problem.
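To illustrate the principle (purely hypothetical - botocore ships no such modules today), the loading side could reduce to something like:

import importlib

def load_data(name):
    # e.g. name = 'endpoints'; sys.modules caches the import process-wide
    # and importlib handles locking, so concurrent sessions share one dict
    module = importlib.import_module(f'botocore.data.{name}')
    return module.DATA  # a dict literal generated from the original JSON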
Additional Information/Context
https://github.com/boto/boto3/issues/1670 is very related - this ticket is an attempt at a detailed description of why each session increases memory usage so much and how this might be avoided.
Any
SDK version used
botocore==1.33.1 boto3==1.33.1
Environment details (OS name and version, etc.)
Windows 10 Ubuntu WSL / same happens on Amazon Linux 2
Hi @sparrowt thanks for reaching out. Have you tried sharing a single loader instance across several sessions? For example:
from botocore.loaders import Loader
loader = Loader()
sessions = some_func_that_makes_multiple_sessions()
for session in sessions:
    session.register_component('data_loader', loader)
Another option is using a single session to create multiple clients which get passed to the other threads:
session = boto3.session.Session(region_name='us-east-1')
client1 = session.client('s3')
client2 = session.client('someotherservice')
# In one thread
client1.do_something()
# In another thread
client2.do_something()
The endpoints.json file itself is relatively small and only a small fraction of what’s causing the memory usage. The suggestions you described could involve extensive refactoring and I can't guarantee that those changes would be considered. I think it would help to have memory profile reports here that highlight the current memory usage you're seeing and how it compares with the approaches provided above.
Thanks so much for getting back to me @tim-finnigan. I have not tried that; I assumed Loader was not thread safe (otherwise why would each session need its own?). Before I do, could you clarify a couple of things:
- is Loader thread safe?
- if so, is there any reason not to make this the default behaviour? (i.e. all sessions using the same 'data_loader' component, I guess unless they specify a non-default 'data_path')
To respond to some of your other points:
Another option is using a single session to create multiple clients which get passed to the other threads
Sadly this is not really an option in my case: the app in question is a multi-threaded web server and it is not possible to predict in advance which boto3 clients any given thread might need, so because Session is not thread safe, each thread has to create its own session in order to create the client(s) it needs. I am already caching that session using threading.local() so that subsequent client creations in the same thread don't need to make another session.
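For concreteness, the per-thread caching I mentioned is roughly this (a sketch with illustrative names):

import threading

import boto3.session

_tls = threading.local()

def get_thread_session(region_name='us-east-1'):
    # each thread lazily creates, then re-uses, its own Session
    if not hasattr(_tls, 'session'):
        _tls.session = boto3.session.Session(region_name=region_name)
    return _tls.session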
The endpoints.json file itself is relatively small and only a small fraction of what’s causing the memory usage.
It is 781 KB on disk (only surpassed by a handful of the service definitions), however loading it into memory in Python results in nearly 6 MB of memory allocation, according to analysis with the Austin profiler. For example, in the memory allocation profile trace I captured after running session = boto3.session.Session(region_name='us-east-1') and then client = session.client('s3'), within create_default_resolver (where it loads endpoints.json) there is 5.86 MB of memory allocation, which is quite a large fraction of the total memory allocated. The other major parts (to the right in the trace) are the smaller, S3-specific _load_service_model (4.71 MB) and _load_service_endpoints_ruleset (2.05 MB).
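If you want to reproduce a similar measurement without a profiler, a rough stdlib-only check (numbers will vary with platform and botocore version) is:

import tracemalloc

import boto3.session

tracemalloc.start()
session = boto3.session.Session(region_name='us-east-1')
client = session.client('s3')
current, peak = tracemalloc.get_traced_memory()
print(f"current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")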
Have you tried sharing a single loader instance across several sessions? For example: [snipped]
I found that each call to boto3.session.Session() (not using the default session) eats ~200 ms of wall clock time, which led me to this issue after I noticed all the stat/JSON parsing in my profiling; so for me it is not a RAM problem per se but a really expensive setup time.
The example was not exactly clear to me but did point me in the right direction, so as a note for others, this got my ~200 ms down to ~20 ms per call to create an S3 Resource:
import threading

import boto3.session
import botocore.loaders
import botocore.session

region_name = 'us-east-1'  # whichever region you use

# preload and reuse the model to shave ~200ms each time we create a session
# https://github.com/boto/boto3/issues/1670
# https://github.com/boto/botocore/issues/3078
_loader = botocore.loaders.Loader()
# iterate contents of botocore/data/s3
for type_name in frozenset(['endpoint-rule-set-1', 'paginators-1', 'service-2', 'waiters-2']):
    _loader.load_service_model(service_name='s3', type_name=type_name)

# session *instantiation* is not thread safe either
_boto_session_lock = threading.Lock()

def _session():
    session = botocore.session.get_session()
    session.register_component('data_loader', _loader)
    with _boto_session_lock:
        return boto3.session.Session(region_name=region_name, botocore_session=session)
Then in your threads later you can use the following for a significantly faster setup time:
session = _session()
#session.events.register(...)
resource = session.resource('s3', config=config, endpoint_url=AWS_ENDPOINT_URL)
Hi @tim-finnigan,
I'm also facing this issue in one of my Web projects with 30 worker threads. Having one session (each with its own loader instance) per thread leads to huge memory consumption for the whole application.
As far as I can see, the Loader object is thread-safe (it mainly contains read-only properties and has no mutable state). As a consequence, building a new loader with the same search paths for each session loads the exact same results for each session, with the penalty of duplicating the memory footprint.
Also, we can see that Loader properties are cached, so we clearly don't expect loaded files to change during program execution, nor loader results to be refreshed when files change. For this reason, I think that having a Loader singleton for a given set of constructor parameters would be perfectly fine, and every session would share the same loaded models.
All this can easily be achieved by adding a @functools.cache decorator to the create_loader function, which is what sessions use to create their Loader (see Session._register_data_loader).
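As a stop-gap, roughly the same effect can be had today by monkey-patching (a sketch; botocore.session imports create_loader by name, so both references need patching, and exact import details may vary between botocore versions):

import functools

import botocore.loaders
import botocore.session

# memoize create_loader so every Session built with the same data_path
# shares one Loader (and therefore one cache of parsed JSON models)
_cached_create_loader = functools.cache(botocore.loaders.create_loader)
botocore.loaders.create_loader = _cached_create_loader
botocore.session.create_loader = _cached_create_loader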
Here is a little test:
Without caching
import resource
import time

from boto3 import Session

def test():
    sessions = []
    connections = []
    print("Init:", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    for i in range(30):
        s = time.monotonic()
        session = Session(aws_access_key_id="****", aws_secret_access_key="****")
        e = time.monotonic()
        sessions.append(session)
        print(f"Session {i} created ({e - s}):", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
        s = time.monotonic()
        conn = session.resource("s3", use_ssl=True, endpoint_url="https://****", verify=True)
        e = time.monotonic()
        connections.append(conn)
        print(f" Connection created ({e - s}):", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    print("End:", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
Running test() leads to the following output:
Init: 30812
Session 0 created (0.007366713999999774): 31136
Connection created (0.20709648000001835): 45172
Session 1 created (0.0077458139999180275): 45172
Connection created (0.05676860499988834): 59428
Session 2 created (0.005723010000110662): 59692
Connection created (0.05713270499973078): 70252
Session 3 created (0.006202011999903334): 70516
Connection created (0.06279741499974989): 81340
Session 4 created (0.005386410000028263): 81340
Connection created (0.05483890000004976): 92164
Session 5 created (0.005307609000283264): 92164
Connection created (0.05163619499990091): 102988
Session 6 created (0.08651785899974129): 102988
Connection created (0.06467631899977277): 113812
Session 7 created (0.005154008999852522): 114076
Connection created (0.06136341299998094): 124636
Session 8 created (0.005280109000068478): 124900
Connection created (0.054148298999734834): 135460
Session 9 created (0.005269808999855741): 135724
Connection created (0.04595588400025008): 146548
Session 10 created (0.011273919999894133): 146548
Connection created (0.048401388000002044): 157372
Session 11 created (0.00553420999995069): 157372
Connection created (0.06633752100015045): 168196
Session 12 created (0.005896810999729496): 168196
Connection created (0.06273591500030307): 179020
Session 13 created (0.006891513000027771): 179020
Connection created (0.06945342699964385): 189844
Session 14 created (0.016072529000211944): 189844
Connection created (0.05730550400039647): 200668
Session 15 created (0.0051255090002086945): 200932
Connection created (0.06524571999989348): 211492
Session 16 created (0.0061956109998391184): 211756
Connection created (0.054940600999998424): 222316
Session 17 created (0.00536960999988878): 222580
Connection created (0.04944039000019984): 233140
Session 18 created (0.011485020999771223): 233404
Connection created (0.0556549019997874): 244228
Session 19 created (0.005821011000080034): 244228
Connection created (0.07185963199981416): 255052
Session 20 created (0.006414611999844055): 255052
Connection created (0.3373863169999822): 265876
Session 21 created (0.010485518999757915): 265876
Connection created (0.06670962199996211): 276700
Session 22 created (0.00575131000005058): 276700
Connection created (0.06228251399988949): 287524
Session 23 created (0.005410010000105103): 287524
Connection created (0.0657683200001884): 298348
Session 24 created (0.005721811000057642): 298612
Connection created (0.060366809999777615): 309172
Session 25 created (0.005479909999849042): 309436
Connection created (0.05271927500007223): 319996
Session 26 created (0.005093302999739535): 320260
Connection created (0.061812045000351645): 330820
Session 27 created (0.005705203999696096): 331084
Connection created (0.06998764999980267): 341908
Session 28 created (0.007875706000049831): 341908
Connection created (0.060424943000271014): 352732
Session 29 created (0.0057177039998350665): 352732
Connection created (0.04947113599973818): 363556
End: 363556
We can see that each time a new Session/Connection is created, it takes 50 to 200 ms to initiate the connection: this is essentially time spent by the Loader loading JSON models (the second and subsequent ones are shorter thanks to OS file caching). We also see that memory increases from 30,812 KB at the start of the test to 363,556 KB at the end (roughly 11,092 KB per Session/Connection).
After decorating the create_loader function with @functools.cache
Running test() now produces the following output:
Init: 30980
Session 0 created (0.008385603000078845): 30980
Connection created (0.2360037920002469): 45164
Session 1 created (0.005130701999860321): 45164
Connection created (0.003696001000207616): 45692
Session 2 created (0.004956201999902987): 45692
Connection created (0.004107302000193158): 46220
Session 3 created (0.005574302000241005): 46220
Connection created (0.0032814010000947746): 46484
Session 4 created (0.004767402000197762): 46748
Connection created (0.003917602000001352): 47012
Session 5 created (0.0049445019999438955): 47012
Connection created (0.0032219009999607806): 47540
Session 6 created (0.005483102000198414): 47540
Connection created (0.0061228019999362004): 48068
Session 7 created (0.005617702000108693): 48068
Connection created (0.004165200999977969): 48332
Session 8 created (0.006076702999962436): 48596
Connection created (0.004040900999825681): 48860
Session 9 created (0.005701903000044695): 49124
Connection created (0.032456312000249454): 49388
Session 10 created (0.008747503999984474): 49652
Connection created (0.004532501000085176): 49916
Session 11 created (0.006458903000293503): 49916
Connection created (0.004456302000107826): 50444
Session 12 created (0.005392002000007778): 50444
Connection created (0.003272000999913871): 50972
Session 13 created (0.005266902000130358): 50972
Connection created (0.0037011019999226846): 51500
Session 14 created (0.004815901000256417): 51500
Connection created (0.0032329019995813724): 52028
Session 15 created (0.0050092020001102355): 52028
Connection created (0.00357710100024633): 52556
Session 16 created (0.004979501999969216): 52556
Connection created (0.0030680010004289215): 53084
Session 17 created (0.0049238020001212135): 53084
Connection created (0.0035945019999417127): 53612
Session 18 created (0.005164102000435378): 53612
Connection created (0.003311600999950315): 53876
Session 19 created (0.005232101999808947): 54140
Connection created (0.0031690010000602342): 54404
Session 20 created (0.005990101999941544): 54668
Connection created (0.003229601999919396): 54932
Session 21 created (0.0054135020000103395): 55196
Connection created (0.003699100999710936): 55460
Session 22 created (0.005547002000184875): 55724
Connection created (0.003502602000025945): 55988
Session 23 created (0.00545880199979365): 56252
Connection created (0.003561700999853201): 56516
Session 24 created (0.0048902019998422475): 56780
Connection created (0.003871401999731461): 57044
Session 25 created (0.005137202000241814): 57044
Connection created (0.003650901000128215): 57572
Session 26 created (0.005528502000288427): 57572
Connection created (0.005458303000068554): 58100
Session 27 created (0.005486201999701734): 58100
Connection created (0.0036526009998851805): 58628
Session 28 created (0.005462101999910374): 58628
Connection created (0.003915201999916462): 59156
Session 29 created (0.004943802000070718): 59156
Connection created (0.0032232009998551803): 59684
End: 59684
We can see that not only is the creation of the second and subsequent Sessions/Connections faster (4 to 6 ms instead of ~50 ms), but the final memory consumption is now 59,684 KB (roughly 957 KB per Session/Connection). That is a saving of 91% of the Session/Connection memory consumption.
The following test demonstrates usage in a multi-thread context:
import random
import resource
import time
from concurrent.futures import ThreadPoolExecutor

from boto3 import Session

def _in_thread(i):
    # simulate the fact that the first Web request for each thread won't arrive at the same time
    time.sleep(random.random() * 10)
    s = time.monotonic()
    session = Session(aws_access_key_id="****", aws_secret_access_key="****")
    e = time.monotonic()
    print(f"Session {i} created ({e - s}):", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    s = time.monotonic()
    conn = session.resource("s3", use_ssl=True, endpoint_url="https://****", verify=True)
    e = time.monotonic()
    print(f" Connection {i} created ({e - s}):", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    time.sleep(5)

def test_threads():
    with ThreadPoolExecutor(max_workers=30) as tp:
        print("Init:", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
        for i in range(30):
            tp.submit(_in_thread, i)
    print("End:", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
Without cache
Init: 31112
Session 3 created (0.00922908099983033): 39620
Session 20 created (0.006568328999492223): 39620
Connection 20 created (0.3105509950000851): 65500
Connection 3 created (0.344620763000421): 66152
Session 21 created (0.011317221999888716): 66152
Session 1 created (0.006422125999961281): 74596
Connection 21 created (0.163153601999511): 93864
Connection 1 created (0.13669648300037807): 95448
Session 16 created (0.006645530999776383): 95448
Connection 16 created (0.0640196559998003): 109964
Session 17 created (0.007205042000350659): 109964
Connection 17 created (0.0562731049994909): 124480
Session 28 created (0.020986112000173307): 124480
Connection 28 created (0.1400560480005879): 138996
Session 10 created (0.007301143000404409): 138996
Connection 10 created (0.07487037000009877): 153776
Session 27 created (0.008741670999370399): 153776
Connection 27 created (0.06051018699963606): 168292
Session 19 created (0.005984718000036082): 168292
Connection 19 created (0.0524720299999899): 182808
Session 7 created (0.008431165999354562): 182808
Connection 7 created (0.06080749300053867): 197324
Session 14 created (0.01610251599959156): 197324
Connection 14 created (0.07909605199984071): 211840
Session 5 created (0.007650550000107614): 211840
Connection 5 created (0.06343064499924367): 226356
Session 22 created (0.023409358999742835): 226356
Connection 22 created (0.06664130799981649): 240872
Session 4 created (0.006029619000400999): 240872
Connection 4 created (0.07310243400024774): 255388
Session 26 created (0.01154692599993723): 255872
Connection 26 created (0.07243352099976619): 270108
Session 25 created (0.006844534000265412): 270108
Connection 25 created (0.055366586999298306): 284868
Session 11 created (0.007529647999945155): 284868
Connection 11 created (0.06344944500051497): 299344
Session 18 created (0.005848013999639079): 299344
Connection 18 created (0.08036417699986487): 313860
Session 9 created (0.008084058999884292): 314112
Connection 9 created (0.06677820000004431): 328564
Session 2 created (0.006900932999997167): 328564
Connection 2 created (0.05366153300019505): 343040
Session 23 created (0.01589870600037102): 343280
Session 13 created (0.008930171999963932): 343500
Connection 23 created (0.5349255040000571): 343500
Connection 13 created (0.18205730699992273): 343500
Session 29 created (0.010847108999769262): 343500
Connection 29 created (0.057455307000054745): 344056
Session 12 created (0.00756914500016137): 344056
Connection 12 created (0.06418953599950328): 346908
Session 8 created (0.004972495999936655): 346908
Connection 8 created (0.06857022100030008): 351656
Session 0 created (0.008681567999701656): 351656
Connection 0 created (0.06100977499954752): 358780
Session 24 created (0.005650809000144363): 358780
Session 15 created (0.005491906000315794): 367224
Connection 24 created (0.11594003299978795): 381932
Connection 15 created (0.1103509260001374): 388004
Session 6 created (0.006968434000555135): 388004
Connection 6 created (0.06429383799968491): 402520
End: 402844
A 371,732 KB memory increase for 30 threads (12,391 KB per thread).
With cache
Init: 31088
Session 25 created (0.02544531300009112): 39592
Session 8 created (0.00867557499987015): 39592
Session 23 created (0.005164503999367298): 64660
Session 7 created (0.00504450199969142): 64660
Connection 25 created (0.5942038679995676): 64660
Connection 23 created (0.34247499799948855): 64856
Connection 7 created (0.3401085499999681): 65120
Connection 8 created (0.5922448290002649): 65836
Session 17 created (0.008868378999977722): 65836
Connection 17 created (0.005207504999816592): 66360
Session 26 created (0.008478669999931299): 66360
Connection 26 created (0.0056940149997899425): 66884
Session 10 created (0.007501451000280213): 66884
Connection 10 created (0.0050908019993585185): 67408
Session 4 created (0.025724717999764835): 67408
Connection 4 created (0.009540091999951983): 67932
Session 9 created (0.02167163600006461): 67932
Connection 9 created (0.04880168300041987): 68456
Session 29 created (0.007917160000033618): 68456
Connection 29 created (0.00423368500014476): 68980
Session 12 created (0.025886121000439744): 68980
Connection 12 created (0.011322327999550907): 69504
Session 19 created (0.008446269999694778): 69504
Connection 19 created (0.006079622000470408): 70028
Session 5 created (0.04109294099998806): 70028
Session 24 created (0.03339178400074161): 70028
Connection 5 created (0.010130708000360755): 70812
Connection 24 created (0.00694824200036237): 71076
Session 28 created (0.015083008999681624): 71076
Connection 28 created (0.010458413999913319): 71600
Session 14 created (0.022487161000753986): 71600
Connection 14 created (0.01104862600004708): 72124
Session 6 created (0.008993284000098356): 72124
Connection 6 created (0.008636976999696344): 72648
Session 21 created (0.02140433799922903): 73044
Connection 21 created (0.008903882000595331): 73044
Session 15 created (0.012625958000171522): 73044
Connection 15 created (0.005347808999431436): 73044
Session 16 created (0.010092006000377296): 73044
Connection 16 created (0.007994564000000537): 73044
Session 1 created (0.008494573999996646): 73044
Connection 1 created (0.005168704999960028): 73044
Session 0 created (0.014461996000136423): 73044
Connection 0 created (0.008165866999661375): 73044
Session 3 created (0.01992350799991982): 73044
Connection 3 created (0.008438672000011138): 73044
Session 27 created (0.008350270999471832): 73044
Connection 27 created (0.005324409000422747): 73044
Session 2 created (0.008396872000048461): 73044
Connection 2 created (0.0055756140000085): 73044
Session 20 created (0.007140045999221911): 73044
Connection 20 created (0.0044659919994956): 73044
Session 11 created (0.011294531000203278): 73044
Connection 11 created (0.006661236000581994): 73044
Session 13 created (0.022734365000360413): 73044
Connection 13 created (0.0069842430002609035): 73044
Session 18 created (0.007411251999656088): 73044
Connection 18 created (0.005829318999531097): 73044
Session 22 created (0.014411694999580504): 73044
Connection 22 created (0.006717636999383103): 73164
End: 73952
A 42,864 KB memory increase (roughly 1,429 KB per thread). This is an 88% memory saving compared to the non-cached version.
Note: we can see that the first 4 threads took longer than the others to create their connection, and that the memory saving is not quite as large as in the sequential test. This is because @functools.cache does not use a lock when creating and caching objects, so when two threads call create_loader at exactly the same time, this can lead to the creation of 2 distinct loaders before the last one is cached and returned to subsequent threads. This could be improved by using a caching function with a lock that ensures only a single Loader is ever created, but I'm not sure it's worth it.
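For the sake of illustration, such a lock-protected memoizer could look roughly like this:

import functools
import threading

def locked_cache(func):
    # like functools.cache, but a lock guarantees only one thread at a time
    # can construct (and insert) a value, so at most one Loader is ever
    # created per distinct set of arguments
    lock = threading.Lock()
    cached = functools.cache(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with lock:
            return cached(*args, **kwargs)

    return wrapper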
Would you mind considering this feature request and accepting a PR?
I stumbled across this issue while investigating a memory leak and wondered why there were so many really big strings holding the same information, all loaded by botocore sessions.
I've posted a reply to the mentioned tracking issue (https://github.com/boto/boto3/issues/1670#issuecomment-2960687749). I'm going to close this issue as a duplicate, and I encourage you to follow up on this in the tracking issue.
This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.
Thank you - I have replied there as requested, albeit I'm unconvinced.