boto3
short-lived boto3.Session using long-lived botocore.session.Session "leaks" memory
>>> import botocore, boto3
>>> boto3.__version__, botocore.__version__
('1.9.250', '1.12.250')
Every time a boto3.Session is created with the botocore_session kwarg set, it creates handlers on the botocore session. When the boto3.Session is deleted, those handlers stick around.
In a project where you cache botocore sessions for the lifetime of your credentials, but create a new boto3 session every time you want to use AWS, this can lead to some incredible memory usage that's hard to track down.
test case:
import boto3
import botocore
from botocore.hooks import NodeList, _PrefixTrie, HierarchicalEmitter

# With every boto3 session created and destroyed, the botocore session object grows,
# leading to extreme memory usage when boto3 sessions are short-lived and botocore
# sessions are long-lived.
def leak_it():
    session = boto3.Session(region_name='us-west-2', botocore_session=botocore_session)

botocore_session = botocore.session.Session()
for x in range(10000):
    leak_it()
# print the things that grew unexpectedly
def dude_wheres_my_ram(obj, path):
    new_iter = None
    new_path = None
    if isinstance(obj, dict):
        new_iter = obj.keys()
        new_path = path + "['{}']"
    elif isinstance(obj, list):
        new_iter = range(len(obj))
        new_path = path + "[{}]"
    elif isinstance(obj, NodeList):
        new_iter = range(len(obj.middle))
        new_path = path + ".middle[{}]"
    elif isinstance(obj, _PrefixTrie):
        # _root is a dict, so iterate over its keys rather than integer indexes
        new_iter = obj._root.keys()
        new_path = path + "._root['{}']"
    elif isinstance(obj, HierarchicalEmitter):
        return dude_wheres_my_ram(obj._handlers, path + '._handlers')
    if not new_iter:
        return
    # only report containers that have grown suspiciously large
    if len(new_iter) > 100:
        print(f'{path}: {len(new_iter)}')
    for key in new_iter:
        if isinstance(obj, NodeList):
            dude_wheres_my_ram(obj.middle[key], new_path.format(key))
        elif isinstance(obj, _PrefixTrie):
            dude_wheres_my_ram(obj._root[key], new_path.format(key))
        else:
            dude_wheres_my_ram(obj[key], new_path.format(key))

dude_wheres_my_ram(botocore_session._original_handler._handlers._root, 'botocore_session._original_handler._handlers._root')
Running the above uses 50 MB of RAM and prints:
$> python3 prove_boto3_doesnt_cleanup_after_itself.py
botocore_session._original_handler._handlers._root['children']['creating-client-class']['children']['s3']['values']: 10001
botocore_session._original_handler._handlers._root['children']['creating-resource-class']['children']['s3']['children']['Bucket']['values']: 10000
botocore_session._original_handler._handlers._root['children']['creating-resource-class']['children']['s3']['children']['Object']['values']: 10000
botocore_session._original_handler._handlers._root['children']['creating-resource-class']['children']['s3']['children']['ObjectSummary']['values']: 10000
botocore_session._original_handler._handlers._root['children']['creating-resource-class']['children']['ec2']['children']['ServiceResource']['values']: 10000
botocore_session._original_handler._handlers._root['children']['creating-resource-class']['children']['ec2']['children']['Instance']['values']: 10000
Expected behaviour: when a boto3 session is deleted, it should remove its handlers from the botocore session. Alternatively, the boto3 session should reuse handlers on the botocore session if they already exist.
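To make the second option concrete, here is a minimal toy model of idempotent handler registration. It is a sketch only: ToyEmitter, make_short_lived_session and the "toy-s3-client-handler" id are hypothetical names invented for illustration, and this is not how botocore or boto3 is implemented. The idea is that a registration carrying an identifier that has already been seen becomes a no-op, so repeated short-lived sessions stop growing the handler list.

```python
from collections import defaultdict


class ToyEmitter:
    """Toy stand-in for an event emitter (hypothetical, not botocore's code)."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self._seen_ids = set()

    def register(self, event, handler, unique_id=None):
        # With a unique_id, a repeated registration is silently skipped.
        if unique_id is not None:
            if unique_id in self._seen_ids:
                return
            self._seen_ids.add(unique_id)
        self._handlers[event].append(handler)

    def handler_count(self, event):
        return len(self._handlers[event])


def make_short_lived_session(emitter, dedupe):
    """Mimic a short-lived wrapper that registers a handler when created."""
    def handler(**kwargs):
        return None

    unique_id = "toy-s3-client-handler" if dedupe else None
    emitter.register("creating-client-class.s3", handler, unique_id=unique_id)


leaky = ToyEmitter()
deduped = ToyEmitter()
for _ in range(1000):
    make_short_lived_session(leaky, dedupe=False)
    make_short_lived_session(deduped, dedupe=True)

print(leaky.handler_count("creating-client-class.s3"))    # 1000
print(deduped.handler_count("creating-client-class.s3"))  # 1
```

The first option (removing handlers on deletion) would need the short-lived object to remember what it registered and unregister it on teardown, which is harder to get right than skipping duplicates at registration time.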
Thank you for your post. I am not able to reproduce the issue. I ran the code below, which creates a boto3 session in a loop and plots the memory usage. I have attached the resulting memory graph. From the graph I can see that memory does not increase significantly with each session creation, which is the expected behavior.
import boto3
import botocore
import psutil
import matplotlib.pyplot as pp

botocore_session = botocore.session.Session()
used = []
for i in range(10000):
    memory = psutil.virtual_memory()
    used.append(memory.used)
    session = boto3.Session(botocore_session=botocore_session)

pp.plot(used)
pp.show()
Can you please run this code and let me know if you are still seeing significant memory usage?
It's really hard to check python memory usage like that because you're getting memory used by all processes on your computer. Here's a slightly modified script to get the actual memory used by your python process, with different output:
import boto3
import botocore
import resource
import matplotlib.pyplot as pp

botocore_session = botocore.session.Session()
used = []
for i in range(10000):
    # ru_maxrss is the peak resident set size. It is reported in bytes on macOS
    # but in kilobytes on Linux; the conversion below assumes bytes.
    memory = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    memory_megabytes = memory/1024/1024
    used.append(memory_megabytes)
    session = boto3.Session(botocore_session=botocore_session)

pp.plot(used)
pp.show()
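One caveat with ru_maxrss is that its unit depends on the platform: macOS reports bytes, while Linux reports kilobytes. A small helper along these lines could normalise the value; peak_rss_mib is a hypothetical name used only for this sketch, and it handles only the macOS and Linux conventions.

```python
import resource
import sys


def peak_rss_mib():
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # macOS reports ru_maxrss in bytes; Linux reports it in kilobytes.
    if sys.platform == "darwin":
        return peak / 1024 / 1024
    return peak / 1024


print(f"peak RSS: {peak_rss_mib():.1f} MiB")
```

Because ru_maxrss is a high-water mark, the plotted curve can only stay flat or rise. It shows growth well but will never show memory being released.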
If you don't use botocore_session, the memory usage remains constant after some initial startup:
import boto3
import botocore
import resource
import matplotlib.pyplot as pp

botocore_session = botocore.session.Session()
used = []
for i in range(10000):
    memory = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    memory_megabytes = memory/1024/1024
    used.append(memory_megabytes)
    # no botocore_session kwarg here, so boto3 creates its own botocore session
    session = boto3.Session()

pp.plot(used)
pp.show()
Alternatively, I guess psutil.Process().memory_info().rss would work too (psutil.Process takes a PID rather than a process name, and defaults to the current process).
For reference, here's what I got when I ran your script:
@longbowrocks - I am able to reproduce the issue with this script:
import os
import boto3
import botocore
import psutil
import matplotlib.pyplot as pp

botocore_session = botocore.session.Session()
used = []
for i in range(10000):
    process = psutil.Process(os.getpid())
    memory = process.memory_info().rss/1024/1024
    used.append(memory)
    session = boto3.Session(botocore_session=botocore_session)

pp.plot(used)
pp.show()
Marking this as a bug.
Greetings! It looks like this issue hasn’t been active in longer than one year. We encourage you to check if this is still an issue in the latest release. Because it has been longer than one year since the last update on this, and in the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment to prevent automatic closure, or if the issue is already closed, please feel free to reopen it.