short-lived boto3.Session using long-lived botocore.session.Session "leaks" memory

Open longbowrocks opened this issue 5 years ago • 6 comments

>>> import botocore, boto3
>>> boto3.__version__, botocore.__version__
('1.9.250', '1.12.250')

Every time a boto3.Session is created with the botocore_session kwarg set, it registers handlers on the botocore session. When the boto3.Session is deleted, those handlers stick around.

In a project where you cache botocore sessions for the lifetime of your credentials, but create a new boto3 session every time you want to use AWS, this can lead to some incredible memory usage that's hard to track down.
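
The mechanism can be illustrated without boto3 at all. What follows is a toy model, not botocore's actual code: `ToyEmitter` and `ToyShortLivedSession` are hypothetical stand-ins for the long-lived botocore session's event emitter and the short-lived boto3.Session. It only shows why deleting the short-lived object frees nothing, assuming the long-lived emitter keeps a strong reference to every handler it was given.

```python
import gc
from collections import defaultdict


class ToyEmitter:
    """Hypothetical stand-in for a long-lived event emitter."""

    def __init__(self):
        self.handlers = defaultdict(list)

    def register(self, event_name, handler):
        # The emitter keeps a strong reference to every handler it is given.
        self.handlers[event_name].append(handler)


class ToyShortLivedSession:
    """Hypothetical stand-in for a short-lived wrapper session."""

    def __init__(self, emitter):
        # A fresh closure is registered on every construction, so the
        # emitter cannot tell that an equivalent handler already exists.
        emitter.register("creating-client-class.s3", lambda **kwargs: None)


def handler_count_after(n_sessions):
    """Create and delete n_sessions wrappers, then count surviving handlers."""
    emitter = ToyEmitter()
    for _ in range(n_sessions):
        session = ToyShortLivedSession(emitter)
        del session  # the wrapper is gone, its handler is not
    gc.collect()
    return len(emitter.handlers["creating-client-class.s3"])


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, handler_count_after(n))
```

In this model the handler count equals the number of wrappers ever created, regardless of how many are still alive.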

test case:

import boto3
import botocore
from botocore.hooks import NodeList, _PrefixTrie, HierarchicalEmitter

# With every boto3 session created and destroyed, the botocore session object grows,
# leading to extreme memory usage when boto3 sessions are short-lived and botocore
# sessions are long-lived.
def leak_it():
  session = boto3.Session(region_name='us-west-2', botocore_session=botocore_session)

botocore_session = botocore.session.Session()
for x in range(10000):
  leak_it()

# print the things that grew unexpectedly
def dude_wheres_my_ram(obj, path):
  new_iter = None
  new_path = None
  if isinstance(obj, dict):
    new_iter = obj.keys()
    new_path = path + "['{}']"
  elif isinstance(obj, list):
    new_iter = range(len(obj))
    new_path = path + "[{}]"
  elif isinstance(obj, NodeList):
    new_iter = range(len(obj.middle))
    new_path = path + ".middle[{}]"
  elif isinstance(obj, _PrefixTrie):
    new_iter = obj._root.keys()  # _root is a dict, so iterate its keys rather than integer indices
    new_path = path + "._root['{}']"
  elif isinstance(obj, HierarchicalEmitter):
    return dude_wheres_my_ram(obj._handlers, path+'._handlers')
  if not new_iter:
    return
  if len(new_iter) > 100:
    print(f'{path}: {len(new_iter)}')
  for key in new_iter:
    if isinstance(obj, NodeList):
      dude_wheres_my_ram(obj.middle[key], new_path.format(key))
    elif isinstance(obj, _PrefixTrie):
      dude_wheres_my_ram(obj._root[key], new_path.format(key))
    else:
      dude_wheres_my_ram(obj[key], new_path.format(key))

dude_wheres_my_ram(botocore_session._original_handler._handlers._root, 'botocore_session._original_handler._handlers._root')

Running the above uses 50 MB of RAM and prints:

$> python3 prove_boto3_doesnt_cleanup_after_itself.py
botocore_session._original_handler._handlers._root['children']['creating-client-class']['children']['s3']['values']: 10001
botocore_session._original_handler._handlers._root['children']['creating-resource-class']['children']['s3']['children']['Bucket']['values']: 10000
botocore_session._original_handler._handlers._root['children']['creating-resource-class']['children']['s3']['children']['Object']['values']: 10000
botocore_session._original_handler._handlers._root['children']['creating-resource-class']['children']['s3']['children']['ObjectSummary']['values']: 10000
botocore_session._original_handler._handlers._root['children']['creating-resource-class']['children']['ec2']['children']['ServiceResource']['values']: 10000
botocore_session._original_handler._handlers._root['children']['creating-resource-class']['children']['ec2']['children']['Instance']['values']: 10000

Expected behaviour: when the boto3 session is deleted, it should remove its handlers from the botocore session. Alternatively, the boto3 session should reuse handlers that already exist on the botocore session.
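
Both options can be sketched in a toy model. This is not boto3 or botocore code: `ToyEmitter`, `CleanupSession`, `ReusingSession` and the id `"toy-s3-client-handler"` are all hypothetical names made up for illustration. botocore's own `register` accepts a `unique_id` argument as far as I know, but whether boto3 could rely on it here is not something this sketch establishes.

```python
from collections import defaultdict


class ToyEmitter:
    """Hypothetical long-lived emitter; not botocore's implementation."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self._unique_ids = set()

    def register(self, event_name, handler, unique_id=None):
        # Option 2: a caller-supplied id makes registration idempotent.
        if unique_id is not None:
            if unique_id in self._unique_ids:
                return
            self._unique_ids.add(unique_id)
        self._handlers[event_name].append(handler)

    def unregister(self, event_name, handler):
        self._handlers[event_name].remove(handler)

    def count(self, event_name):
        return len(self._handlers[event_name])


class CleanupSession:
    """Option 1: remember what was registered and remove it on close()."""

    EVENT = "creating-client-class.s3"

    def __init__(self, emitter):
        self._emitter = emitter
        self._handler = lambda **kwargs: None
        emitter.register(self.EVENT, self._handler)

    def close(self):
        self._emitter.unregister(self.EVENT, self._handler)

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()


class ReusingSession:
    """Option 2: register under a stable id so repeats are no-ops."""

    EVENT = "creating-client-class.s3"

    def __init__(self, emitter):
        emitter.register(self.EVENT, lambda **kwargs: None,
                         unique_id="toy-s3-client-handler")


def run(n):
    """Return the handler counts left behind by n sessions of each kind."""
    cleanup_emitter, reuse_emitter = ToyEmitter(), ToyEmitter()
    for _ in range(n):
        with CleanupSession(cleanup_emitter):
            pass
        ReusingSession(reuse_emitter)
    return (cleanup_emitter.count(CleanupSession.EVENT),
            reuse_emitter.count(ReusingSession.EVENT))


if __name__ == "__main__":
    print(run(10000))
```

With either option the emitter holds at most one handler per event no matter how many sessions come and go. The first needs an explicit close (or a context manager), because relying on `__del__` for cleanup is fragile.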

longbowrocks, Oct 24 '19 14:10

Thank you for your post. I am not able to reproduce the issue. I ran the code below, which creates a boto3 session in a loop and plots the memory usage, and I have attached the resulting graph. From the graph I can see that memory does not increase significantly with each session creation, which is the expected behavior.

import boto3
import botocore
import psutil
import matplotlib.pyplot as pp

botocore_session = botocore.session.Session()
used = []
for i in range(10000):
        memory = psutil.virtual_memory()
        used.append(memory.used)
        session = boto3.Session(botocore_session=botocore_session)


pp.plot(used)
pp.show()

[image: used_memory]

Can you please run this code and let me know if you are still seeing significant memory usage?

swetashre, Oct 29 '19 17:10

It's really hard to check python memory usage like that because you're getting memory used by all processes on your computer. Here's a slightly modified script to get the actual memory used by your python process, with different output:

import boto3
import botocore
import resource
import matplotlib.pyplot as pp

botocore_session = botocore.session.Session()
used = []
for i in range(10000):
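        # ru_maxrss is the peak RSS: bytes on macOS, kilobytes on Linux
        memory = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        memory_megabytes = memory/1024/1024  # megabytes only if ru_maxrss is in bytes (macOS)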
        used.append(memory_megabytes)
        session = boto3.Session(botocore_session=botocore_session)


pp.plot(used)
pp.show()

image

If you don't use botocore_session, the memory usage remains constant after some initial startup:

import boto3
import botocore
import resource
import matplotlib.pyplot as pp

botocore_session = botocore.session.Session()
used = []
for i in range(10000):
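        # ru_maxrss is the peak RSS: bytes on macOS, kilobytes on Linux
        memory = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        memory_megabytes = memory/1024/1024  # megabytes only if ru_maxrss is in bytes (macOS)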
        used.append(memory_megabytes)
        session = boto3.Session()


pp.plot(used)
pp.show()

image

longbowrocks, Oct 29 '19 19:10

Alternatively, I guess psutil.Process().memory_info().rss would work too; psutil.Process takes a PID and defaults to the current process.
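
If psutil is not available, a stdlib-only variant is possible on Unix. The sketch below is only an illustration: `peak_rss_mb` and `sample_growth` are hypothetical helper names, and the unit handling assumes `ru_maxrss` is in bytes on macOS and in kilobytes elsewhere (which holds on Linux). Because `ru_maxrss` is a peak value, it never decreases, so it can show growth but cannot show memory being released.

```python
import resource
import sys


def peak_rss_mb():
    """Peak resident set size of the current process, in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Assumption: bytes on macOS, kilobytes on other Unix systems (true on Linux).
    if sys.platform == "darwin":
        return peak / 1024 / 1024
    return peak / 1024


def sample_growth(work, iterations):
    """Call work() repeatedly, recording peak RSS before each call."""
    samples = []
    for _ in range(iterations):
        samples.append(peak_rss_mb())
        work()
    return samples


if __name__ == "__main__":
    kept = []  # deliberately retain allocations so the peak keeps rising
    samples = sample_growth(lambda: kept.append(bytearray(1024 * 1024)), 50)
    print(f"first={samples[0]:.1f} MB, last={samples[-1]:.1f} MB")
```

Passing a function that creates a boto3.Session as `work` would give the same kind of series as the scripts above, measured for this process only.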

longbowrocks, Oct 29 '19 19:10

For reference, here's what I got when I ran your script: [image]

longbowrocks, Oct 31 '19 22:10

@longbowrocks - I am able to reproduce the issue with this script:

import os
import boto3
import botocore
import psutil
import matplotlib.pyplot as pp

botocore_session = botocore.session.Session()
used = []

# Handle to the current process; psutil reports its resident set size in bytes
process = psutil.Process(os.getpid())

for i in range(10000):
        memory = process.memory_info().rss/1024/1024  # megabytes
        used.append(memory)
        session = boto3.Session(botocore_session = botocore_session)

pp.plot(used)
pp.show()

Marking this as a bug.

swetashre, Nov 04 '19 17:11

Greetings! It looks like this issue hasn’t been active in longer than one year. We encourage you to check if this is still an issue in the latest release. Because it has been longer than one year since the last update on this, and in the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment to prevent automatic closure, or if the issue is already closed, please feel free to reopen it.

github-actions[bot], Nov 03 '20 18:11