`bucket.objects.filter` and `bucket.download_file` have a memory leak
Describe the bug
Running `boto3.resource("s3").Bucket("some bucket").objects.filter()` and `boto3.resource("s3").Bucket("some bucket").download_file()` increases the memory footprint of the Python script, whether or not the downloaded file is deleted locally afterwards.
Steps to reproduce
```python
# Requires boto3, psutil, and memory_profiler (which provides @profile).
# Run with: python -m memory_profiler boto-test.py
import os

import boto3
import psutil


@profile  # injected by memory_profiler at runtime
def main():
    bucket = boto3.resource("s3").Bucket("some bucket")
    for i in range(10):
        objects = list(bucket.objects.filter(Prefix="000"))
        s3_file = objects[0]
        bucket.download_file(s3_file.key, "./test")
        os.remove("./test")
        process = psutil.Process(os.getpid())
        memory = process.memory_info().rss / 1024 / 1024
        print(memory)


main()
```
Expected behavior
After every download-and-remove cycle, memory usage should return to its previous level (net zero growth).
Debug logs
The above script gives me the following output:

```
38.4453125
38.8359375
38.953125
38.95703125
38.96484375
38.96875
38.96875
38.97265625
39.14453125
39.1484375
```
```
Filename: boto-test.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    12   28.227 MiB   28.227 MiB           1   @profile
    13                                         def main():
    14   35.348 MiB    7.121 MiB           1       bucket = boto3.resource("s3").Bucket("some bucket")
    15
    16   39.148 MiB    0.000 MiB          11       for i in range(10):
    17   39.145 MiB    2.293 MiB          10           objects = list(bucket.objects.filter(Prefix="000"))
    18   39.145 MiB    0.000 MiB          10           s3_file = objects[0]
    19   39.148 MiB    1.508 MiB          10           bucket.download_file(s3_file.key, "./test")
    20   39.148 MiB    0.000 MiB          10           os.remove("./test")
    21   39.148 MiB    0.000 MiB          10           process = psutil.Process(os.getpid())
    22   39.148 MiB    0.000 MiB          10           memory = process.memory_info().rss / 1024 / 1024
    23   39.148 MiB    0.000 MiB          10           print(memory)
```
Hi @SamuelNorbury,
Thanks for the report; I was able to reproduce it. Marking this as a bug for now.
Any update on this?
Hi, thanks for your patience on this. You're seeing this behavior because that list operation creates a deep copy of the list of objects, and it runs inside the loop: on each iteration, the `objects` variable is reassigned to a fresh copy of the same data, and Python doesn't free the previously used memory immediately. Downloading and removing the file isn't relevant. Moving the list operation outside of the loop avoids this issue. Please let me know if you have any follow-up questions.
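The effect can be illustrated without S3 at all. Below is a minimal sketch using `tracemalloc` from the standard library; `fetch_objects` is a hypothetical stand-in for `bucket.objects.filter(...)` (not a real boto3 API) that returns a fresh copy of the same data on every call, comparing the two loop shapes:

```python
import tracemalloc


def fetch_objects():
    # Hypothetical stand-in for bucket.objects.filter(Prefix="000"):
    # each call builds a fresh copy of the same object metadata.
    return [{"key": f"000/{i:05d}", "size": 1024} for i in range(50_000)]


# Pattern from the bug report: the list is rebuilt inside the loop, so
# while the new copy is being built, the previous one is still referenced
# by `objects` and can't be freed yet.
tracemalloc.start()
for _ in range(3):
    objects = list(fetch_objects())
    key = objects[0]["key"]
_, peak_refetch = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Suggested fix: fetch once outside the loop and reuse the result, so at
# most one copy of the data is ever alive.
tracemalloc.start()
objects = list(fetch_objects())
for _ in range(3):
    key = objects[0]["key"]
_, peak_hoisted = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(peak_refetch > peak_hoisted)  # prints True: re-fetching peaks higher
```

The re-fetching loop peaks at roughly two copies of the data (old plus new, alive simultaneously during reassignment), while the hoisted version peaks at one.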
Greetings! It looks like this issue hasn’t been active in longer than five days. We encourage you to check if this is still an issue in the latest release. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or upvote with a reaction on the initial post to prevent automatic closure. If the issue is already closed, please feel free to open a new one.