
`bucket.objects.filter` and `bucket.download_file` have a memory leak

Open SamuelNorbury opened this issue 4 years ago • 2 comments

Describe the bug Running `boto3.resource("s3").Bucket("some bucket").objects.filter()` and `boto3.resource("s3").Bucket("some bucket").download_file()` increases the memory footprint of the Python script, whether or not the downloaded file is deleted locally afterwards.

Steps to reproduce

import os

import boto3
import psutil
from memory_profiler import profile  # provides the @profile decorator used below

@profile
def main():
    bucket = boto3.resource("s3").Bucket("some bucket")

    for i in range(10):
        objects = list(bucket.objects.filter(Prefix="000"))
        s3_file = objects[0]
        bucket.download_file(s3_file.key, "./test")
        os.remove("./test")
        process = psutil.Process(os.getpid())
        memory = process.memory_info().rss / 1024 / 1024  # RSS in MiB
        print(memory)

main()

Expected behavior After every download-and-remove cycle, memory usage should return to its previous level (net zero growth).

Debug logs The above script gives me the following output:

38.4453125
38.8359375
38.953125
38.95703125
38.96484375
38.96875
38.96875
38.97265625
39.14453125
39.1484375
Filename: boto-test.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    12   28.227 MiB   28.227 MiB           1   @profile
    13                                         def main():
    14   35.348 MiB    7.121 MiB           1       bucket = boto3.resource("s3").Bucket("some bucket")
    15                                         
    16   39.148 MiB    0.000 MiB          11       for i in range(10):
    17   39.145 MiB    2.293 MiB          10           objects = list(bucket.objects.filter(Prefix="000"))
    18   39.145 MiB    0.000 MiB          10           s3_file = objects[0]
    19   39.148 MiB    1.508 MiB          10           bucket.download_file(s3_file.key, "./test")
    20   39.148 MiB    0.000 MiB          10           os.remove("./test")
    21   39.148 MiB    0.000 MiB          10           process = psutil.Process(os.getpid())
    22   39.148 MiB    0.000 MiB          10           memory = process.memory_info().rss / 1024 / 1024
    23   39.148 MiB    0.000 MiB          10           print(memory)

SamuelNorbury avatar Apr 30 '21 12:04 SamuelNorbury

Hi @SamuelNorbury,

Thanks for the report. I was able to reproduce the issue. Marking this as a bug for now.

stobrien89 avatar May 03 '21 16:05 stobrien89

Any update on this?

methsi avatar Oct 25 '23 08:10 methsi

Hi, thanks for your patience on this. You're seeing this behavior because the list operation involves creating a deep copy of the list of objects, and that operation sits inside a loop. On each iteration, the `objects` variable is rebound to a new copy of the same data, and Python doesn't return the previously used memory to the operating system immediately, so the process's footprint creeps up before plateauing. Downloading and removing the file isn't relevant. Moving the list operation outside of the loop avoids this issue. Please let me know if you have any follow-up questions.

RyanFitzSimmonsAK avatar Jun 06 '24 21:06 RyanFitzSimmonsAK

Greetings! It looks like this issue hasn’t been active in longer than five days. We encourage you to check if this is still an issue in the latest release. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or upvote with a reaction on the initial post to prevent automatic closure. If the issue is already closed, please feel free to open a new one.

github-actions[bot] avatar Jun 17 '24 00:06 github-actions[bot]