
[request]: S3 get folder (directory) size

Open ZelCloud opened this issue 3 years ago • 1 comments

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue, please leave a comment

Tell us about your request

I'd like the S3 package to also include a way to get the total size of a folder (directory).

Something similar to

aws s3 ls --summarize --human-readable --recursive s3://bucket/folder

or from boto3

import boto3

def get_folder_size(bucket, prefix):
    total_size = 0
    for obj in boto3.resource('s3').Bucket(bucket).objects.filter(Prefix=prefix):
        total_size += obj.size
    return total_size

e.g. bucket_name/folder - 4 GB (total size, including subdirectories)

Tell us about the problem you're trying to solve.

We limit how much data can be uploaded to a folder, so knowing the total size, and being able to show it to the user, is important. Separately, if a folder is very large (e.g. 100 GB), we'd like to prevent the user from downloading everything in one shot, whereas a folder of 30 MB and a few dozen files can be downloaded as is.
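
To make the two checks above concrete, here is a minimal sketch in Rust of how a folder total might be used once it is known. The thresholds, the `DownloadPolicy` enum, and the function names are all hypothetical and chosen only for illustration; none of them come from aws-sdk-rust.

```rust
/// What the application might do with a folder of a given total size.
#[derive(Debug, PartialEq)]
enum DownloadPolicy {
    /// Small enough to download in one shot.
    AllowBulk,
    /// Too large; the user has to download it piece by piece.
    RequireChunked,
}

// Hypothetical thresholds, not taken from the issue or from any AWS limit.
const UPLOAD_QUOTA_BYTES: u64 = 100 * 1024 * 1024 * 1024; // 100 GiB per folder
const BULK_DOWNLOAD_MAX_BYTES: u64 = 1024 * 1024 * 1024; // 1 GiB

/// Returns true if adding `incoming` bytes keeps the folder within its quota.
/// `checked_add` guards against overflow on absurdly large inputs.
fn upload_fits(current_total: u64, incoming: u64) -> bool {
    current_total
        .checked_add(incoming)
        .map_or(false, |new_total| new_total <= UPLOAD_QUOTA_BYTES)
}

/// Decides whether a folder of `current_total` bytes may be downloaded at once.
fn download_policy(current_total: u64) -> DownloadPolicy {
    if current_total <= BULK_DOWNLOAD_MAX_BYTES {
        DownloadPolicy::AllowBulk
    } else {
        DownloadPolicy::RequireChunked
    }
}

fn main() {
    let small: u64 = 30 * 1024 * 1024; // roughly the 30 MB folder
    let large: u64 = 100 * 1024 * 1024 * 1024; // roughly the 100 GB folder
    println!("small folder: {:?}", download_policy(small));
    println!("large folder: {:?}", download_policy(large));
    println!("1 byte more fits in large folder: {}", upload_fits(large, 1));
}
```

Both functions only need the folder total in bytes, so they are independent of how that total is obtained.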

Are you currently working around this issue?

We're still in the process of migrating some services to Rust, and by extension to aws-sdk-rust, so we're not working around it yet, but it is blocking us from moving over fully.

Additional context

No response

ZelCloud avatar Feb 21 '22 05:02 ZelCloud

Hello, and thanks for the feature request! The S3 client only contains code generated from models, but this would certainly be good functionality for a high-level S3 library. For the time being, I'd suggest porting the boto code to Rust via list_objects_v2: https://docs.rs/aws-sdk-s3/0.6.0/aws_sdk_s3/client/struct.Client.html#method.list_objects_v2

Be sure that you use into_paginator() to ensure you read all pages of results.

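// Written against aws-sdk-s3 0.6.0 (the version linked above);
// accessor signatures may differ in other releases.
use std::error::Error;
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    // Load region and credentials from the environment.
    let shared_config = aws_config::load_from_env().await;
    let s3 = aws_sdk_s3::Client::new(&shared_config);
    // S3 matches prefixes as plain strings: use a trailing "/" (e.g. "folder/")
    // so that sibling keys such as "folder2/..." are not counted as well.
    let prefix = "prefix";
    let bucket = "bucket";
    // The paginator follows continuation tokens, so every page is read.
    let mut pages = s3
        .list_objects_v2()
        .bucket(bucket)
        .prefix(prefix)
        .into_paginator()
        .send();
    // Sum the size in bytes of every object under the prefix.
    let mut total: i64 = 0;
    while let Some(page) = pages.next().await {
        // `page?` propagates any request error; a page without objects adds 0.
        total += page?
            .contents()
            .unwrap_or_default()
            .iter()
            .map(|obj| obj.size)
            .sum::<i64>();
    }
    println!("total size: {}", total);
    Ok(())
}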
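
The total printed above is a raw byte count. If something closer to the `--human-readable` output of `aws s3 ls` is wanted, a small formatter along these lines could be applied to it. This is only a sketch: `human_readable` is a hypothetical helper, not part of the SDK, and the unit labels and one-decimal rounding are assumptions rather than a reproduction of the CLI's exact output.

```rust
/// Formats a byte count with binary units (1 KiB = 1024 bytes).
/// Hypothetical helper for illustration; not provided by aws-sdk-s3.
fn human_readable(bytes: u64) -> String {
    const UNITS: [&str; 5] = ["B", "KiB", "MiB", "GiB", "TiB"];
    let mut value = bytes as f64;
    let mut unit = 0;
    // Divide down until the value fits the current unit or units run out.
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    if unit == 0 {
        // Plain bytes are shown without a fractional part.
        format!("{} {}", bytes, UNITS[0])
    } else {
        format!("{:.1} {}", value, UNITS[unit])
    }
}

fn main() {
    // Stand-in for the `total` computed by the listing loop.
    let total: i64 = 4 * 1024 * 1024 * 1024;
    // Clamp to zero before converting, since the total is a signed integer.
    println!("total size: {}", human_readable(total.max(0) as u64));
}
```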

rcoh avatar Feb 21 '22 13:02 rcoh

Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.

github-actions[bot] avatar Mar 01 '24 18:03 github-actions[bot]