S3 - Delete all files under a folder (recursive delete)

Open • ZelCloud opened this issue 3 years ago • 4 comments

Describe the feature

Delete all files underneath a folder (recursive delete) using a prefix.

Use Case

It would be nice to be able to pass a prefix and have everything under it deleted, rather than iterating through all the objects and deleting them individually.

Example bucket structure:

  • folder1
    • file 1
    • file 2
    • file 3
  • folder2
    • file 4
  • folder3

Delete everything under "folder1/"

Proposed Solution

Ideally, delete_objects would accept a prefix as a builder argument.

For example:

// Deletes all files in folder1
let bucket = "my-bucket";
let prefix = "folder1/";
let s3res = s3.delete_objects()
        .bucket(bucket)
        .prefix(prefix)
        .send()
        .await;

A possible workaround for now is to iterate through all the objects and build the delete_objects Vec from the listed keys, though I'm not sure if there are any gotchas or issues with this approach.

use std::error::Error;

// Delete and ObjectIdentifier come from the generated model module
// (in newer SDK releases these live in aws_sdk_s3::types instead).
use aws_sdk_s3::model::{Delete, ObjectIdentifier};
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let shared_config = aws_config::load_from_env().await;
    let s3 = aws_sdk_s3::Client::new(&shared_config);
    let prefix = "prefix";
    let bucket = "bucket";

    // Page through every object under the prefix.
    let mut pages = s3.list_objects_v2()
                      .bucket(bucket)
                      .prefix(prefix)
                      .into_paginator()
                      .send();

    // Collect an ObjectIdentifier for each listed key.
    let mut delete_objects: Vec<ObjectIdentifier> = vec![];
    while let Some(page) = pages.next().await {
        for object in page?.contents.unwrap_or_default() {
            let obj_id = ObjectIdentifier::builder().set_key(object.key).build();
            delete_objects.push(obj_id);
        }
    }

    // Issue a single DeleteObjects request with all of the collected keys.
    // Note: DeleteObjects accepts at most 1,000 keys per request.
    let delete = Delete::builder().set_objects(Some(delete_objects)).build();

    s3.delete_objects()
      .bucket(bucket)
      .delete(delete)
      .send()
      .await?;

    println!("Objects deleted.");

    Ok(())
}

Other Information

A possible temporary workaround is provided in the proposed solution above.

Acknowledgements

  • [ ] I may be able to implement this feature request
  • [ ] This feature might incur a breaking change

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue, please leave a comment

ZelCloud avatar May 04 '22 04:05 ZelCloud

Makes sense! This isn't something that will be included in the SDK, but it's a good candidate for a high-level S3 library built on top of the AWS SDK.

rcoh avatar May 04 '22 15:05 rcoh

This could also be a helpful example

Velfi avatar May 04 '22 15:05 Velfi

Though I'm not sure if there are any gotchas or issues with this approach.

With a sufficiently large list of objects you may run out of RAM with this approach, since you're building up a large Vec of every object. The basic approach itself is fine, and the issue can be avoided with a little tweaking: take the paginated objects a few thousand at a time and delete those, so you never have to hold a huge Vec in memory.
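
For instance, here is a minimal sketch of that tweak, assuming the same 0.x-era aws_sdk_s3::model imports as the workaround above and placeholder bucket/prefix values. Each list_objects_v2 page returns at most 1,000 keys, which happens to match the DeleteObjects per-request limit, so deleting page by page keeps memory bounded:

use std::error::Error;

use aws_sdk_s3::model::{Delete, ObjectIdentifier};
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let shared_config = aws_config::load_from_env().await;
    let s3 = aws_sdk_s3::Client::new(&shared_config);
    let bucket = "bucket";   // placeholder
    let prefix = "folder1/"; // placeholder

    let mut pages = s3.list_objects_v2()
        .bucket(bucket)
        .prefix(prefix)
        .into_paginator()
        .send();

    // Delete each listed page right away instead of accumulating every key,
    // so memory use is bounded by a single page (at most 1,000 keys).
    while let Some(page) = pages.next().await {
        let identifiers: Vec<ObjectIdentifier> = page?
            .contents
            .unwrap_or_default()
            .into_iter()
            .map(|object| ObjectIdentifier::builder().set_key(object.key).build())
            .collect();

        if identifiers.is_empty() {
            continue;
        }

        let delete = Delete::builder().set_objects(Some(identifiers)).build();
        s3.delete_objects()
            .bucket(bucket)
            .delete(delete)
            .send()
            .await?;
    }

    println!("Objects deleted.");
    Ok(())
}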

phyber avatar May 04 '22 22:05 phyber

I think you want to do something like:

  • list objects/versions for your target prefix with pagination
  • take paginated results in chunks of 1000
  • invoke the DeleteObjects API for each chunk

Based on the API examples, it looks like you can delete objects while holding a pagination cursor. This method avoids building a big vector when operating on large buckets.
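
A rough sketch of those steps, under the same assumptions as the snippets above (0.x-era aws_sdk_s3::model types, placeholder bucket and prefix; flush_batch is a hypothetical helper, and 1,000 is the DeleteObjects per-request key limit):

use std::error::Error;

use aws_sdk_s3::model::{Delete, ObjectIdentifier};
use aws_sdk_s3::Client;
use tokio_stream::StreamExt;

// Hypothetical helper: issue one DeleteObjects call for a batch of up to 1,000 keys.
async fn flush_batch(
    s3: &Client,
    bucket: &str,
    identifiers: Vec<ObjectIdentifier>,
) -> Result<(), Box<dyn Error>> {
    let delete = Delete::builder().set_objects(Some(identifiers)).build();
    s3.delete_objects().bucket(bucket).delete(delete).send().await?;
    Ok(())
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let shared_config = aws_config::load_from_env().await;
    let s3 = Client::new(&shared_config);
    let bucket = "bucket";   // placeholder
    let prefix = "folder1/"; // placeholder

    let mut pages = s3.list_objects_v2()
        .bucket(bucket)
        .prefix(prefix)
        .into_paginator()
        .send();

    // Buffer keys as pages stream in and flush every 1,000 keys, so the
    // batch size never exceeds the DeleteObjects limit.
    let mut batch: Vec<ObjectIdentifier> = Vec::new();
    while let Some(page) = pages.next().await {
        for object in page?.contents.unwrap_or_default() {
            batch.push(ObjectIdentifier::builder().set_key(object.key).build());
            if batch.len() == 1000 {
                flush_batch(&s3, bucket, std::mem::take(&mut batch)).await?;
            }
        }
    }

    // Delete whatever is left over after the last page.
    if !batch.is_empty() {
        flush_batch(&s3, bucket, batch).await?;
    }

    Ok(())
}

The pagination stream stays open across the DeleteObjects calls, which matches the "delete objects while holding a pagination cursor" observation above.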

benmanns avatar May 13 '22 16:05 benmanns