`aws s3 rm` should use batch deletes
When running the command aws s3 rm --recursive s3://bucketname/path/, I expect it to use batch object deletion to delete the files quickly with the fewest requests. It appears to be deleting files one at a time.
$ aws --version
aws-cli/1.11.13 Python/3.5.2 Linux/4.4.0-1041-aws botocore/1.4.70
To test: create a bucket with several files, then sync it to a second bucket (this is the bucket we will delete everything from):
aws s3 sync s3://source-bucket/ s3://bucket-to-delete/
Then delete the contents of bucket-to-delete:
aws s3 rm --recursive s3://bucket-to-delete/
Notice that it lists each file to delete sequentially.
Re-sync from source-bucket to bucket-to-delete, then delete again with --debug for full detail, saving the output to /tmp/out:
aws s3 rm --debug --recursive s3://bucket-to-delete/ 2>&1 | tee /tmp/out
Then inspect /tmp/out for HTTP requests to confirm the DELETE method was called once per object:
$ grep HTTP /tmp/out
2018-02-22 20:22:51,694 - MainThread - botocore.auth - DEBUG - HTTP request method: GET
2018-02-22 20:22:52,065 - Thread-4 - botocore.auth - DEBUG - HTTP request method: DELETE
2018-02-22 20:22:52,227 - Thread-6 - botocore.auth - DEBUG - HTTP request method: DELETE
2018-02-22 20:22:52,248 - Thread-8 - botocore.auth - DEBUG - HTTP request method: DELETE
2018-02-22 20:22:52,329 - Thread-9 - botocore.auth - DEBUG - HTTP request method: DELETE
2018-02-22 20:22:52,390 - Thread-4 - botocore.auth - DEBUG - HTTP request method: DELETE
2018-02-22 20:22:52,410 - Thread-11 - botocore.auth - DEBUG - HTTP request method: DELETE
2018-02-22 20:22:52,553 - Thread-4 - botocore.auth - DEBUG - HTTP request method: DELETE
Deleting portions of large buckets could be much faster, with far less network overhead, if the CLI used the Multi-Object Delete (DeleteObjects) API: https://docs.aws.amazon.com/AmazonS3/latest/API/multiobjectdeleteapi.html
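For comparison, the Multi-Object Delete API is already reachable from the CLI via s3api. A minimal sketch of a single batched request, assuming python3 is on the PATH; the bucket name and keys are placeholders:

```shell
# Build a DeleteObjects JSON payload from newline-separated keys on stdin.
build_delete_payload() {
  python3 -c '
import json, sys
keys = [line.strip() for line in sys.stdin if line.strip()]
print(json.dumps({"Objects": [{"Key": k} for k in keys], "Quiet": True}))
'
}

# Up to 1000 keys can be removed in one request instead of one DELETE each:
printf "%s\n" 1.pdf 2.pdf 3.pdf | build_delete_payload > /tmp/delete.json
# aws s3api delete-objects --bucket bucket-to-delete --delete file:///tmp/delete.json
```

The final aws call is commented out since it needs real credentials and a real bucket; everything above it runs locally.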
Yeah that would definitely be faster. Marking as an enhancement. Thanks for bringing it up!
Confirmed this is still the case with awscli v1.14.45 (previous output was from 1.11.13)
$ ~/.local/bin/aws --version
aws-cli/1.14.45 Python/2.7.12 Linux/4.4.0-1041-aws botocore/1.8.49
$ grep HTTP /tmp/out.new
2018-02-23 01:41:55,982 - MainThread - botocore.vendored.requests.packages.urllib3.connectionpool - INFO - Starting new HTTPS connection (1): s3.amazonaws.com
2018-02-23 01:41:56,155 - MainThread - botocore.vendored.requests.packages.urllib3.connectionpool - DEBUG - "GET /deleteme-20180222a?prefix=&encoding-type=url HTTP/1.1" 200 None
2018-02-23 01:41:56,243 - ThreadPoolExecutor-0_0 - botocore.vendored.requests.packages.urllib3.connectionpool - INFO - Starting new HTTPS connection (1): s3.amazonaws.com
2018-02-23 01:41:56,332 - ThreadPoolExecutor-0_1 - botocore.vendored.requests.packages.urllib3.connectionpool - INFO - Starting new HTTPS connection (2): s3.amazonaws.com
2018-02-23 01:41:56,333 - ThreadPoolExecutor-0_0 - botocore.vendored.requests.packages.urllib3.connectionpool - DEBUG - "DELETE /deleteme-20180222a/1.pdf HTTP/1.1" 204 0
2018-02-23 01:41:56,497 - ThreadPoolExecutor-0_2 - botocore.vendored.requests.packages.urllib3.connectionpool - DEBUG - "DELETE /deleteme-20180222a/4.pdf HTTP/1.1" 204 0
2018-02-23 01:41:56,517 - ThreadPoolExecutor-0_0 - botocore.vendored.requests.packages.urllib3.connectionpool - DEBUG - "DELETE /deleteme-20180222a/7.pdf HTTP/1.1" 204 0
2018-02-23 01:41:56,557 - ThreadPoolExecutor-0_1 - botocore.vendored.requests.packages.urllib3.connectionpool - DEBUG - "DELETE /deleteme-20180222a/2.pdf HTTP/1.1" 204 0
2018-02-23 01:41:56,559 - ThreadPoolExecutor-0_3 - botocore.vendored.requests.packages.urllib3.connectionpool - DEBUG - "DELETE /deleteme-20180222a/5.pdf HTTP/1.1" 204 0
2018-02-23 01:41:56,618 - ThreadPoolExecutor-0_4 - botocore.vendored.requests.packages.urllib3.connectionpool - DEBUG - "DELETE /deleteme-20180222a/6.pdf HTTP/1.1" 204 0
2018-02-23 01:41:56,639 - ThreadPoolExecutor-0_2 - botocore.vendored.requests.packages.urllib3.connectionpool - DEBUG - "DELETE /deleteme-20180222a/3.pdf HTTP/1.1" 204 0
Is there any update on this? We suffer every time we have to delete a bucket with millions of files.
still suffering.
+1
I also had to use 'nice' and 'cpulimit' to prevent the EC2 instance I was running it on from overloading.
sudo apt install cpulimit
/usr/bin/cpulimit -q -b -c 1 -e aws -l 30
nice aws s3 rm --quiet s3://my.bucket.name --recursive
I had to resort to using the cli because the web browser tab crashed while I was attempting the same 'Empty Bucket' command overnight :(
Does anyone have a better way to empty a bucket?
Paully
best way so far for me has been a glue job using purge https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-glue-context.html#aws-glue-api-crawler-pyspark-extensions-glue-context-purge_s3_path
I have solved it by creating a Lifecycle Configuration in the bucket to delete all objects and markers after 1 day.
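For anyone taking the lifecycle route, a sketch of what such a rule can look like; the bucket name and rule ID are placeholders, and on a versioned bucket a separate rule with ExpiredObjectDeleteMarker would likely be needed to clear the markers left behind:

```shell
# Lifecycle rule: expire current objects after 1 day, and noncurrent
# versions 1 day after they become noncurrent.
cat > /tmp/lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "empty-bucket",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Expiration": {"Days": 1},
      "NoncurrentVersionExpiration": {"NoncurrentDays": 1}
    }
  ]
}
EOF
# aws s3api put-bucket-lifecycle-configuration --bucket bucket-to-delete \
#     --lifecycle-configuration file:///tmp/lifecycle.json
```

Since expiration runs inside S3, this avoids the request-per-object problem entirely, at the cost of the objects lingering up to a day.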
I typically use the bash script below:
function s3_batch_delete(){
    # Deletes objects from an S3 bucket under a given prefix, in batches.
    # Arguments:
    #   $1: the bucket name to delete from.
    #   $2: the key prefix of the objects to delete (must not start with '/').
    #   $3: the AWS profile to use, as stored in ~/.aws/credentials.

    # List the keys we want to delete and save them to keysToDelete.txt.
    aws s3api list-objects-v2 --output text --bucket "${1}" --prefix "${2}" --query 'Contents[].[Key]' --profile "${3}" > keysToDelete.txt

    # Delete the keys in keysToDelete.txt in batches of 1000 (the DeleteObjects per-request limit).
    # -P$(nproc) runs batches in parallel across all available logical cores.
    # To handle longer paths, --max-chars is capped at 90% of the platform's maximum argument size.
    max_arg=$(echo $(getconf ARG_MAX)*0.90/1 | bc)
    cat keysToDelete.txt | xargs -P$(nproc) -n1000 --max-chars="$max_arg" bash -c 'aws s3api delete-objects --bucket '"${1}"' --profile '"${3}"' --delete "Objects=[$(printf "{Key=%q}," "$@")]" >> deletedKeysAndVersionOfDeleteMarker.txt' _
    cat deletedKeysAndVersionOfDeleteMarker.txt && rm deletedKeysAndVersionOfDeleteMarker.txt

    # Remove the key list once the objects have been deleted from S3.
    rm keysToDelete.txt
}
It can be used as below; it issues multiple batch deletes in parallel across all available logical cores, each deleting 1000 objects in a single API call:
s3_batch_delete bucketName prefix profileName
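The batching above can be sanity-checked without touching AWS by swapping the aws call for a stub; each xargs invocation below stands in for one DeleteObjects request:

```shell
# Dry run: 2500 keys at 1000 keys per batch should yield 3 "API calls".
# Key names are placeholders; nothing here talks to AWS.
seq 1 2500 | sed 's|^|path/to/key-|' > /tmp/keysToDelete.txt
calls=$(xargs -n1000 bash -c 'echo one-batch' _ < /tmp/keysToDelete.txt | wc -l)
echo "$calls"
```

This is also a quick way to experiment with the -n and --max-chars values before pointing the real script at a bucket.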