`toil clean` on an AWS bucket can run for hours
A `toil clean` step in one of my scripts, pointed at an AWS job store, ran for over an hour (before I broke it by trying to run another clean simultaneously). I assumed it would take a few seconds at most, because all it should need to do is tell Amazon to destroy an S3 bucket.
Not so; it seems that Toil's `_delete_bucket` involves some non-trivial code, which appears to make a round trip for every single object in the bucket:
https://github.com/DataBiosphere/toil/blob/f658acce644998ba3a565150b44e762a372d2e3b/src/toil/jobStores/aws/jobStore.py#L1284-L1297
If this can't be done as a single operation (due to the absence of a one-call bucket teardown in boto3; see https://github.com/boto/boto3/issues/1189), it should be done in many fewer round trips, by requesting the deletion of many objects at once, and/or in parallel, with multiple simultaneous deletes in flight. A sketch of the batched approach follows below.
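For illustration, here is a minimal sketch of what a batched teardown could look like (not Toil's actual code): each `list_objects_v2` page of up to 1000 keys is paired with one `delete_objects` call, so a bucket of N objects costs roughly N/1000 deletion round trips instead of N. The function name is mine, and it assumes an unversioned bucket; a versioned bucket would need `list_object_versions` instead.

import boto3

def delete_bucket_batched(bucket_name):
    # Sketch only: assumes an unversioned bucket.
    client = boto3.client('s3')
    paginator = client.get_paginator('list_objects_v2')
    # Each page holds up to 1000 keys, which is also the delete_objects
    # limit, so one listing round trip pairs with one deletion round trip.
    for page in paginator.paginate(Bucket=bucket_name):
        contents = page.get('Contents', [])
        if contents:
            client.delete_objects(
                Bucket=bucket_name,
                Delete={'Objects': [{'Key': obj['Key']} for obj in contents]},
            )
    client.delete_bucket(Bucket=bucket_name)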
boto3's `delete_objects` allows deletion requests of up to 1000 keys at a time, which Stack Overflow reports as an order of magnitude faster in some cases: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.delete_objects
https://stackoverflow.com/questions/11426560/amazon-s3-boto-how-to-delete-folder
import boto3

# Delete one "folder" by listing its keys (one round trip, up to 1000 keys)
# and then removing them all with a single batched request.
s3 = boto3.resource('s3')
objects_to_delete = s3.meta.client.list_objects(Bucket="MyBucket", Prefix="myfolder/test/")
delete_keys = {'Objects': [{'Key': obj['Key']} for obj in objects_to_delete.get('Contents', [])]}
s3.meta.client.delete_objects(Bucket="MyBucket", Delete=delete_keys)
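Building on that, a hedged sketch of the "multiple deletes in flight" variant: batches are fanned out over a thread pool so several `delete_objects` calls run concurrently. The worker count and helper names here are illustrative assumptions, not anything Toil or boto3 prescribes.

import boto3
from concurrent.futures import ThreadPoolExecutor

def delete_objects_parallel(bucket_name, max_workers=8):
    # Batch-delete all keys in a bucket with several deletes in flight at once.
    client = boto3.client('s3')
    paginator = client.get_paginator('list_objects_v2')

    def delete_batch(keys):
        # One round trip removes up to 1000 keys.
        client.delete_objects(
            Bucket=bucket_name,
            Delete={'Objects': [{'Key': k} for k in keys]},
        )

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for page in paginator.paginate(Bucket=bucket_name):
            keys = [obj['Key'] for obj in page.get('Contents', [])]
            if keys:
                pool.submit(delete_batch, keys)

boto3 clients are generally thread-safe, so a single client can be shared across the workers; the listing itself stays serial, since each page token depends on the previous page.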
Will get to this eventually.