
Directory upload/download with boto3

Open · dduleep opened this issue Nov 11 '15 · 60 comments

The PHP SDK has functions for uploading and downloading a directory (http://docs.aws.amazon.com/aws-sdk-php/v2/guide/service-s3.html#uploading-a-directory-to-a-bucket). Is there a similar function available in boto3?

If there is no such function, what kind of method(s) would be most suitable for downloading/uploading a directory?

Note: my ultimate goal is to create a sync function like the AWS CLI's.

For now I'm downloading/uploading files using https://boto3.readthedocs.org/en/latest/reference/customizations/s3.html?highlight=upload_file#module-boto3.s3.transfer
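
For reference, the per-file calls from that transfer module look like this (a minimal sketch with placeholder bucket and key names):

import boto3

s3 = boto3.client('s3')

# Managed transfers: both calls handle multipart transfers and retries
# internally, but each operates on a single file.
s3.upload_file('local/path/file.txt', 'my-bucket', 'remote/key/file.txt')
s3.download_file('my-bucket', 'remote/key/file.txt', 'local/path/file.txt')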

dduleep avatar Nov 11 '15 10:11 dduleep

Sorry, there is no directory upload/download facility in Boto 3 at this moment. We are considering backporting those CLI sync functions to Boto 3, but there is no specific plan yet.

rayluo avatar Nov 11 '15 17:11 rayluo

+1 for a port of the CLI sync function

JamieCressey avatar Feb 16 '16 21:02 JamieCressey

This would be really useful; IMHO, sync is one of the more popular CLI functions.

BeardedSteve avatar Mar 02 '16 14:03 BeardedSteve

+1 this would save me a bunch of time

ernestm avatar Mar 08 '16 15:03 ernestm

+1 "aws s3 sync SRC s3://BUCKET_NAME/DIR[/DIR....] " Porting this cli to boto3 would be so helpful.

litdream avatar Mar 30 '16 19:03 litdream

+1

KBoehme avatar Apr 21 '16 15:04 KBoehme

+1

astewart-twist avatar May 19 '16 06:05 astewart-twist

+1

aaroncutchin avatar Jun 21 '16 04:06 aaroncutchin

+1

hikch avatar Jul 07 '16 16:07 hikch

+1

pd3244 avatar Jul 11 '16 17:07 pd3244

+1

MourIdri avatar Jul 12 '16 15:07 MourIdri

+1

ghost avatar Jul 22 '16 18:07 ghost

+1

gonwi avatar Aug 02 '16 10:08 gonwi

+1

zaforic avatar Aug 19 '16 18:08 zaforic

+1

rdickey avatar Aug 19 '16 19:08 rdickey

I've been thinking a bit about this; it seems there is a working proof of concept here: https://github.com/seedifferently/boto_rsync

However, the project doesn't seem to have had any love for a while. Instead of forking it, I was asking myself what it would take to rewrite it as a Boto3 feature.

Can I start with just a sync between the local filesystem and a boto3 client?

Does AWS provide a CRC-32 check or something that I could use to detect whether a file needs to be re-uploaded? Should I base this on the file length instead?
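
On the checksum question: S3 does not expose a CRC-32, but for objects uploaded in a single PUT the ETag header is the hex MD5 of the content, so a change check can compare it against a local hash. A minimal sketch under that assumption (the ETag-equals-MD5 property does not hold for multipart or SSE-KMS uploads, where falling back to size and mtime is safer):

import hashlib

import boto3
from botocore.exceptions import ClientError


def needs_upload(s3_client, bucket, key, local_path):
    try:
        head = s3_client.head_object(Bucket=bucket, Key=key)
    except ClientError:
        return True  # object missing (or inaccessible): upload it
    # For single-PUT objects the ETag is the hex MD5 of the content.
    etag = head['ETag'].strip('"')
    md5 = hashlib.md5()
    with open(local_path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            md5.update(chunk)
    return md5.hexdigest() != etag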

Natim avatar Aug 24 '16 08:08 Natim

Right now, the simple way I use is:

import logging
import os

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)


def sync_to_s3(target_dir, aws_region=AWS_REGION, bucket_name=BUCKET_NAME):
    if not os.path.isdir(target_dir):
        raise ValueError('target_dir %r not found.' % target_dir)

    s3 = boto3.resource('s3', region_name=aws_region)
    try:
        s3.create_bucket(Bucket=bucket_name,
                         CreateBucketConfiguration={'LocationConstraint': aws_region})
    except ClientError:
        pass  # the bucket already exists

    # Note: os.listdir is not recursive, so only top-level files are uploaded.
    for filename in os.listdir(target_dir):
        logger.warning('Uploading %s to Amazon S3 bucket %s', filename, bucket_name)
        with open(os.path.join(target_dir, filename), 'rb') as body:
            s3.Object(bucket_name, filename).put(Body=body)
        logger.info('File uploaded to https://s3.%s.amazonaws.com/%s/%s',
                    aws_region, bucket_name, filename)

It just uploads a new version of every file; it doesn't remove previous ones or check whether a file changed in between.

Natim avatar Aug 24 '16 09:08 Natim

+1

mikaelho avatar Nov 02 '16 06:11 mikaelho

+1

danielwhatmuff avatar Nov 22 '16 05:11 danielwhatmuff

+1

klj613 avatar Nov 23 '16 14:11 klj613

+1

Cenhinen avatar Nov 23 '16 15:11 Cenhinen

I guess you can add as many +1s as you want, but it would be more useful to start a pull request on the project. Nobody is going to do it for you, folks.

Natim avatar Nov 23 '16 15:11 Natim

Natim, you've got to be kidding. Implementing this in a reliable way is not trivial, and they have already implemented it, in Python, in the AWS CLI. It is just implemented in such a convoluted way that you need to be an AWS CLI expert to pull it out.

mikaelho avatar Nov 23 '16 16:11 mikaelho

Implementing this in a reliable way is not trivial

I didn't say it was trivial, but it doesn't have to be perfect at first; we can iterate on it. I already wrote something working in 15 lines of code, so we can start from there.

I don't think reading the AWS CLI code will help with implementing it in boto3.

Natim avatar Nov 23 '16 16:11 Natim

What I really need is simpler than a directory sync: I just want to pass multiple files to boto3 and have it handle uploading them, taking care of multithreading etc.

I guess this could be done with a light wrapper around the existing API, but I'd have to spend some time investigating it. Does anyone have hints or a rough idea of how to set it up? I'd be willing to do a PR for this once I find the time.
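
One possible shape for such a wrapper (a sketch, not an existing boto3 API; the object keys here simply reuse each file's basename, and a boto3 client is safe to share across threads):

import concurrent.futures
import os

import boto3


def upload_files(file_paths, bucket_name, max_workers=8):
    # upload_file already performs a managed (multipart) transfer per file;
    # the thread pool just runs several of those uploads in parallel.
    s3 = boto3.client('s3')
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(s3.upload_file, path, bucket_name, os.path.basename(path))
            for path in file_paths
        ]
        for future in concurrent.futures.as_completed(futures):
            future.result()  # re-raise the first upload error, if any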

cpury avatar Apr 03 '17 09:04 cpury

The AWS CLI's sync function is really fast, so my current code uses subprocess to call it. Having it backported to boto would be so much cleaner, though. Another +1 for that to happen.
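
For anyone taking the same shortcut, the call can be as simple as this (a sketch; it assumes the aws CLI is installed and on PATH):

import subprocess


def s3_sync(src_dir, bucket, prefix=''):
    # Shell out to the AWS CLI; check=True raises CalledProcessError on failure.
    subprocess.run(
        ['aws', 's3', 'sync', src_dir, 's3://%s/%s' % (bucket, prefix)],
        check=True,
    )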

sweenu avatar Aug 23 '17 10:08 sweenu

+1

blrnw3 avatar Oct 05 '17 21:10 blrnw3

I was successfully using s4cmd for a while to do this on relatively large directories, but started running into sporadic failures where it wouldn't quite get everything copied. Might be worth taking a peek at what they did there to see if some of it can be salvaged/reused. https://github.com/bloomreach/s4cmd

jimmywan avatar Oct 10 '17 17:10 jimmywan

+1

davidfischer-ch avatar Oct 24 '17 19:10 davidfischer-ch

I used this method (altered from Natim's code):

import os

import boto3


def upload_directory(src_dir, bucket_name, dst_dir):
    if not os.path.isdir(src_dir):
        raise ValueError('src_dir %r not found.' % src_dir)

    # os.walk makes the upload recursive, unlike a plain os.listdir.
    all_files = []
    for root, dirs, files in os.walk(src_dir):
        all_files += [os.path.join(root, f) for f in files]

    s3_resource = boto3.resource('s3')
    for filename in all_files:
        key = os.path.join(dst_dir, os.path.relpath(filename, src_dir))
        with open(filename, 'rb') as body:
            s3_resource.Object(bucket_name, key).put(Body=body)

The main differences (other than logging and different checks) are that this method copies all the files in the directory recursively, and that it allows changing the root path in S3 (inside the bucket).
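
If large files are involved, a variant of the same walk (a sketch, not part of the comment above) can use boto3's managed upload_file transfer instead of put, and build keys with explicit forward slashes, since S3 keys are not filesystem paths:

import os

import boto3


def upload_directory_managed(src_dir, bucket_name, dst_dir):
    # Hypothetical variant: upload_file runs a managed (multipart) transfer,
    # which is more robust than a single put() for large files.
    s3_client = boto3.client('s3')
    for root, dirs, files in os.walk(src_dir):
        for name in files:
            local_path = os.path.join(root, name)
            rel_path = os.path.relpath(local_path, src_dir)
            # S3 keys always use '/', whatever the local OS separator is.
            key = dst_dir + '/' + rel_path.replace(os.sep, '/')
            s3_client.upload_file(local_path, bucket_name, key)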

yaniv-g avatar Nov 21 '17 17:11 yaniv-g