
Document difference between S3 object `copy` vs `copy_from` vs `copy_object`

Open mdavis-xyz opened this issue 3 years ago • 9 comments

s3.Object has methods copy and copy_from.

Based on the name, I assumed that copy_from would copy from some other key into the key (and bucket) of this s3.Object. I therefore assumed that the other copy function would do the opposite, i.e. copy from this s3.Object to another object. Or maybe the two are the other way around.

But after reading the docs for both, it looks like they both do the same thing. They both copy from another object into this object. Is that correct? What's the point of having two functions that copy in the same direction?

What I want is to copy the existing s3.Object into a different path. I don't want to have to manually instantiate a second s3.Object instance in python, and then pass the bucket and key manually from the first.

i.e. what's the easiest way to copy s3://bucketA/pathA.txt to s3://bucketB/pathB.txt, if I already have s3.Object('bucketA','pathA.txt')?

mdavis-xyz avatar Oct 21 '21 06:10 mdavis-xyz

Hi @mdavis-xyz,

That's a good point. Both copy and copy_from seem to use CopyObject under the hood. I'm not seeing any discernible differences aside from the fact that they accept arguments in different formats. I'll double-check with the team to clarify.

The easiest way to copy s3://bucketA/pathA.txt to s3://bucketB/pathB.txt would be to access the meta client and use the s3Transfer copy method:

import boto3

s3 = boto3.resource('s3')

bucket = s3.Bucket('sourcebucketname')
obj = bucket.Object('sourceobject')

# Managed copy: the CopySource dict comes first, then the destination bucket and key
s3.meta.client.copy({"Bucket": bucket.name, "Key": obj.key}, 'destinationbucket', 'key')

Hope this helps!

stobrien89 avatar Oct 21 '21 23:10 stobrien89

Can we add a new copy method to s3.Object? One that copies from this object to another?

It seems silly to bother having high-level resources if, to copy, you have to extract the low-level client from the service resource or object resource, and then extract not one but two identifiers from the high-level resource to pass to the low-level call. On top of that, the source is passed in a way that is inconsistent with how the destination object is passed. This is quite clunky and verbose.

We should be able to do:

obj.copy_to(destinationKey=key, destinationBucket=bucket_name)

but default the destination bucket to the source bucket if omitted:

obj.copy_to(destinationKey=key)

And also:

bucket.copy(sourceKey=key1, destinationKey=key2) # copy within bucket
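Until something like this exists, the proposed copy_to can be approximated with a small wrapper. This is only a minimal sketch: `copy_to` is a hypothetical helper name, not part of boto3, and it assumes `source_obj` is an s3.Object. It simply forwards to the managed copy on the object's meta client.

```python
def copy_to(source_obj, dest_key, dest_bucket=None):
    """Hypothetical helper (not a boto3 API): copy source_obj to another key.

    source_obj is expected to be a boto3 s3.Object. dest_bucket defaults
    to the source object's own bucket when omitted.
    """
    copy_source = {"Bucket": source_obj.bucket_name, "Key": source_obj.key}
    target_bucket = dest_bucket or source_obj.bucket_name
    # meta.client is the low-level client behind the resource object.
    source_obj.meta.client.copy(copy_source, target_bucket, dest_key)
    return target_bucket, dest_key
```

With this, the example from the first post would read `copy_to(s3.Object('bucketA', 'pathA.txt'), 'pathB.txt', 'bucketB')`.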

mdavis-xyz avatar Oct 21 '21 23:10 mdavis-xyz

The documentation for the low level copy is also a bit confusing.

The S3 client has copy and copy_object. What's the difference?

And why do they use:

s3 = boto3.resource('s3')
s3.meta.client.copy(...)

Instead of

boto3.client('s3').copy()

?

mdavis-xyz avatar Oct 21 '21 23:10 mdavis-xyz

Hi @mdavis-xyz,

I was able to confirm with the team that the resource's .copy action is basically just the s3Transfer copy method I mentioned in my last comment. The action is also somewhat verbose and clunky to use, because the resource you perform the action on is actually used as the destination for the copy. I don't think we'd add another copy method, but I definitely think we could improve the way the existing copy action is used.

For the low-level copy itself, it's "a managed transfer which will perform a multipart copy in multiple threads if necessary" and a customization over s3Transfer, which is why you need to access the meta client to use it. copy_object is the official S3 API operation, which isn't the most intuitive to use. The s3 transfer methods (and similarly sync, cp, etc. in the CLI) are there to make usage of some of the S3 APIs a bit easier.

stobrien89 avatar Oct 26 '21 23:10 stobrien89

Hmm, I'm still not understanding the difference.

How is boto3.resource('s3').meta.client different to boto3.client('s3')? Aren't they identical?

Are you saying that the difference is that copy does multi-threaded multi-part copy if necessary, and copy_from does a single-threaded single-part copy?

mdavis-xyz avatar Oct 27 '21 01:10 mdavis-xyz

Hi @mdavis-xyz,

I initially thought this was a special case where the meta client was needed, and that this was why it was documented that way, but that doesn't appear to be the case. It seems to work fine on a standard client as well. And yes, the meta client is just a way to access a service's client from a resource instantiation.

Correct, copy_from is basically S3's copy_object, which is single-threaded, and copy is the multi-threaded, multi-part copy from s3Transfer.
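To make the distinction concrete, here is a rough sketch of the two client calls side by side. The wrapper names `copy_single_request` and `copy_managed` are made up for illustration; only `copy_object` and `copy` are actual client methods, and `client` is assumed to be an S3 client.

```python
def copy_single_request(client, src_bucket, src_key, dst_bucket, dst_key):
    """Issue one CopyObject API call (the operation copy_from maps to)."""
    return client.copy_object(
        Bucket=dst_bucket,
        Key=dst_key,
        CopySource={"Bucket": src_bucket, "Key": src_key},
    )


def copy_managed(client, src_bucket, src_key, dst_bucket, dst_key):
    """Managed transfer: may become a multipart copy across several threads."""
    copy_source = {"Bucket": src_bucket, "Key": src_key}
    return client.copy(copy_source, dst_bucket, dst_key)
```

Either `boto3.client('s3')` or `boto3.resource('s3').meta.client` can be passed as `client`.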

stobrien89 avatar Nov 02 '21 23:11 stobrien89

I can't seem to get either to preserve my metadata. (Specifically, LastModified.)

odigity avatar Feb 13 '23 23:02 odigity

I too was confused between copy() and copy_object(). I thought the obvious one to use was copy() and started using that one, only to realize that copy_object() is much faster, at least for small files (the situation I tested). In my experience copy_object() is 50% faster for a single process, and the effect is larger if you use multiple parallel processes.
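One way to act on this is to pick the call based on object size. The sketch below is an illustration under assumptions, not a measured recommendation: `copy_by_size` and the 64 MiB cutoff are invented for the example, and the extra `head_object` request costs a round trip of its own. A single CopyObject request only handles objects up to 5 GB, so the cutoff has to stay below that.

```python
SMALL_OBJECT_BYTES = 64 * 1024 * 1024  # assumed cutoff; tune from your own measurements


def copy_by_size(client, src_bucket, src_key, dst_bucket, dst_key,
                 threshold=SMALL_OBJECT_BYTES):
    """Hypothetical helper: copy_object for small objects, managed copy otherwise.

    Returns the name of the client method that was used.
    """
    copy_source = {"Bucket": src_bucket, "Key": src_key}
    # Look up the source object's size in bytes.
    size = client.head_object(Bucket=src_bucket, Key=src_key)["ContentLength"]
    if size < threshold:
        client.copy_object(Bucket=dst_bucket, Key=dst_key, CopySource=copy_source)
        return "copy_object"
    client.copy(copy_source, dst_bucket, dst_key)
    return "copy"
```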

ghomem avatar Nov 06 '23 01:11 ghomem

Which one supports copying 'LastModified'?

zhiweio avatar Nov 15 '23 14:11 zhiweio