boto3
Document difference between S3 object `copy` vs `copy_from` vs `copy_object`
`s3.Object` has methods `copy` and `copy_from`.
Based on the name, I assumed that `copy_from` would copy from some other key into the key (and bucket) of this `s3.Object`. Therefore I assumed that the other copy function would do the opposite, i.e. copy from this `s3.Object` to another object. Or maybe the two are the other way around.
But after reading the docs for both, it looks like they both do the same thing. They both copy from another object into this object. Is that correct? What's the point of having two functions that copy in the same direction?
What I want is to copy the existing `s3.Object` into a different path. I don't want to have to manually instantiate a second `s3.Object` instance in Python and then manually pass the bucket and key from the first one.
That is, what's the easiest way to copy `s3://bucketA/pathA.txt` to `s3://bucketB/pathB.txt` if I already have `s3.Object('bucketA', 'pathA.txt')`?
Hi @mdavis-xyz,
That's a good point: both `copy` and `copy_from` seem to use `CopyObject` under the hood. I'm not seeing any discernible difference aside from the fact that they accept arguments in different formats. I'll double-check with the team to clarify.
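To make the direction concrete, here is a minimal sketch, assuming `dest_obj` is a boto3 `s3.Object` that names the destination (the helper name `copy_into` is hypothetical, not part of boto3). Both resource actions are called on the destination object and receive the source as an argument; only the argument format differs.

```python
def copy_into(dest_obj, source_bucket, source_key, managed=True):
    """Copy s3://source_bucket/source_key INTO dest_obj (a boto3 s3.Object).

    Hypothetical helper. Both resource actions treat the object they are
    called on as the destination; they differ only in how the source is passed.
    """
    source = {"Bucket": source_bucket, "Key": source_key}
    if managed:
        # Object.copy: CopySource dict as the first positional argument
        # (managed s3transfer copy).
        dest_obj.copy(source)
    else:
        # Object.copy_from: keyword arguments mirroring the CopyObject API;
        # the destination bucket and key come from dest_obj itself.
        dest_obj.copy_from(CopySource=source)
    return source
```

For example, `copy_into(boto3.resource('s3').Object('bucketB', 'pathB.txt'), 'bucketA', 'pathA.txt')` copies `pathA.txt` from `bucketA` into `bucketB/pathB.txt`.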
The easiest way to copy `s3://bucketA/pathA.txt` to `s3://bucketB/pathB.txt` would be to access the meta client and use the s3transfer `copy` method:
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('sourcebucketname')
obj = bucket.Object('sourceobject')
# Managed (s3transfer) copy: CopySource dict first, then destination bucket and key
s3.meta.client.copy({"Bucket": bucket.name, "Key": obj.key}, 'destinationbucket', 'key')
Hope this helps!
Can we add a new copy method to `s3.Object`, one that copies from this object to another?
It seems silly to bother having high-level resources if, to copy, you then have to extract the low-level client from the service resource or object resource, and then extract not one but two identifiers from the high-level resource to pass to the low-level call, in a way that is inconsistent with how the destination object is passed to the call. This is quite clunky and verbose.
We should be able to do:
obj.copy_to(destinationKey=key, destinationBucket=bucket_name)
but default the destination bucket to the source bucket if omitted:
obj.copy_to(destinationKey=key)
And also:
bucket.copy(sourceKey=key1, destinationKey=key2) # copy within bucket
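Until something like that exists, a wrapper can be written on top of what boto3 already provides. A minimal sketch: `copy_to` and `copy_within_bucket` are hypothetical user-defined helpers, not boto3 APIs. They assume `obj` is a boto3 `s3.Object` and `bucket` a boto3 `s3.Bucket`, and rely on the existing client `copy(CopySource, Bucket, Key)` and `Bucket.copy(CopySource, Key)` methods.

```python
def copy_to(obj, dest_key, dest_bucket=None):
    """Copy `obj` (a boto3 s3.Object) to dest_bucket/dest_key.

    Hypothetical helper approximating the proposed obj.copy_to(...):
    the destination bucket defaults to the source object's own bucket.
    """
    if dest_bucket is None:
        dest_bucket = obj.bucket_name
    source = {"Bucket": obj.bucket_name, "Key": obj.key}
    # Reuse the client already attached to the resource, so no second
    # s3.Object has to be built by hand.
    obj.meta.client.copy(source, dest_bucket, dest_key)
    return dest_bucket, dest_key


def copy_within_bucket(bucket, source_key, dest_key):
    """Hypothetical helper: copy source_key to dest_key inside one s3.Bucket."""
    # Bucket.copy takes the source as a CopySource dict plus the destination
    # key; the destination bucket is the Bucket the method is called on.
    bucket.copy({"Bucket": bucket.name, "Key": source_key}, dest_key)
```

For example, `copy_to(s3.Object('bucketA', 'pathA.txt'), 'pathB.txt', 'bucketB')` covers the `s3://bucketA/pathA.txt` to `s3://bucketB/pathB.txt` case without building a second `s3.Object`.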
The documentation for the low-level copy is also a bit confusing. The S3 client has `copy` and `copy_object`. What's the difference?
And why do the docs use:
s3 = boto3.resource('s3')
s3.meta.client.copy(...)
instead of `boto3.client('s3').copy(...)`?
Hi @mdavis-xyz,
I was able to confirm with the team that the resource `.copy` action is basically just the s3transfer copy method I mentioned in my last comment. The action is also somewhat verbose and clunky to use, because the resource you perform the action on is actually treated as the destination of the copy. I don't think we'd add another copy method, but I definitely think we could improve the way the existing copy action is used.
For the low-level copy itself, `copy` is "a managed transfer which will perform a multipart copy in multiple threads if necessary," and it is a boto3 customization over s3transfer, which is why you need to access the meta client to use it. `copy_object` is the official S3 API operation, which isn't the most intuitive to use; the s3transfer methods (and similarly `sync`, `cp`, etc. in the CLI) are there to make some of the S3 APIs a bit easier to use.
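Side by side, the two client calls differ mainly in argument style and in who manages the transfer. A minimal sketch, assuming `client` is a boto3 S3 client; the wrapper names `managed_copy` and `single_request_copy` are hypothetical.

```python
def managed_copy(client, source_bucket, source_key, dest_bucket, dest_key, config=None):
    """s3transfer-managed copy: multipart and multi-threaded when needed."""
    source = {"Bucket": source_bucket, "Key": source_key}
    # client.copy(CopySource, Bucket, Key, ..., Config=None) is added to the
    # S3 client by boto3; Config accepts a boto3.s3.transfer.TransferConfig.
    client.copy(source, dest_bucket, dest_key, Config=config)


def single_request_copy(client, source_bucket, source_key, dest_bucket, dest_key):
    """One CopyObject API call (AWS documents a 5 GB limit per request)."""
    source = {"Bucket": source_bucket, "Key": source_key}
    # Keyword-only arguments, straight from the S3 API; returns the raw response dict.
    return client.copy_object(CopySource=source, Bucket=dest_bucket, Key=dest_key)
```

To tune the managed copy, a `TransferConfig` (imported from `boto3.s3.transfer`) can be passed, e.g. `managed_copy(boto3.client('s3'), 'bucketA', 'pathA.txt', 'bucketB', 'pathB.txt', config=TransferConfig(max_concurrency=4))`.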
Hmm, I'm still not understanding the difference.
How is `boto3.resource('s3').meta.client` different from `boto3.client('s3')`? Aren't they identical?
Are you saying that the difference is that `copy` does a multi-threaded multipart copy if necessary, and `copy_from` does a single-threaded single-part copy?
Hi @mdavis-xyz,
I initially thought this was a special case where the meta client was needed and that's why it was documented that way, but that doesn't appear to be the case: it seems to work fine on a standard client as well. And yes, the meta client is just a way to access a service's client from a resource instance.
Correct: `copy_from` is basically S3's `copy_object`, which is single-threaded, and `copy` is the multi-threaded, multipart copy from s3transfer.
I can't seem to get either one to preserve my metadata (specifically, `LastModified`).
I too was confused by `copy()` vs. `copy_object()`. I thought the obvious one to use was `copy()` and started using it, only to realize that `copy_object()` is much faster, at least for small files (the situation I tested). In my experience `copy_object()` is about 50% faster in a single process; with multiple parallel processes the effect is larger.
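One way to act on that observation is to pick the method by object size. A minimal sketch, assuming `client` is a boto3 S3 client; the helper name `copy_by_size` is hypothetical, and the default threshold is the 5 GB single-request `CopyObject` limit that AWS documents, treated here as `5 * 1024**3` bytes (an assumption). The `head_object` call costs one extra request, which may eat into the speed gain for very small files.

```python
# AWS documents 5 GB as the largest object one CopyObject request can copy;
# expressed here as 5 * 1024**3 bytes (assumption).
COPY_OBJECT_LIMIT = 5 * 1024 ** 3


def copy_by_size(client, source_bucket, source_key, dest_bucket, dest_key,
                 threshold=COPY_OBJECT_LIMIT):
    """Use copy_object up to `threshold` bytes, the managed client.copy above it.

    Hypothetical helper; returns the name of the method used.
    """
    source = {"Bucket": source_bucket, "Key": source_key}
    # HEAD the source object to learn its size.
    size = client.head_object(Bucket=source_bucket, Key=source_key)["ContentLength"]
    if size <= threshold:
        # Single CopyObject request: no transfer manager or thread pool involved.
        client.copy_object(CopySource=source, Bucket=dest_bucket, Key=dest_key)
        return "copy_object"
    # Large object: let s3transfer split it into a multipart copy.
    client.copy(source, dest_bucket, dest_key)
    return "copy"
```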
Which one supports copying `LastModified`?
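As far as I know, neither method can carry `LastModified` over: it is system metadata that S3 sets itself when the new object is written, and the copy request has no parameter for it. A common workaround, sketched here under that assumption, is to record the original timestamp in user-defined metadata during a `copy_object` call. The helper name `copy_keeping_timestamp` and the metadata key `original-last-modified` are hypothetical; because this is a single `CopyObject` request, it is subject to the same 5 GB limit.

```python
def copy_keeping_timestamp(client, source_bucket, source_key, dest_bucket, dest_key):
    """Copy an object and store its original LastModified in user metadata.

    Hypothetical helper: S3 still sets LastModified on the copy itself; this
    only records the old value under the key 'original-last-modified'.
    """
    head = client.head_object(Bucket=source_bucket, Key=source_key)
    # Start from the source's existing user metadata so it is not dropped.
    metadata = dict(head.get("Metadata", {}))
    # boto3 returns LastModified as a datetime; metadata values must be strings.
    metadata["original-last-modified"] = head["LastModified"].isoformat()

    kwargs = {
        "CopySource": {"Bucket": source_bucket, "Key": source_key},
        "Bucket": dest_bucket,
        "Key": dest_key,
        # REPLACE is needed to change metadata during a copy; headers such as
        # Content-Type are then taken from this request, so re-supply them.
        "MetadataDirective": "REPLACE",
        "Metadata": metadata,
    }
    if "ContentType" in head:
        kwargs["ContentType"] = head["ContentType"]
    client.copy_object(**kwargs)
    return metadata
```

Other headers such as `Cache-Control` or `Content-Disposition` would need the same re-supplying treatment as `ContentType` if the source object uses them.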