aws-sdk-go
aws-sdk-go copied to clipboard
service/s3/s3manager: implements multipart copy as s3manager.Copier
Adds s3manager.Copier
, which implements multipart copy using UploadPartCopy for objects larger than 10MB (by default). The interface and conventions mimic those of s3manager.Downloader
and s3manager.Uploader
, and there's some reuse of the error types and constants of the uploader. The Copier re-uses the input and output of CopyObject.
Parts are sized such that they are as large as possible whilst utilising all concurrent goroutines (the default concurrency is 10). This minimises the number of round trips.
The Copier does not, currently, support CopySourceIf* headers, as my understanding is that this can only be used for parts and not the full upload. Consequently the Copier will always overwrite the destination object even if it is identical. This could potentially be implemented by invoking HeadObject against the destination object, assuming there's a way for the caller to signal that s3:GetObject is permitted.
Completes https://github.com/aws/aws-sdk-go/issues/745.
Thanks for taking the time the to create this PR and all of its testing cases @roberth-k. Having the multipart copier is a feature we could add, but I want to call out that the object copy APIs will not copy the S3 Object's metadata, permissions, tags or other attributes. The APIs only copy the Object's data. All metadata, permissions, tags, etc must be reapplied separately. S3 does not currently have an API to copy the different kinds of metadata on an Object.
The AWS CLI does attempt to copy metadata, permissions, tags, etc, but has had unexpected loss of data due to new S3 features added and users with out of date CLI. The CLI utility is only able to copy metadata that it knows about.
If the SDK's CopyObject utility clearly states the limitations of using the utility to not copy object metadata the having the utility in the SDK is good. We'll review this PR and get back with our comments.
We would love to see this merged. Being able to copy large (> 5GB) files from one S3 location to another is a critical feature for our application.
@jasdel @jonyoder Thank you for the comments. I've addressed the context leak, and provided an explicit mention about the lack of attribute-copying.
Thanks again for making these updates @roberth-k we'll review the PR and get back to you with feedback.
FYI, we've been using this code (ported it to our application manually), and it worked really well! Looking forward to seeing this merged into the s3 manager.
Are there any updates on this PR?
This might have inconsistent behavior in regards to metadata copying.
It looks like CopyObject
will by default copy the metadata for files small enough that it will be used. While for larger files, where a multipart upload using UploadPartCopy
is used, no metadata will be copied.
I hit an issue trying to use this code to copy between buckets that are in different accounts and regions. The code calls HeadObject
to get the source object size using the passed S3 client/session, which uses the region of the destination bucket, which causes the request to fail with a 403 error.
Any ideas?
@segevfiner
It looks like CopyObject will by default copy the metadata for files small enough that it will be used. While for larger files, where a multipart upload using UploadPartCopy is used, no metadata will be copied.
The metadata copy would indeed be inconsistent; the fix I suppose will be to set x-amz-metadata-directive
of the CopyObject call to REPLACE to mimic multipart behaviour.
I hit an issue trying to use this code to copy between buckets that are in different accounts and regions. The code calls HeadObject to get the source object size using the passed S3 client/session, which uses the region of the destination bucket, which causes the request to fail with a 403 error.
HeadObject is necessary to check for the object's eligibility for multipart copy as well as to compute part offsets, so making the call I think is unavoidable.
The location of the source bucket could be obtained with GetBucketLocation, but per API documentation that's limited to the bucket owner. As the copy must be executed from the destination account, this isn't an applicable solution.
The AWS CLI solves this by passing an explicit --source-bucket
. An analogous solution here would require forking the CopyObjectInput similar to what's been done with the Uploader. It's not ideal, as overriding the region for this request would mean either (a) overriding the Request, if that's effective; or (b) type matching the s3iface.S3API
.
I can try these out in a few days.
Check out s3manager.GetBucketRegion which is working for me with a bucket that I do not own in a different account. Of course, not sure about any undocumented limitations or situations it doesn't work.
Thanks @segevfiner -- it's a good find.
HeadBucket's documentation does suggest it requires s3:ListBucket, but experimentally it turns out that the x-amz-bucket-region
response header is provided even when the request is a 403, the bucket in another account and region, and the requestor an IAM entity with no permissions whatsoever. I suppose this could be undefined behaviour.
I tried the latest code and it looks like it is still not working in the cross region case. It does seem to detect the source bucket region, but it is still sending the HeadObject
request to the s3 endpoint of the AWS session region (The region of the destination), so I'm getting a BadRequest
.
@segevfiner That's correct. The issue is that setting region as an option to the HeadObject request isn't sufficient, as the endpoints of the S3 client are set at construction time. There isn't a native way of copy-constructing a S3 client, and I've yet to find to find a good (i.e. portable) workaround.
Thanks for this. Looking forward to this merge.
My application also needs to support copy for files >5GB.
I also have been using the code in this PR and everything is working as intended for me :)
With the latest changes the multipart copy does respect cross-region, but only when the embedded s3iface.S3API
is an instance of *s3.S3
.
Hi, Just wanted to nudge this issue, I'm finding myself in the need of this aswell.
After reading all comments it is not clear to me does this method preserve or set files metadata? Does anyone has any examples of usage?
I think the current version copies metadata by default (Even in the multipart case, with extra code added here to support that), and it can be controlled by the MetadataDirective
option similar to the plain CopyObject
API. But still, please check to make sure.
Just shedding some light on this issue. Are there any news on this? Any major blockers preventing this from being merged?
I'm using this PR in a project that needs this and would be nice if this were to be merged. That way we could benefit from all the updates on master alongside this one.
Let me know if I can help on tests or if I can do anything else that can help this issue be merged faster.
Is it possible to get this merged?
+1 for getting this merged
This PR would make things a lot easier :tada:
+1 on working to get this merged! We've been using the code for years!
Any updates on getting this PR merged?
who gon merg dis?
How can we get this PR merged?
Any updates with this?
Hello Everyone,
We have noticed the significant support and engagement for the PR, with many upvotes and comments. However, after careful consideration, we have decided not to proceed with this feature.
The SDK teams are currently engaged in an org-wide effort to redefine the transfer manager experience, in order to create a more consistent cross-SDK experience and desired feature set.
As AWS SDK for Go v2 is the latest and recommended version, we have created a corresponding feature request there. We invite you all to participate by upvoting and commenting on the feature request in the v2 repository. Your involvement will help prioritize this feature.
Thank you for your understanding and continued support. Ran & the Go SDK team
Hello Everyone,
We have noticed the significant support and engagement for the PR, with many upvotes and comments. However, after careful consideration, we have decided not to proceed with this feature.
The SDK teams are currently engaged in an org-wide effort to redefine the transfer manager experience, in order to create a more consistent cross-SDK experience and desired feature set.
As AWS SDK for Go v2 is the latest and recommended version, we have created a corresponding feature request there. We invite you all to participate by upvoting and commenting on the feature request in the v2 repository. Your involvement will help prioritize this feature.
Thank you for your understanding and continued support. Ran & the Go SDK team
This is... frustrating.