aws-sdk-go icon indicating copy to clipboard operation
aws-sdk-go copied to clipboard

service/s3/s3manager: implements multipart copy as s3manager.Copier

Open roberth-k opened this issue 5 years ago • 25 comments

Adds s3manager.Copier, which implements multipart copy using UploadPartCopy for objects larger than 10MB (by default). The interface and conventions mimic those of s3manager.Downloader and s3manager.Uploader, and there's some reuse of the error types and constants of the uploader. The Copier re-uses the input and output of CopyObject.

Parts are sized such that they are as large as possible whilst utilising all concurrent goroutines (the default concurrency is 10). This minimises the number of round trips.

The Copier does not, currently, support CopySourceIf* headers, as my understanding is that this can only be used for parts and not the full upload. Consequently the Copier will always overwrite the destination object even if it is identical. This could potentially be implemented by invoking HeadObject against the destination object, assuming there's a way for the caller to signal that s3:GetObject is permitted.

Completes https://github.com/aws/aws-sdk-go/issues/745.

roberth-k avatar Jun 16 '19 19:06 roberth-k

Thanks for taking the time the to create this PR and all of its testing cases @roberth-k. Having the multipart copier is a feature we could add, but I want to call out that the object copy APIs will not copy the S3 Object's metadata, permissions, tags or other attributes. The APIs only copy the Object's data. All metadata, permissions, tags, etc must be reapplied separately. S3 does not currently have an API to copy the different kinds of metadata on an Object.

The AWS CLI does attempt to copy metadata, permissions, tags, etc, but has had unexpected loss of data due to new S3 features added and users with out of date CLI. The CLI utility is only able to copy metadata that it knows about.

If the SDK's CopyObject utility clearly states the limitations of using the utility to not copy object metadata the having the utility in the SDK is good. We'll review this PR and get back with our comments.

jasdel avatar Sep 11 '19 21:09 jasdel

We would love to see this merged. Being able to copy large (> 5GB) files from one S3 location to another is a critical feature for our application.

jonyoder avatar Oct 09 '19 22:10 jonyoder

@jasdel @jonyoder Thank you for the comments. I've addressed the context leak, and provided an explicit mention about the lack of attribute-copying.

roberth-k avatar Oct 13 '19 17:10 roberth-k

Thanks again for making these updates @roberth-k we'll review the PR and get back to you with feedback.

jasdel avatar Oct 22 '19 18:10 jasdel

FYI, we've been using this code (ported it to our application manually), and it worked really well! Looking forward to seeing this merged into the s3 manager.

jonyoder avatar Jan 09 '20 14:01 jonyoder

Are there any updates on this PR?

igungor avatar Feb 28 '20 10:02 igungor

This might have inconsistent behavior in regards to metadata copying.

It looks like CopyObject will by default copy the metadata for files small enough that it will be used. While for larger files, where a multipart upload using UploadPartCopy is used, no metadata will be copied.

segevfiner avatar Apr 13 '20 16:04 segevfiner

I hit an issue trying to use this code to copy between buckets that are in different accounts and regions. The code calls HeadObject to get the source object size using the passed S3 client/session, which uses the region of the destination bucket, which causes the request to fail with a 403 error.

Any ideas?

segevfiner avatar Apr 21 '20 15:04 segevfiner

@segevfiner

It looks like CopyObject will by default copy the metadata for files small enough that it will be used. While for larger files, where a multipart upload using UploadPartCopy is used, no metadata will be copied.

The metadata copy would indeed be inconsistent; the fix I suppose will be to set x-amz-metadata-directive of the CopyObject call to REPLACE to mimic multipart behaviour.

I hit an issue trying to use this code to copy between buckets that are in different accounts and regions. The code calls HeadObject to get the source object size using the passed S3 client/session, which uses the region of the destination bucket, which causes the request to fail with a 403 error.

HeadObject is necessary to check for the object's eligibility for multipart copy as well as to compute part offsets, so making the call I think is unavoidable.

The location of the source bucket could be obtained with GetBucketLocation, but per API documentation that's limited to the bucket owner. As the copy must be executed from the destination account, this isn't an applicable solution.

The AWS CLI solves this by passing an explicit --source-bucket. An analogous solution here would require forking the CopyObjectInput similar to what's been done with the Uploader. It's not ideal, as overriding the region for this request would mean either (a) overriding the Request, if that's effective; or (b) type matching the s3iface.S3API.

I can try these out in a few days.

roberth-k avatar Apr 22 '20 10:04 roberth-k

Check out s3manager.GetBucketRegion which is working for me with a bucket that I do not own in a different account. Of course, not sure about any undocumented limitations or situations it doesn't work.

segevfiner avatar Apr 22 '20 10:04 segevfiner

Thanks @segevfiner -- it's a good find.

HeadBucket's documentation does suggest it requires s3:ListBucket, but experimentally it turns out that the x-amz-bucket-region response header is provided even when the request is a 403, the bucket in another account and region, and the requestor an IAM entity with no permissions whatsoever. I suppose this could be undefined behaviour.

roberth-k avatar Apr 22 '20 10:04 roberth-k

I tried the latest code and it looks like it is still not working in the cross region case. It does seem to detect the source bucket region, but it is still sending the HeadObject request to the s3 endpoint of the AWS session region (The region of the destination), so I'm getting a BadRequest.

segevfiner avatar Jun 08 '20 18:06 segevfiner

@segevfiner That's correct. The issue is that setting region as an option to the HeadObject request isn't sufficient, as the endpoints of the S3 client are set at construction time. There isn't a native way of copy-constructing a S3 client, and I've yet to find to find a good (i.e. portable) workaround.

roberth-k avatar Jun 08 '20 18:06 roberth-k

Thanks for this. Looking forward to this merge.

My application also needs to support copy for files >5GB.

I also have been using the code in this PR and everything is working as intended for me :)

andrerfcsantos avatar Jun 23 '20 18:06 andrerfcsantos

With the latest changes the multipart copy does respect cross-region, but only when the embedded s3iface.S3API is an instance of *s3.S3.

roberth-k avatar Jul 07 '20 08:07 roberth-k

Hi, Just wanted to nudge this issue, I'm finding myself in the need of this aswell.

Djolivald avatar Sep 14 '20 19:09 Djolivald

After reading all comments it is not clear to me does this method preserve or set files metadata? Does anyone has any examples of usage?

Kif11 avatar Oct 10 '20 00:10 Kif11

I think the current version copies metadata by default (Even in the multipart case, with extra code added here to support that), and it can be controlled by the MetadataDirective option similar to the plain CopyObject API. But still, please check to make sure.

segevfiner avatar Oct 10 '20 07:10 segevfiner

Just shedding some light on this issue. Are there any news on this? Any major blockers preventing this from being merged?

I'm using this PR in a project that needs this and would be nice if this were to be merged. That way we could benefit from all the updates on master alongside this one.

Let me know if I can help on tests or if I can do anything else that can help this issue be merged faster.

andrerfcsantos avatar Nov 05 '20 11:11 andrerfcsantos

Is it possible to get this merged?

AnthonyMBonafide avatar Mar 23 '21 18:03 AnthonyMBonafide

+1 for getting this merged

gusostow avatar Oct 06 '21 15:10 gusostow

This PR would make things a lot easier :tada:

RaphaelPour avatar Nov 08 '21 14:11 RaphaelPour

+1 on working to get this merged! We've been using the code for years!

jonyoder avatar May 16 '22 14:05 jonyoder

Any updates on getting this PR merged?

boraberke avatar Aug 22 '22 12:08 boraberke

who gon merg dis?

tooptoop4 avatar Sep 14 '22 13:09 tooptoop4

How can we get this PR merged?

toddleish avatar Apr 10 '23 17:04 toddleish

Any updates with this?

rlizzo avatar May 11 '23 16:05 rlizzo

Hello Everyone,

We have noticed the significant support and engagement for the PR, with many upvotes and comments. However, after careful consideration, we have decided not to proceed with this feature.

The SDK teams are currently engaged in an org-wide effort to redefine the transfer manager experience, in order to create a more consistent cross-SDK experience and desired feature set.

As AWS SDK for Go v2 is the latest and recommended version, we have created a corresponding feature request there. We invite you all to participate by upvoting and commenting on the feature request in the v2 repository. Your involvement will help prioritize this feature.

Thank you for your understanding and continued support. Ran & the Go SDK team

RanVaknin avatar Jun 22 '23 19:06 RanVaknin

Hello Everyone,

We have noticed the significant support and engagement for the PR, with many upvotes and comments. However, after careful consideration, we have decided not to proceed with this feature.

The SDK teams are currently engaged in an org-wide effort to redefine the transfer manager experience, in order to create a more consistent cross-SDK experience and desired feature set.

As AWS SDK for Go v2 is the latest and recommended version, we have created a corresponding feature request there. We invite you all to participate by upvoting and commenting on the feature request in the v2 repository. Your involvement will help prioritize this feature.

Thank you for your understanding and continued support. Ran & the Go SDK team

This is... frustrating.

rlizzo avatar Jun 22 '23 19:06 rlizzo