upload_file return value

Open nathan-muir opened this issue 8 years ago • 6 comments

Both PutObject and CompleteMultipartUpload respond with data that includes the VersionId and ETag. [1] [2]

It would be really useful if S3Transfer.upload_file could return this response, or some part of the response.

  • [1] http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.complete_multipart_upload
  • [2] http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.put_object

nathan-muir avatar Feb 09 '17 06:02 nathan-muir

Thanks, marking this as a feature enhancement.

dstufft avatar Feb 14 '17 01:02 dstufft

Looks like this isn't going anywhere. This actually destroys the viability of using the s3transfer manager in any case where more than one version could be uploaded: one can't guarantee that the data from a subsequent 'head' call refers to the version that was just uploaded, since S3 is eventually consistent.

That's a pretty bad breakage, rather than just a feature request.

eode avatar Dec 04 '18 01:12 eode

Does anyone know of a workaround, or do we have to resort to not using s3transfer? As far as I can tell it is impossible to determine which version was just uploaded without this due to race conditions with a subsequent HEAD, like @eode mentioned.

isobit avatar May 28 '19 22:05 isobit

A few options I've thought of to work around this:

  1. Use a unique key that will never be chosen again. E.g. upload to a UUID and then head that object to get the version ID.
  2. Pass a unique Metadata key and value in ExtraArgs. Check for that value when you look up the resulting version.
  3. If all you care about is that every file's version ID gets copied somewhere, you can subscribe to the bucket's S3 events and push file creation/update metadata to an outside data storage system.
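
Option 2 could look roughly like the sketch below. It assumes `client` is a boto3 S3 client; the helper name `upload_and_confirm_version` and the metadata key `upload-token` are made up for illustration. It only detects a racing writer, it does not prevent one, and `VersionId` will only be present if bucket versioning is enabled.

```python
import uuid


def upload_and_confirm_version(client, filename, bucket, key,
                               token_name="upload-token"):
    """Upload with a one-off metadata token, then HEAD the key and return
    its VersionId only if the token still matches.

    `client` is assumed to be a boto3 S3 client. `token_name` is a
    hypothetical metadata key chosen for this example.
    """
    token = uuid.uuid4().hex
    client.upload_file(
        filename, bucket, key,
        ExtraArgs={"Metadata": {token_name: token}},
    )
    head = client.head_object(Bucket=bucket, Key=key)
    if head.get("Metadata", {}).get(token_name) != token:
        # Someone else wrote this key between our upload and our HEAD.
        return None
    # Absent (None) when the bucket is not versioned.
    return head.get("VersionId")
```

A `None` return means the HEAD saw a different writer's object, so the caller would need to retry or list the object's versions to find its own.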

benmanns avatar Jan 24 '20 22:01 benmanns

I have a workaround for s3transfer.manager.TransferManager, which is what boto3 uses. Monkeypatch the PutObjectTask and CompleteMultipartUploadTask so they actually return the response from the S3 client call. This fixes boto3's Bucket.upload_fileobj() and friends.

import s3transfer.upload
import s3transfer.tasks


class PutObjectTask(s3transfer.tasks.Task):
    # Copied from s3transfer/upload.py, changed to return the result of client.put_object.
    def _main(self, client, fileobj, bucket, key, extra_args):
        with fileobj as body:
            return client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)


class CompleteMultipartUploadTask(s3transfer.tasks.Task):
    # Copied from s3transfer/tasks.py, changed to return a result.
    def _main(self, client, bucket, key, upload_id, parts, extra_args):
        print(f"Multipart upload {upload_id} for {key}.")
        return client.complete_multipart_upload(
            Bucket=bucket,
            Key=key,
            UploadId=upload_id,
            MultipartUpload={"Parts": parts},
            **extra_args,
        )


s3transfer.upload.PutObjectTask = PutObjectTask
s3transfer.upload.CompleteMultipartUploadTask = CompleteMultipartUploadTask
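
With the patch above applied, reading the version back might look like the sketch below. It assumes `bucket` is a boto3 `Bucket` resource and that `upload_fileobj()` passes the transfer's result through to the caller, as described above; the helper name `version_of_upload` is made up for illustration.

```python
def version_of_upload(bucket, fileobj, key, extra_args=None):
    """Upload through a boto3 Bucket resource and return the new VersionId.

    Assumes the monkeypatch above has already been applied, so that
    upload_fileobj() hands back the S3 response instead of None.
    """
    response = bucket.upload_fileobj(fileobj, key, ExtraArgs=extra_args)
    if response is None:
        raise RuntimeError("upload returned no response; is the patch applied?")
    # VersionId is only present when bucket versioning is enabled.
    return response.get("VersionId")
```

The explicit `None` check is there because an unpatched s3transfer would silently return nothing, which is easy to mistake for an unversioned bucket.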

toojays avatar May 11 '21 07:05 toojays

What's the status of this? If we put that monkey patch into a pull request, will that fix the problem?

mdavis-xyz avatar Feb 20 '22 21:02 mdavis-xyz