s3transfer
upload_file return value
Both PutObject and CompleteMultipartUpload respond with data that includes the VersionId and ETag. [1] [2]
It would be really useful if S3Transfer.upload_file could return this response, or some part of the response.
- [1] http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.complete_multipart_upload
- [2] http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.put_object
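For reference, both responses are plain dicts, so once `upload_file` hands one back, getting the two fields out is straightforward. The sketch below assumes a response shaped as the boto3 docs linked above describe. The `UploadResult` and `extract_version_info` names and the sample responses are hypothetical. `VersionId` is read with `.get()` because S3 only returns it when the bucket is versioned.

```python
from typing import Any, Mapping, NamedTuple, Optional


class UploadResult(NamedTuple):
    """The two fields this issue cares about from an upload response."""

    etag: str
    version_id: Optional[str]  # None when S3 returned no VersionId


def extract_version_info(response: Mapping[str, Any]) -> UploadResult:
    # ETag is expected in both PutObject and CompleteMultipartUpload responses;
    # VersionId is only present on versioned buckets, so use .get() for it.
    return UploadResult(etag=response["ETag"], version_id=response.get("VersionId"))


# Hand-written sample responses for illustration, not captured from a real bucket.
put_resp = {"ETag": '"9b2cf535f27731c974343645a3985328"', "VersionId": "example-version-1"}
mpu_resp = {"Bucket": "my-bucket", "Key": "big.bin", "ETag": '"17fbc0a106abbb6f381aac6e331f2a19-2"'}

single = extract_version_info(put_resp)
multipart = extract_version_info(mpu_resp)
print(single.version_id)     # example-version-1
print(multipart.version_id)  # None
```

A multipart ETag carries a `-<part count>` suffix, so it is generally not the MD5 of the whole object. That is one more reason the `VersionId` is the better handle on "the thing I just uploaded".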
Thanks, marking this as a feature enhancement.
Looks like this isn't going anywhere. This actually destroys the viability of using the s3transfer manager in any case where more than one version of an object could be uploaded, since one can't guarantee that the data from a subsequent HEAD call refers to the same upload: S3 is eventually consistent.
That's a pretty bad breakage, rather than just a feature request.
Does anyone know of a workaround, or do we have to resort to not using s3transfer? As far as I can tell, without this it is impossible to determine which version was just uploaded, because of race conditions with a subsequent HEAD, as @eode mentioned.
A few options I've thought of to work around this:
- Use a unique key that will never be chosen again, e.g. upload to a UUID key and then HEAD that object to get the version ID.
- Pass a unique Metadata key and value in ExtraArgs, and verify it when looking up the uploaded version.
- If all you care about is that every file's version ID gets recorded somewhere, you can subscribe to the bucket's S3 events and push file creation/update metadata to an external data store.
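The second option (a unique metadata tag) can be sketched as follows. This is a minimal illustration that assumes a boto3-style S3 client exposing `list_object_versions` and `head_object`. To keep it runnable offline, it uses a small in-memory stand-in instead of real S3. `FakeVersionedS3`, `find_version_by_token`, and the `upload-token` metadata key are all hypothetical. With a real client, the upload side would pass something like `ExtraArgs={"Metadata": {TAG_KEY: token}}` to `upload_file`.

```python
import uuid
from typing import Any, Dict, List, Optional

TAG_KEY = "upload-token"  # hypothetical key; lower-case because S3 may lower-case metadata keys


class FakeVersionedS3:
    """Hypothetical in-memory stand-in for a versioned bucket. It mimics only
    the boto3 call shapes this sketch needs; it is not a real S3 client."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Dict[str, Any]]] = {}
        self._counter = 0

    def put_object(self, Bucket: str, Key: str, Body: bytes,
                   Metadata: Optional[Dict[str, str]] = None) -> Dict[str, Any]:
        self._counter += 1
        version = {"VersionId": f"v{self._counter}", "Metadata": dict(Metadata or {})}
        self._versions.setdefault(Key, []).append(version)
        return {"ETag": '"fake"', "VersionId": version["VersionId"]}

    def list_object_versions(self, Bucket: str, Prefix: str) -> Dict[str, Any]:
        # This fake lists newest first and never paginates (real responses can be paginated).
        found = [
            {"Key": key, "VersionId": v["VersionId"]}
            for key, versions in self._versions.items()
            if key.startswith(Prefix)
            for v in reversed(versions)
        ]
        return {"Versions": found}

    def head_object(self, Bucket: str, Key: str, VersionId: Optional[str] = None) -> Dict[str, Any]:
        versions = self._versions[Key]
        if VersionId is None:
            match = versions[-1]  # no VersionId means "latest"
        else:
            match = next(v for v in versions if v["VersionId"] == VersionId)
        return {"VersionId": match["VersionId"], "Metadata": match["Metadata"]}


def find_version_by_token(client: Any, bucket: str, key: str, token: str) -> Optional[str]:
    """Return the VersionId of `key` whose metadata carries `token`, else None."""
    listing = client.list_object_versions(Bucket=bucket, Prefix=key)
    for entry in listing.get("Versions", []):
        if entry["Key"] != key:  # Prefix also matches longer keys such as "data.csv.bak"
            continue
        head = client.head_object(Bucket=bucket, Key=key, VersionId=entry["VersionId"])
        if head.get("Metadata", {}).get(TAG_KEY) == token:
            return entry["VersionId"]
    return None


s3 = FakeVersionedS3()
my_token = uuid.uuid4().hex
# Stands in for upload_file(..., ExtraArgs={"Metadata": {TAG_KEY: my_token}}).
s3.put_object(Bucket="b", Key="data.csv", Body=b"mine", Metadata={TAG_KEY: my_token})
# Another writer overwrites the same key before we look it up.
s3.put_object(Bucket="b", Key="data.csv", Body=b"theirs", Metadata={TAG_KEY: uuid.uuid4().hex})

latest = s3.head_object(Bucket="b", Key="data.csv")["VersionId"]
mine = find_version_by_token(s3, "b", "data.csv", my_token)
print(latest, mine)  # v2 v1
```

The demo reproduces the race described above. A plain HEAD after the upload reports the other writer's version (`v2`), while the metadata lookup still finds ours (`v1`). Real `list_object_versions` results are paginated, so a production version would follow `KeyMarker`/`VersionIdMarker` or use the boto3 paginator. It also costs one HEAD per version examined.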
I have a workaround for s3transfer.manager.TransferManager, which is what boto3 uses. Monkey-patch PutObjectTask and CompleteMultipartUploadTask so that they actually return the response from the S3 client call. This fixes boto3's Bucket.upload_fileobj() and friends.
import s3transfer.tasks
import s3transfer.upload


class PutObjectTask(s3transfer.tasks.Task):
    # Copied from s3transfer/upload.py, changed to return the result of client.put_object.
    def _main(self, client, fileobj, bucket, key, extra_args):
        with fileobj as body:
            return client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)


class CompleteMultipartUploadTask(s3transfer.tasks.Task):
    # Copied from s3transfer/tasks.py, changed to return the result of
    # client.complete_multipart_upload.
    def _main(self, client, bucket, key, upload_id, parts, extra_args):
        return client.complete_multipart_upload(
            Bucket=bucket,
            Key=key,
            UploadId=upload_id,
            MultipartUpload={"Parts": parts},
            **extra_args,
        )


# s3transfer.upload looks these classes up by name when it submits tasks.
# Both are submitted as the final task of a transfer, so their return value
# becomes the value of the TransferFuture's result().
s3transfer.upload.PutObjectTask = PutObjectTask
s3transfer.upload.CompleteMultipartUploadTask = CompleteMultipartUploadTask
What's the status of this? If we put that monkey patch into a pull request, will that fix the problem?