support memoryview in all places bytes, bytearray are supported

Open alonbl opened this issue 1 year ago • 4 comments

Describe the bug

Currently, boto3 (via botocore) checks parameters against an explicit list of types, for example:

botocore/validate.py:

    def _validate_blob(self, param, shape, errors, name):
        if isinstance(param, (bytes, bytearray, str)):
            return
        elif hasattr(param, 'read'):
            # File like objects are also allowed for blob types.
            return
        else:
            errors.report(
                name,
                'invalid type',
                param=param,
                valid_types=[str(bytes), str(bytearray), 'file-like object'],
            )

To avoid a copy, the memoryview class can be used to wrap a bytearray. A memoryview is compatible with bytearray wherever a bytes-like object is expected; however, passing one fails because of the validation check above.
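To illustrate the zero-copy point, here is a minimal sketch using only the standard library. It shows how a memoryview over a bytearray can be sliced into part-sized chunks without duplicating the underlying data. The helper name `iter_parts` and the sizes are hypothetical, chosen only for illustration; nothing here is boto3 API.

```python
def iter_parts(buffer, part_size):
    """Yield zero-copy slices of `buffer`, each at most `part_size` bytes."""
    view = memoryview(buffer)
    for start in range(0, len(view), part_size):
        # Slicing a memoryview creates a new view, not a copy of the bytes.
        yield view[start:start + part_size]


data = bytearray(b"abcdefghij")
parts = list(iter_parts(data, 4))

# The views share memory with `data`: a change made to the bytearray
# afterwards is visible through a slice taken earlier.
data[0] = ord("z")
first_part_bytes = bytes(parts[0])  # explicit copy, only for inspection
```

Each element of `parts` is the kind of object the issue asks to be accepted as `Body`.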

Expected Behavior

Accept memoryview whenever bytes or bytearray are accepted.

Current Behavior

Because of these explicit type checks, which were added with good intentions, passing a memoryview is rejected. As a result, we cannot avoid copying large buffers.

Reproduction Steps

>>> import boto3
>>> boto3.client("s3").upload_part(Bucket="xxxxx", Key="xxxxx", PartNumber=0, UploadId="", Body=memoryview(bytearray(10)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/botocore/client.py", line 391, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/lib/python3/dist-packages/botocore/client.py", line 691, in _make_api_call
    request_dict = self._convert_to_request_dict(
  File "/usr/lib/python3/dist-packages/botocore/client.py", line 739, in _convert_to_request_dict
    request_dict = self._serializer.serialize_to_request(
  File "/usr/lib/python3/dist-packages/botocore/validate.py", line 360, in serialize_to_request
    raise ParamValidationError(report=report.generate_report())
botocore.exceptions.ParamValidationError: Parameter validation failed:
Invalid type for parameter Body, value: <memory at 0x7f36586613c0>, type: <class 'memoryview'>, valid types: <class 'bytes'>, <class 'bytearray'>, file-like object

Possible Solution

Update checks such as the following throughout the code:

-if isinstance(param, (bytes, bytearray, str)):
+if isinstance(param, (bytes, bytearray, memoryview, str)):
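As a rough sketch of what the relaxed check would accept, the stand-alone function below mirrors the logic of the `_validate_blob` snippet quoted earlier. The name `is_valid_blob` is hypothetical and this is not botocore's actual implementation; it only demonstrates the effect of adding `memoryview` to the `isinstance` tuple.

```python
import io


def is_valid_blob(param):
    """Return True if `param` would be acceptable as a blob value.

    Hypothetical helper modelled on the quoted check, with memoryview added.
    """
    if isinstance(param, (bytes, bytearray, memoryview, str)):
        return True
    # File-like objects are also allowed for blob types.
    return hasattr(param, "read")


accepted = [
    is_valid_blob(b"raw bytes"),
    is_valid_blob(bytearray(10)),
    is_valid_blob(memoryview(bytearray(10))),
    is_valid_blob(io.BytesIO(b"file-like")),
]
rejected = is_valid_blob(12345)
```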

Additional Information/Context

No response

SDK version used

botocore-1.27.75

Environment details (OS name and version, etc.)

python-3

alonbl avatar Sep 19 '22 16:09 alonbl

Thanks @alonbl for the feature request. I brought this up for discussion with the team and it seemed like something they may be receptive to doing. However, more research is required before confirming that decision. We encourage others to 👍 this issue if they are also interested, and if there are any more details you can share regarding your use case, please let us know.

tim-finnigan avatar Sep 21 '22 17:09 tim-finnigan

Thank you @tim-finnigan,

While this discussion is happening, please also consider supporting receiving into caller-provided buffers, like the fh.recv_into() family of functions. Especially in S3, it is important to avoid memory copies for large blobs. Unlike the request in this issue, which is trivial, this one requires an API change.
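For readers unfamiliar with the pattern, here is a small standard-library sketch of receiving into a preallocated buffer. It assumes the comment refers to the `socket.recv_into` style of API; the helper `recv_exactly_into` is hypothetical, and boto3 offers no equivalent as far as this thread indicates.

```python
import socket


def recv_exactly_into(sock, view):
    """Fill `view` from `sock` without creating intermediate bytes objects."""
    received = 0
    while received < len(view):
        # recv_into writes directly into the caller's buffer and
        # returns the number of bytes written.
        n = sock.recv_into(view[received:])
        if n == 0:
            raise ConnectionError("socket closed before buffer was filled")
        received += n
    return received


sender, receiver = socket.socketpair()
payload = b"hello world"
buffer = bytearray(len(payload))  # caller-owned, reusable buffer
try:
    sender.sendall(payload)
    count = recv_exactly_into(receiver, memoryview(buffer))
finally:
    sender.close()
    receiver.close()
```

The caller decides where the data lands, so a large download could be written straight into memory that is already allocated.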

alonbl avatar Sep 21 '22 18:09 alonbl

A common use case is uploading a pandas DataFrame to AWS S3 without needing s3fs (for instance, in a corporate environment with tight approval processes). It has been asked several times on StackOverflow (a few examples 1 2 3).

It's not limited to pandas, though; there are many packages whose upload to S3 could be transformed from

o = s3.Object("bucket", "key")
with BytesIO() as f:
    df.to_csv(f)
    o.put(Body=f.getvalue())

or

o = s3.Object("bucket", "key")
with BytesIO() as f:
    df.to_csv(f)
    f.seek(0)
    o.upload_fileobj(f)

to

o = s3.Object("bucket", "key")
with BytesIO() as f:
    df.to_csv(f)
    o.put(Body=f.getbuffer())

which would save a copy or a seek.
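The difference between the two `BytesIO` accessors can be seen without pandas or S3. The following sketch uses only the standard library; the CSV-like content is made up for illustration.

```python
from io import BytesIO

f = BytesIO()
f.write(b"col_a,col_b\n1,2\n")

copied = f.getvalue()   # bytes: a separate copy of the contents
view = f.getbuffer()    # memoryview: shares memory with the BytesIO

# Writing through the view changes the BytesIO contents in place,
# while the copy taken earlier is unaffected.
view[0:1] = b"C"
after_write = f.getvalue()

# Release the view before closing: a BytesIO cannot be resized or
# closed while an exported buffer is still alive.
view.release()
f.close()
```

Because `getbuffer()` returns a memoryview, it only helps once the blob validation accepts that type.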

ljmc-github avatar Jan 25 '23 18:01 ljmc-github

This would be very useful in https://github.com/piskvorky/smart_open/issues/380, and I opened a PR (#3107) where I made the suggested changes and added tests. It really does seem to be as easy as changing a couple of isinstance checks; all tests passed when I ran them locally.

jakkdl avatar Feb 13 '24 11:02 jakkdl