boto3
boto3 copied to clipboard
support memoryview in all places bytes, bytearray are supported
Describe the bug
Currently there are explicit checks for explicit types in boto3, for example:
botocore/validate.py:
def _validate_blob(self, param, shape, errors, name):
if isinstance(param, (bytes, bytearray, str)):
return
elif hasattr(param, 'read'):
# File like objects are also allowed for blob types.
return
else:
errors.report(
name,
'invalid type',
param=param,
valid_types=[str(bytes), str(bytearray), 'file-like object'],
)
In order to avoid copy memoryview class can be used to wrap bytearray, passing memoryview is compatible with bytearray, however, due to the validation check it fails.
Expected Behavior
Accept memoryview whenever bytes or bytearray are accepted.
Current Behavior
Due to manual checks which were added in good intentions passing memoryview is rejected, as result we cannot avoid copy of large buffers.
Reproduction Steps
>>> boto3.client("s3").upload_part(Bucket="xxxxx", Key="xxxxx", PartNumber=0, UploadId="", Body=memoryview(bytearray(10)))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3/dist-packages/botocore/client.py", line 391, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/lib/python3/dist-packages/botocore/client.py", line 691, in _make_api_call
request_dict = self._convert_to_request_dict(
File "/usr/lib/python3/dist-packages/botocore/client.py", line 739, in _convert_to_request_dict
request_dict = self._serializer.serialize_to_request(
File "/usr/lib/python3/dist-packages/botocore/validate.py", line 360, in serialize_to_request
raise ParamValidationError(report=report.generate_report())
botocore.exceptions.ParamValidationError: Parameter validation failed:
Invalid type for parameter Body, value: <memory at 0x7f36586613c0>, type: <class 'memoryview'>, valid types: <class 'bytes'>, <class 'bytearray'>, file-like object
Possible Solution
Modify the following (for example) all over the code:
-if isinstance(param, (bytes, bytearray, str)):
-if isinstance(param, (bytes, bytearray, memoryview, str)):
Additional Information/Context
No response
SDK version used
botocore-1.27.75
Environment details (OS name and version, etc.)
python-3
Thanks @alonbl for the feature request. I brought this up for discussion with the team and it seemed like something they may be receptive to doing. However more research is required before confirming that decision. We encourage others to 👍 this issue if they are also interested and if there are any more details you can share regarding your use case please let us know.
Thank you @tim-finnigan,
While this discussion happening, please also consider receiving into buffers like the new method of fh.recv_into() family of functions. Especially in s3 it is important to avoid memory copy for large blobs. But unlike the subject request which trivial this one requires an API change.
A common use case is uploading a pandas df to AWS s3 without needing s3fs (for instance in corporate env with tight approvals), it's been asked several times on StackOverflow (a few examples 1 2 3).
It's not limited to pandas though, there are many packages whose upload to s3 could be transformed from
o = s3.Object("bucket", "key")
with BytesIO() as f:
df.to_csv(f)
o.put(f.getvalue())
or
o = s3.Object("bucket", "key")
with BytesIO() as f:
df.to_csv(f)
f.seek(0)
o.upload_fileobj(f)
to
o = s3.Object("bucket", "key")
with BytesIO() as f:
df.to_csv(f)
o.put(f.getbuffer())
which would save a copy or a seek.
This would be very useful in https://github.com/piskvorky/smart_open/issues/380, and I opened a PR where I made the suggested changes + added tests #3107. It really does seem to be as easy as changing a couple isinstance checks - all tests passed when I ran them locally.