smart_open

Allow passing additional low-level API keyword arguments to AWS S3 boto3 put_object

Open MacHu-GWU opened this issue 2 years ago • 1 comments

Problem description

Be sure your description clearly answers the following questions:

  • What are you trying to achieve?

    Currently, the open function calls the low-level boto3 S3 client put_object API and passes only the Bucket, Key, and Body parameters. I am trying to pass additional keyword arguments to the put_object method.

  • What is the expected result?

    I expect to be able to pass an argument like this:

    with smart_open.open("s3://bucket/file.txt", "w", low_level_kwargs=dict(Metadata=dict(owner="[email protected]"))) as f:
        f.write("hello world")
    

    You can find the full list of additional arguments at https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.put_object

  • What is your suggestion?

    I think you could add an optional low_level_kwargs dict parameter to your high-level open API, at least for compatible backends such as S3 and the local file system.
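To make the request concrete, here is a minimal sketch of what forwarding extra keyword arguments to put_object could look like. The helper `put_text`, its `low_level_kwargs` parameter, and the `RecordingClient` stub are hypothetical names used only for illustration; none of this is smart_open's actual code, and the stub stands in for a real boto3 S3 client so the example runs without AWS credentials.

```python
def put_text(client, bucket, key, text, low_level_kwargs=None):
    """Upload text via put_object, forwarding any extra keyword arguments."""
    extra = dict(low_level_kwargs or {})
    # Refuse to silently override the arguments the helper sets itself.
    reserved = {"Bucket", "Key", "Body"} & extra.keys()
    if reserved:
        raise ValueError("cannot override reserved arguments: %s" % sorted(reserved))
    return client.put_object(
        Bucket=bucket,
        Key=key,
        Body=text.encode("utf-8"),
        **extra,
    )


class RecordingClient:
    """Hypothetical stand-in for a boto3 S3 client; it only records calls."""

    def __init__(self):
        self.calls = []

    def put_object(self, **kwargs):
        self.calls.append(kwargs)
        return {}


client = RecordingClient()
put_text(
    client,
    "bucket",
    "file.txt",
    "hello world",
    low_level_kwargs={"Metadata": {"owner": "someone"}},
)
```

With a real boto3 client in place of the stub, the Metadata dict would be sent along with the upload request.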

Steps/code to reproduce the problem

See the "What is the expected result?" section above.

Versions

I don't think it matters in this use case, but here they are:

Please provide the output of:

import platform, sys, smart_open
print(platform.platform())  # macOS
print("Python", sys.version)  # 3.8.11
print("smart_open", smart_open.__version__)  # 5.2.4

MacHu-GWU, Jul 12 '22 19:07

A top-level keyword argument is overkill (we want to keep the signature of the open function simple).

Instead, we could pass them in the client_kwargs dict, as is done here: https://github.com/RaRe-Technologies/smart_open/blob/develop/howto.md#how-to-specify-the-request-payer-s3-only

Let me know if you're interested in making a PR.
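A rough sketch of that approach, assuming the `S3.Client.<method>` key convention from the linked how-to also applies to put_object and create_multipart_upload. The helpers `build_transport_params` and `write_text` are hypothetical, and `write_text` is not called here because it needs smart_open and AWS credentials.

```python
def build_transport_params(metadata, multipart=False):
    """Build transport_params that attach Metadata to the upload call."""
    # Single-part uploads go through put_object; multipart uploads start
    # with create_multipart_upload, which also accepts Metadata.
    method = "create_multipart_upload" if multipart else "put_object"
    return {
        "multipart_upload": multipart,
        "client_kwargs": {
            "S3.Client.%s" % method: {"Metadata": dict(metadata)},
        },
    }


def write_text(url, text, metadata):
    """Write text to an S3 URL, passing the metadata via client_kwargs."""
    import smart_open  # imported lazily so the helper above works without it

    params = build_transport_params(metadata)
    with smart_open.open(url, "w", transport_params=params) as f:
        f.write(text)


params = build_transport_params({"owner": "someone"})
```

This keeps the open signature unchanged: everything backend-specific travels inside transport_params.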

mpenkov, Jul 29 '22 06:07

I believe the requested functionality already exists: you can specify the low-level arguments as follows: params = {'client_kwargs': {'S3.Client.put_object': {'owner': "[email protected]"}}}

The get_attr function within s3.py will pass in these low-level arguments when calling specific S3 client methods such as put_object.
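The mechanism described here can be sketched as a thin wrapper that looks up per-method keyword arguments on attribute access and binds them with functools.partial. `ClientWrapper` and `StubClient` are hypothetical names for illustration; this is not copied from s3.py and may differ from what smart_open actually does.

```python
import functools


class ClientWrapper:
    """Bind extra keyword arguments to client methods on attribute access."""

    def __init__(self, client, kwargs):
        self._client = client
        self._kwargs = kwargs

    def __getattr__(self, name):
        # Only reached for attributes not found on the wrapper itself,
        # i.e. the methods of the wrapped client.
        method = getattr(self._client, name)
        extra = self._kwargs.get("S3.Client.%s" % name, {})
        return functools.partial(method, **extra)


class StubClient:
    """Hypothetical stand-in for a boto3 S3 client; echoes what it receives."""

    def put_object(self, **kwargs):
        return kwargs


wrapped = ClientWrapper(
    StubClient(),
    {"S3.Client.put_object": {"Metadata": {"owner": "someone"}}},
)
result = wrapped.put_object(Bucket="bucket", Key="file.txt", Body=b"hello world")
```

With a wrapper like this, a call site that passes only Bucket, Key, and Body still ends up sending the extra arguments, provided the call goes through the wrapper rather than the raw client.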

I was wondering if this issue should be closed?

RachitSharma2001, Nov 01 '22 18:11

Yes, I think so.

mpenkov, Nov 03 '22 13:11

@RachitSharma2001 @mpenkov Actually, it is not implemented yet.

Please take a look at the source code: https://github.com/RaRe-Technologies/smart_open/blob/develop/smart_open/s3.py#L1003

    def close(self):
        if self._buf is None:
            return

        self._buf.seek(0)

        try:
            self._client.put_object(
                Bucket=self._bucket,
                Key=self._key,
                Body=self._buf,
            )
        except botocore.client.ClientError as e:
            raise ValueError(
                'the bucket %r does not exist, or is forbidden for access' % self._bucket) from e

        logger.debug("%s: direct upload finished", self)
        self._buf = None

params = {'client_kwargs': {'S3.Client.put_object': {'owner': "[email protected]"}}}

The client_kwargs dict is not used at all in this call.

MacHu-GWU, Nov 13 '22 18:11