smart_open

Reading a 0-byte file from S3 raises KeyError

Open Domcikas opened this issue 3 years ago • 2 comments

Problem description

The problem is somewhat similar to the one described in https://github.com/RaRe-Technologies/smart_open/issues/548, though the error is different.

Be sure your description clearly answers the following questions:

  • What are you trying to achieve? I'm trying to read a file in S3 that might be empty.
  • What is the expected result? The file is read without exceptions.
  • What are you seeing instead? A KeyError exception is raised.

Steps/code to reproduce the problem

  • Have an empty file in S3
  • Run the following code:
from smart_open import open

# Placeholder URI: bucket and key match the ones in the traceback below
with open('s3://mybucket/existing_file', 'rb') as file:
    file.read()

Traceback:

Traceback (most recent call last):
  File "/home/user/.local/lib/python3.8/site-packages/smart_open/s3.py", line 330, in _get
    return client.get_object(Bucket=bucket, Key=key, Range=range_string)
  File "/home/user/.local/lib/python3.8/site-packages/botocore/client.py", line 386, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/user/.local/lib/python3.8/site-packages/botocore/client.py", line 705, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidRange) when calling the GetObject operation: The requested range is not satisfiable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/.local/lib/python3.8/site-packages/smart_open/s3.py", line 438, in _open_body
    response = _get(
  File "/home/user/.local/lib/python3.8/site-packages/smart_open/s3.py", line 338, in _get
    raise wrapped_error from error
OSError: unable to access bucket: 'mybucket' key: 'existing_file' version: None error: An error occurred (InvalidRange) when calling the GetObject operation: The requested range is not satisfiable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.local/lib/python3.8/site-packages/smart_open/smart_open_lib.py", line 235, in open
    binary = _open_binary_stream(uri, binary_mode, transport_params)
  File "/home/user/.local/lib/python3.8/site-packages/smart_open/smart_open_lib.py", line 398, in _open_binary_stream
    fobj = submodule.open_uri(uri, mode, transport_params)
  File "/home/user/.local/lib/python3.8/site-packages/smart_open/s3.py", line 224, in open_uri
    return open(parsed_uri['bucket_id'], parsed_uri['key_id'], mode, **kwargs)
  File "/home/user/.local/lib/python3.8/site-packages/smart_open/s3.py", line 291, in open
    fileobj = Reader(
  File "/home/user/.local/lib/python3.8/site-packages/smart_open/s3.py", line 574, in __init__
    self.seek(0)
  File "/home/user/.local/lib/python3.8/site-packages/smart_open/s3.py", line 666, in seek
    self._current_pos = self._raw_reader.seek(offset, whence)
  File "/home/user/.local/lib/python3.8/site-packages/smart_open/s3.py", line 417, in seek
    self._open_body(start, stop)
  File "/home/user/.local/lib/python3.8/site-packages/smart_open/s3.py", line 450, in _open_body
    self._position = self._content_length = int(error_response['ActualObjectSize'])
KeyError: 'ActualObjectSize'
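The first exception in the traceback comes from the ranged GetObject request. As a rough illustration, assuming the reader's initial seek(0) requests an open-ended range such as `bytes=0-` (the helpers `range_string` and `is_satisfiable` below are hypothetical, not smart_open's API), this sketch shows why S3 answers with InvalidRange for a 0-byte object: under HTTP range semantics (RFC 7233, section 2.1), a byte range is satisfiable only if its first byte position is smaller than the object's length, and an empty object has no byte 0.

```python
def range_string(start, stop=None):
    """Build an HTTP Range header value (hypothetical helper)."""
    # An open-ended range asks for everything from `start` onwards.
    if stop is None:
        return "bytes=%d-" % start
    return "bytes=%d-%d" % (start, stop)


def is_satisfiable(range_header, content_length):
    """Return True if a single 'bytes=first-[last]' range can be served.

    Per RFC 7233, section 2.1, a byte range is satisfiable only when its
    first byte position is less than the representation's length.
    Suffix ranges ('bytes=-N') are not handled in this sketch.
    """
    unit, _, spec = range_header.partition("=")
    if unit != "bytes":
        raise ValueError("only byte ranges are supported")
    first, _, _ = spec.partition("-")
    return int(first) < content_length


header = range_string(0)
print(header, is_satisfiable(header, 10))  # bytes=0- True
print(header, is_satisfiable(header, 0))   # bytes=0- False (S3 replies 416 / InvalidRange)
```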

Versions

Please provide the output of:

smart_open 5.2.1

Checklist

Before you create the issue, please make sure you have:

  • [X] Described the problem clearly
  • [X] Provided a minimal reproducible example, including any required data
  • [X] Provided the version numbers of the relevant software

Domcikas • Nov 09 '21

I am seeing the same issue when attempting to open a 0-byte file.

This PR was supposedly merged to fix this issue, but it actually introduces the KeyError reported above.

The expected key 'ActualObjectSize' cannot be found in the error response of botocore.exceptions.ClientError, which is the wrapped error raised by the boto3 client's get_object call.
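To illustrate the failure mode, the sketch below assumes the code in `_open_body` reads from the `Error` dictionary of botocore's parsed response (i.e. `ClientError(...).response['Error']`). The dictionaries are hand-written stand-ins, not captured S3 responses, and `actual_object_size` is a hypothetical helper: if the server omits `ActualObjectSize` from the InvalidRange error, indexing with `[...]` raises KeyError, whereas `dict.get` lets the caller detect the gap and fall back.

```python
def actual_object_size(error):
    """Return the object size reported in an InvalidRange error, or None.

    `error` is assumed to be the 'Error' dict of a botocore parsed
    response, i.e. ClientError(...).response['Error'].
    """
    if error.get("Code") != "InvalidRange":
        return None
    size = error.get("ActualObjectSize")  # may be absent, hence .get()
    return int(size) if size is not None else None


# Hand-written stand-ins, not captured S3 responses.
with_size = {"Code": "InvalidRange", "ActualObjectSize": "0"}
without_size = {"Code": "InvalidRange", "Message": "The requested range is not satisfiable"}

print(actual_object_size(with_size))     # 0
print(actual_object_size(without_size))  # None (indexing would raise KeyError)
```

The later report of "An error occurred (416)" suggests that in that case the error code itself was `416`, which a check against `InvalidRange` alone would not match.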

I propose that, instead of trying to get 'ActualObjectSize' from the wrapped error object, we get the content length by making a get_object call without the range_string when there is an InvalidRange error:

self._position = self._content_length = self._client.get_object(Bucket=self._bucket, Key=self._key)["ContentLength"]
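A minimal sketch of this proposal outside smart_open, assuming a hypothetical `FakeS3Client` that mimics only the parts of boto3's `get_object` used here (a `Range` keyword, a `ContentLength` field, and an error on an unsatisfiable range); `InvalidRange` stands in for the botocore `ClientError` and `open_body` is a hypothetical simplification of smart_open's `_open_body`. It makes the cost visible: every InvalidRange triggers a second, unranged request.

```python
import io


class InvalidRange(Exception):
    """Stand-in for botocore's ClientError with Code 'InvalidRange'."""


class FakeS3Client:
    """Hypothetical in-memory stand-in for a boto3 S3 client."""

    def __init__(self, objects):
        self.objects = objects  # {(bucket, key): bytes}
        self.calls = 0

    def get_object(self, Bucket, Key, Range=None):
        self.calls += 1
        data = self.objects[(Bucket, Key)]
        if Range is not None:
            # Only open-ended ranges of the form 'bytes=N-' are handled here.
            start = int(Range[len("bytes="):].split("-")[0])
            if start >= len(data):
                raise InvalidRange(Range)
            data = data[start:]
        return {"ContentLength": len(data), "Body": io.BytesIO(data)}


def open_body(client, bucket, key, start=0):
    """Return (content_length, body); on InvalidRange, re-fetch without a range."""
    try:
        resp = client.get_object(Bucket=bucket, Key=key, Range="bytes=%d-" % start)
        # For an open-ended range, total size = start + bytes returned.
        return start + resp["ContentLength"], resp["Body"]
    except InvalidRange:
        # Proposed fallback: an unranged request reports the full size.
        length = client.get_object(Bucket=bucket, Key=key)["ContentLength"]
        return length, io.BytesIO()


client = FakeS3Client({("mybucket", "existing_file"): b""})
length, body = open_body(client, "mybucket", "existing_file")
print(length, body.read(), client.calls)  # 0 b'' 2
```

With a real boto3 client, the unranged response's `Body` is a stream that should be closed; `head_object`, which also returns `ContentLength` without a body, is an alternative not discussed in this thread.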

sungwy • Jan 25 '22

Do we need to make an additional call? If so, I'd rather avoid it unless it's absolutely necessary.

Are you interested in making a PR?

mpenkov • Jan 26 '22

I am still getting "ClientError: An error occurred (416) when calling the GetObject operation: Requested Range Not Satisfiable" with the latest version, 6.2.0, for 0-byte files, even though this is supposed to have been fixed in #548.

gmichaeljaison • Oct 03 '22

I created a PR that calls get_object only when accessing ActualObjectSize raises a KeyError.

This should avoid unnecessary HTTP calls.
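A minimal sketch of that approach, assuming the InvalidRange error is available as the plain `Error` dictionary from botocore's parsed response. `RecordingClient` and `content_length_after_invalid_range` are hypothetical names, not code from the PR; the client only counts `get_object` calls, so the sketch shows that the extra request happens only when `ActualObjectSize` is missing.

```python
class RecordingClient:
    """Hypothetical stand-in for a boto3 S3 client; counts get_object calls."""

    def __init__(self, sizes):
        self.sizes = sizes  # {(bucket, key): int}
        self.calls = 0

    def get_object(self, Bucket, Key):
        self.calls += 1
        return {"ContentLength": self.sizes[(Bucket, Key)]}


def content_length_after_invalid_range(error, client, bucket, key):
    """Size of the object behind an InvalidRange error.

    Prefer ActualObjectSize from the error; only if it is missing,
    fall back to an extra, unranged get_object call.
    """
    try:
        return int(error["ActualObjectSize"])
    except KeyError:
        return client.get_object(Bucket=bucket, Key=key)["ContentLength"]


client = RecordingClient({("mybucket", "existing_file"): 0})

size = content_length_after_invalid_range(
    {"Code": "InvalidRange", "ActualObjectSize": "0"}, client, "mybucket", "existing_file")
print(size, client.calls)  # 0 0  (no extra request)

size = content_length_after_invalid_range(
    {"Code": "InvalidRange"}, client, "mybucket", "existing_file")
print(size, client.calls)  # 0 1  (one fallback request)
```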

Darkheir • May 31 '23