
Missing `ContentRange` in the response from AWS

Open. salmanmashayekh opened this issue 3 years ago (3 comments).

Problem description

We run the following code to stream Avro files directly from S3. Starting a few weeks ago, smart_open fails because the response from AWS is missing the ContentRange key (here).

from fastavro import reader
from smart_open import open as smart_open
import boto3

session = boto3.session.Session()
client = session.client('s3')
transport_params = dict(client=client, session=session)

record_meta = dict(
    file_name="s3://bucket-name/filename.avro",
    offset=4500,
)

with smart_open(record_meta["file_name"], "rb", transport_params=transport_params) as file:
    avro_reader = reader(file)
    next(avro_reader)
    file.seek(record_meta['offset'])
    record = next(avro_reader)

But according to the AWS docs, the response to a ranged GetObject request should include a ContentRange field. Is this an issue that needs to be raised with AWS, or should something be fixed on this end?
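One way to separate the two possibilities is to issue a ranged GetObject yourself and look at what comes back. The sketch below assumes `client` is a boto3 S3 client (anything with a compatible `get_object` method works); `probe_content_range` is a hypothetical helper, not part of smart_open. S3 answers a honoured range request with HTTP 206 and a ContentRange such as `bytes 4500-9999/10000`; a 200 with no ContentRange suggests something dropped the Range header along the way.

```python
def probe_content_range(client, bucket, key, start):
    """Request the object from byte `start` onward and report what came back.

    Hypothetical diagnostic: `client` is assumed to be a boto3 S3 client
    (or any object with a compatible get_object method).
    Returns (HTTP status code, ContentRange value or None).
    """
    response = client.get_object(Bucket=bucket, Key=key, Range=f"bytes={start}-")
    body = response.get("Body")
    if body is not None:
        body.close()  # only the metadata matters here, not the payload
    status = response["ResponseMetadata"]["HTTPStatusCode"]
    # Honoured range request: 206 plus e.g. "bytes 4500-9999/10000".
    # Ignored or stripped Range header: typically 200 and no ContentRange.
    return status, response.get("ContentRange")
```

For example, `probe_content_range(boto3.client("s3"), "bucket-name", "filename.avro", 4500)` with the bucket and key from the report; a `(200, None)` result would point at the network path rather than smart_open.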

Versions

Darwin-20.6.0-x86_64-i386-64bit
Python 3.7.9 (v3.7.9:13c94747c7, Aug 15 2020, 01:31:08) 
[Clang 6.0 (clang-600.0.57)]
smart_open 4.2.0

salmanmashayekh, Jan 28 '22

More likely to be an AWS issue, as nothing much has changed on our side recently.

Please show a full stack trace, just in case.

mpenkov, Jan 29 '22

Here is the full stack trace @mpenkov:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/var/folders/r4/6lpv2h9n4_54q6hw1fb5vtvc0000gn/T/ipykernel_2336/2991261431.py in <module>
     22     avro_reader = reader(file)
     23     next(avro_reader)
---> 24     file.seek(record_meta['offset'])
     25     record = next(avro_reader)

~/.virtualenvs/t2p/lib/python3.7/site-packages/smart_open/s3.py in seek(self, offset, whence)
    657             offset += self._current_pos
    658 
--> 659         self._current_pos = self._raw_reader.seek(offset, whence)
    660 
    661         self._buffer.empty()

~/.virtualenvs/t2p/lib/python3.7/site-packages/smart_open/s3.py in seek(self, offset, whence)
    380             self._position = self._content_length
    381         else:
--> 382             self._open_body(start, stop)
    383 
    384         return self._position

~/.virtualenvs/t2p/lib/python3.7/site-packages/smart_open/s3.py in _open_body(self, start, stop)
    426                 response['ResponseMetadata']['RetryAttempts'],
    427             )
--> 428             units, start, stop, length = smart_open.utils.parse_content_range(response['ContentRange'])
    429             self._content_length = length
    430             self._position = start

KeyError: 'ContentRange'
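The last frame shows smart_open indexing response['ContentRange'] directly and handing the value to smart_open.utils.parse_content_range, which is unpacked into units, start, stop and length. The standalone parser below is an illustrative sketch, not smart_open's implementation; `parse_content_range_header` is a hypothetical name. It returns the same four-part shape but raises a descriptive ValueError when the field is absent, which is the situation in this traceback.

```python
import re

# Matches values such as "bytes 4500-9999/10000" (RFC 7233 byte ranges).
# The unsatisfied-range form "bytes */10000" is deliberately not handled.
_CONTENT_RANGE = re.compile(r"^(\w+) (\d+)-(\d+)/(\d+|\*)$")


def parse_content_range_header(value):
    """Hypothetical parser returning (units, start, stop, length).

    length is None when the server reports it as "*" (unknown).
    """
    if value is None:
        raise ValueError("response has no ContentRange; the Range header was probably ignored")
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised ContentRange: {value!r}")
    units, start, stop, length = match.groups()
    return units, int(start), int(stop), None if length == "*" else int(length)
```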

salmanmashayekh, Jan 31 '22

Please send the output of pip freeze or whatever the equivalent is on your system. It'd be good to see what versions of relevant packages are installed.
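If the full pip freeze output is too noisy, a small script can report just the packages involved in this traceback. This is a sketch: the package list is an assumption, and it relies on importlib.metadata (Python 3.8+), falling back to the importlib-metadata backport on Python 3.7, the version in this report.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7: pip install importlib-metadata
    import importlib_metadata as metadata

# Assumed set of packages relevant to this issue.
RELEVANT = ("smart_open", "boto3", "botocore", "fastavro")


def relevant_versions(names=RELEVANT):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in relevant_versions().items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```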

mpenkov, Feb 01 '22

I've also encountered this issue. It turns out that a ProxySG proxy was rewriting the range request into a non-range request, so the response had no ContentRange and smart_open choked on it.
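When a proxy strips the Range header, the server sends the whole object from byte 0. A client-side workaround is to detect that case and discard the leading bytes itself. The helper below is a hypothetical sketch, not smart_open's behaviour or API; it assumes a boto3-style response dict whose "Body" has a read(n) method, and treats the presence of ContentRange as the sign that the range was honoured.

```python
def skip_to_offset(response, start, chunk_size=64 * 1024):
    """Return a body whose next read() yields bytes from offset `start`.

    Hypothetical workaround: if ContentRange is present the server honoured
    the range and the body already begins at `start`; otherwise the full
    object was sent, so read and discard the first `start` bytes.
    """
    body = response["Body"]
    if "ContentRange" in response:
        return body  # ranged (206) response: nothing to skip
    remaining = start
    while remaining > 0:
        chunk = body.read(min(chunk_size, remaining))
        if not chunk:
            raise EOFError(f"object ended before offset {start}")
        remaining -= len(chunk)
    return body
```

The trade-off is bandwidth: every seek behind such a proxy re-downloads and throws away `start` bytes, which is tolerable for small offsets like the 4500 in the report but costly deep into large objects.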

I've also tried cloudpathlib, and it works fine even in that case.

messense, Nov 03 '23