smart_open
Missing `ContentRange` in the response from AWS
Problem description
We run the following code to stream Avro files directly from S3. As of a few weeks ago, smart_open raises a KeyError because the response from AWS is missing the ContentRange key (here).
from fastavro import reader
from smart_open import open as smart_open
import boto3
session = boto3.session.Session()
client = session.client('s3')
transport_params = dict(client=client, session=session)
record_meta = dict(
    file_name="s3://bucket-name/filename.avro",
    offset=4500,
)
with smart_open(record_meta["file_name"], "rb", transport_params=transport_params) as file:
    avro_reader = reader(file)
    next(avro_reader)
    file.seek(record_meta['offset'])
    record = next(avro_reader)
But according to AWS docs, the response body should contain a range. Is this an issue that needs to be raised on the AWS side, or should something be fixed on this end?
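One way to narrow down where the header gets lost might be to issue a ranged GetObject directly and check the parsed response. This is a minimal sketch, assuming a boto3 S3 client is passed in; the helper name `has_content_range` is hypothetical and is not part of smart_open or boto3.

```python
def has_content_range(client, bucket, key, start=0, stop=0):
    """Request bytes start..stop of an object and report whether the
    parsed response carries a 'ContentRange' entry.

    `client` is assumed to be a boto3 S3 client (or anything exposing a
    compatible get_object method).
    """
    response = client.get_object(
        Bucket=bucket,
        Key=key,
        Range=f"bytes={start}-{stop}",  # HTTP Range header value
    )
    # Only the metadata matters here, so release the streaming body.
    response["Body"].close()
    return "ContentRange" in response
```

With boto3 installed, this could be called as `has_content_range(boto3.client("s3"), "bucket-name", "filename.avro")`. A result of `False` would suggest that the range request is not being honoured somewhere between the client and S3.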
Versions
Darwin-20.6.0-x86_64-i386-64bit
Python 3.7.9 (v3.7.9:13c94747c7, Aug 15 2020, 01:31:08)
[Clang 6.0 (clang-600.0.57)]
smart_open 4.2.0
More likely to be an AWS issue, as nothing much has changed on our side recently.
Please show a full stack trace, just in case.
Here is the full stack trace @mpenkov:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/var/folders/r4/6lpv2h9n4_54q6hw1fb5vtvc0000gn/T/ipykernel_2336/2991261431.py in <module>
22 avro_reader = reader(file)
23 next(avro_reader)
---> 24 file.seek(record_meta['offset'])
25 record = next(avro_reader)
~/.virtualenvs/t2p/lib/python3.7/site-packages/smart_open/s3.py in seek(self, offset, whence)
657 offset += self._current_pos
658
--> 659 self._current_pos = self._raw_reader.seek(offset, whence)
660
661 self._buffer.empty()
~/.virtualenvs/t2p/lib/python3.7/site-packages/smart_open/s3.py in seek(self, offset, whence)
380 self._position = self._content_length
381 else:
--> 382 self._open_body(start, stop)
383
384 return self._position
~/.virtualenvs/t2p/lib/python3.7/site-packages/smart_open/s3.py in _open_body(self, start, stop)
426 response['ResponseMetadata']['RetryAttempts'],
427 )
--> 428 units, start, stop, length = smart_open.utils.parse_content_range(response['ContentRange'])
429 self._content_length = length
430 self._position = start
KeyError: 'ContentRange'
Please send the output of pip freeze or whatever the equivalent is on your system. It'd be good to see what versions of relevant packages are installed.
I've also encountered this issue. It turned out that the range request was being modified by ProxySG, which turned it into a non-range request, so smart_open chokes on the response.
I've also tried cloudpathlib, and it works fine even in that case.
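For illustration, here is a rough sketch of how a reader could tolerate a response whose ContentRange is absent, as happens when a proxy turns the range request into a plain one. It is not smart_open's actual code: the function name `resolve_range` and its return shape are hypothetical, and it assumes that a response without ContentRange carries the whole object, so its ContentLength is the full size and the body starts at byte 0.

```python
import re

# Matches values such as "bytes 4500-9999/10000" or "bytes 0-0/*".
_CONTENT_RANGE = re.compile(r"^(\w+) (\d+)-(\d+)/(\d+|\*)$")


def resolve_range(response, requested_start):
    """Return (body_start, total_length, bytes_to_skip) for a GetObject-style
    response dict.

    body_start    -- offset in the object of the first byte in the body
    total_length  -- full object size, or None if reported as "*"
    bytes_to_skip -- how many body bytes to discard to reach requested_start
    """
    content_range = response.get("ContentRange")
    if content_range is not None:
        match = _CONTENT_RANGE.match(content_range)
        if match is None:
            raise ValueError(f"unexpected ContentRange: {content_range!r}")
        _units, start, _stop, length = match.groups()
        total = None if length == "*" else int(length)
        return int(start), total, 0

    # Assumption: no ContentRange means the Range header was ignored or
    # stripped, so the body is the whole object starting at byte 0.
    total = response["ContentLength"]
    return 0, total, min(requested_start, total)
```

In the fallback case the caller would still have to read and discard `bytes_to_skip` bytes from the body, which costs bandwidth but avoids the KeyError.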