FileNotFoundError: Unable to resolve remote path
Description
Apparently, we can't index an empty S3 bucket, and it produces the following exceptions:
- `FileNotFoundError: Unable to resolve remote path:` when running locally.
- `PermissionError: No AWSAccessKey was presented.` when running from Studio.[^1]
Query
import datachain
datachain.read_storage("s3://example-empty-bucket/").save("index-example-empty-bucket")
Traceback (Local)
Traceback (most recent call last):
File "/.../main.py", line 3, in <module>
datachain.read_storage("s3://example-empty-bucket/").save("index-example-empty-bucket")
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/.../datachain/src/datachain/lib/dc/datachain.py", line 481, in save
query=self._query.save(
name=name,
...<4 lines>...
**kwargs,
)
File "/.../datachain/src/datachain/query/dataset.py", line 1707, in save
query = self.apply_steps()
File "/.../datachain/src/datachain/query/dataset.py", line 1222, in apply_steps
self.listing_fn()
~~~~~~~~~~~~~~~^^
File "/.../datachain/src/datachain/lib/dc/storage.py", line 157, in <lambda>
lambda ds_name=list_ds_name, lst_uri=list_uri: lst_fn(ds_name, lst_uri)
~~~~~~^^^^^^^^^^^^^^^^^^
File "/.../datachain/src/datachain/lib/dc/storage.py", line 153, in lst_fn
.save(ds_name, listing=True, version=version)
~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/.../datachain/src/datachain/lib/dc/datachain.py", line 481, in save
query=self._query.save(
name=name,
...<4 lines>...
**kwargs,
)
File "/.../datachain/src/datachain/query/dataset.py", line 1707, in save
query = self.apply_steps()
File "/.../datachain/src/datachain/query/dataset.py", line 1251, in apply_steps
result = step.apply(
result.query_generator, self.temp_table_names
) # a chain of steps linked by results
File "/.../datachain/src/datachain/query/dataset.py", line 614, in apply
self.populate_udf_table(udf_table, query)
~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
File "/.../datachain/src/datachain/query/dataset.py", line 532, in populate_udf_table
process_udf_outputs(
~~~~~~~~~~~~~~~~~~~^
warehouse,
^^^^^^^^^^
...<3 lines>...
cb=generated_cb,
^^^^^^^^^^^^^^^^
)
^
File "/.../datachain/src/datachain/query/dataset.py", line 343, in process_udf_outputs
for row in udf_output:
^^^^^^^^^^
File "/.../datachain/src/datachain/lib/udf.py", line 477, in _process_row
for result_obj in result_objs:
^^^^^^^^^^^
File "/.../datachain/src/datachain/lib/listing.py", line 56, in list_func
for entries in iter_over_async(client.scandir(path.rstrip("/")), get_loop()):
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/.../datachain/src/datachain/asyn.py", line 280, in iter_over_async
done, obj = asyncio.run_coroutine_threadsafe(get_next(), loop).result()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
File "/.../lib/python3.13/concurrent/futures/_base.py", line 456, in result
return self.__get_result()
~~~~~~~~~~~~~~~~~^^
File "/.../lib/python3.13/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/.../datachain/src/datachain/asyn.py", line 273, in get_next
obj = await ait.__anext__()
^^^^^^^^^^^^^^^^^^^^^
File "/.../datachain/src/datachain/client/fsspec.py", line 247, in scandir
await main_task
File "/.../datachain/src/datachain/client/s3.py", line 133, in _fetch_default
await self._fetch_flat(start_prefix, result_queue)
File "/.../datachain/src/datachain/client/s3.py", line 124, in _fetch_flat
await consumer
File "/.../datachain/src/datachain/client/s3.py", line 98, in process_pages
raise FileNotFoundError(f"Unable to resolve remote path: {prefix}")
FileNotFoundError: Unable to resolve remote path:
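The last frame shows `process_pages` raising with an empty `prefix`, which is why the message stops right after the colon. As a rough illustration of how a page consumer of this shape could mistake an empty bucket for a missing path, here is a minimal sketch. The names `fake_pages`, `list_keys`, and the `allow_empty_root` flag are hypothetical and not taken from datachain, and the flag is only one possible way to treat an empty bucket root as a valid listing, not necessarily what the actual fix does.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_pages(pages: list[list[str]]) -> AsyncIterator[list[str]]:
    """Hypothetical stand-in for a paginated ListObjectsV2 response."""
    for page in pages:
        yield page


async def process_pages(
    pages: AsyncIterator[list[str]], prefix: str, allow_empty_root: bool = False
) -> list[str]:
    """Collect keys from every page and raise if nothing was found."""
    found: list[str] = []
    async for page in pages:
        found.extend(page)
    if not found:
        # An empty prefix means the bucket root was listed. With
        # allow_empty_root=True an empty bucket yields an empty listing
        # instead of an error (assumed behavior, for illustration only).
        if prefix == "" and allow_empty_root:
            return found
        raise FileNotFoundError(f"Unable to resolve remote path: {prefix}")
    return found


def list_keys(
    pages: list[list[str]], prefix: str, allow_empty_root: bool = False
) -> list[str]:
    """Synchronous helper that drives the async consumer."""
    return asyncio.run(process_pages(fake_pages(pages), prefix, allow_empty_root))
```

With a single empty page and an empty prefix, `list_keys([[]], "")` raises the same truncated-looking message as in the traceback above, while `list_keys([[]], "", allow_empty_root=True)` returns an empty list.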
Traceback (Studio)
Traceback (most recent call last):
File "/.../site-packages/s3fs/core.py", line 114, in _error_wrapper
return await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/.../site-packages/aiobotocore/client.py", line 412, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: No AWSAccessKey was presented.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/.../site-packages/datachain/lib/listing.py", line 144, in _reraise_as_client_error
yield
File "/usr/local/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/.../site-packages/datachain/fs/utils.py", line 28, in isfile
return not _isdir(fs, path)
^^^^^^^^^^^^^^^^
File "/.../site-packages/datachain/fs/utils.py", line 10, in _isdir
info = fs.info(path)
^^^^^^^^^^^^^
File "/.../site-packages/fsspec/asyn.py", line 118, in wrapper
return sync(self.loop, func, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/.../site-packages/fsspec/asyn.py", line 103, in sync
raise return_result
File "/.../site-packages/fsspec/asyn.py", line 56, in _runner
result[0] = await coro
^^^^^^^^^^
File "/.../site-packages/s3fs/core.py", line 1471, in _info
out = await self._call_s3(
^^^^^^^^^^^^^^^^^^^^
File "/.../site-packages/s3fs/core.py", line 371, in _call_s3
return await _error_wrapper(
^^^^^^^^^^^^^^^^^^^^^
File "/.../site-packages/s3fs/core.py", line 146, in _error_wrapper
raise err
PermissionError: No AWSAccessKey was presented.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 3, in <module>
File "/.../site-packages/datachain/lib/dc/storage.py", line 150, in read_storage
list_ds_name, list_uri, list_path, list_ds_exists = get_listing(
^^^^^^^^^^^^
File "/.../site-packages/datachain/lib/listing.py", line 173, in get_listing
if not glob.has_magic(uri) and not uri.endswith("/") and isfile(client.fs, uri):
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/contextlib.py", line 80, in inner
with self._recreate_cm():
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
self.gen.throw(value)
File "/.../site-packages/datachain/lib/listing.py", line 146, in _reraise_as_client_error
raise ClientError(message=str(e), error_code=getattr(e, "code", None)) from e
datachain.error.ClientError: No AWSAccessKey was presented.
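The Studio traceback is a three-level chain: a low-level `ClientError` from botocore, re-raised as `PermissionError`, re-raised again as `datachain.error.ClientError` by a context manager used as a decorator. The sketch below mimics that pattern to show how the original cause stays reachable through `__cause__`. The `ClientError` class here is a hypothetical stand-in, the `except PermissionError` clause is an assumption about what the real helper catches, and `root_cause` is an illustrative helper that does not exist in datachain.

```python
from contextlib import contextmanager


class ClientError(Exception):
    """Hypothetical stand-in for datachain.error.ClientError."""

    def __init__(self, message: str, error_code=None):
        super().__init__(message)
        self.error_code = error_code


@contextmanager
def reraise_as_client_error():
    """Convert a low-level error into ClientError, keeping the cause chained."""
    try:
        yield
    except PermissionError as e:  # assumed; the real helper may catch more
        raise ClientError(message=str(e), error_code=getattr(e, "code", None)) from e


def root_cause(exc: BaseException) -> BaseException:
    """Follow __cause__ links back to the first exception in the chain."""
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc


@reraise_as_client_error()
def isfile(path: str) -> bool:
    # Simulate the storage call failing the way the Studio traceback shows:
    # a low-level error wrapped in PermissionError.
    try:
        raise RuntimeError("AccessDenied when calling ListObjectsV2")
    except RuntimeError as low_level:
        raise PermissionError("No AWSAccessKey was presented.") from low_level
```

Calling `isfile(...)` here raises `ClientError` with the same message as the `PermissionError`, and `root_cause(err)` walks back to the simulated low-level error, which is the frame to look at when the top-level message is as terse as the one above.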
Version Info
0.18.4
Python 3.12.10
[^1]: This one is especially obscure, and I guess it's the result of attempting anonymous (?) access after the first attempt fails with an exception?
Reproduced locally and in Studio. Getting the same result:
FileNotFoundError: Unable to resolve remote path
Could it be because I don't use an OpenID-connected team (demo-1)? 🤔
https://github.com/iterative/datachain/pull/1121/files - fixes the FileNotFoundError
@0x2b3bfa0 where / how did you run it to get the No AWSAccessKey was presented error?
The first part of the fix is merged; I'm looking into the `Unable to resolve remote path:` part.
Closing this as we were not able to reproduce the last piece here after all the redeployments. @0x2b3bfa0 feel free to destroy the cluster if it's still running.