FileNotFoundError: Unable to resolve remote path

Open 0x2b3bfa0 opened this issue 10 months ago • 3 comments

Description

Apparently, we can't index an empty S3 bucket; the attempt raises one of the following exceptions, depending on where it runs:

  • FileNotFoundError: Unable to resolve remote path: when running locally.
  • PermissionError: No AWSAccessKey was presented. when running from Studio.[^1]

Query

import datachain

datachain.read_storage("s3://example-empty-bucket/").save("index-example-empty-bucket")

Traceback (Local)

Traceback (most recent call last):                                 
  File "/.../main.py", line 3, in <module>
    datachain.read_storage("s3://example-empty-bucket/").save("index-example-empty-bucket")
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../datachain/src/datachain/lib/dc/datachain.py", line 481, in save
    query=self._query.save(
        name=name,
    ...<4 lines>...
        **kwargs,
    )
  File "/.../datachain/src/datachain/query/dataset.py", line 1707, in save
    query = self.apply_steps()
  File "/.../datachain/src/datachain/query/dataset.py", line 1222, in apply_steps
    self.listing_fn()
    ~~~~~~~~~~~~~~~^^
  File "/.../datachain/src/datachain/lib/dc/storage.py", line 157, in <lambda>
    lambda ds_name=list_ds_name, lst_uri=list_uri: lst_fn(ds_name, lst_uri)
                                                   ~~~~~~^^^^^^^^^^^^^^^^^^
  File "/.../datachain/src/datachain/lib/dc/storage.py", line 153, in lst_fn
    .save(ds_name, listing=True, version=version)
     ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../datachain/src/datachain/lib/dc/datachain.py", line 481, in save
    query=self._query.save(
        name=name,
    ...<4 lines>...
        **kwargs,
    )
  File "/.../datachain/src/datachain/query/dataset.py", line 1707, in save
    query = self.apply_steps()
  File "/.../datachain/src/datachain/query/dataset.py", line 1251, in apply_steps
    result = step.apply(
        result.query_generator, self.temp_table_names
    )  # a chain of steps linked by results
  File "/.../datachain/src/datachain/query/dataset.py", line 614, in apply
    self.populate_udf_table(udf_table, query)
    ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "/.../datachain/src/datachain/query/dataset.py", line 532, in populate_udf_table
    process_udf_outputs(
    ~~~~~~~~~~~~~~~~~~~^
        warehouse,
        ^^^^^^^^^^
    ...<3 lines>...
        cb=generated_cb,
        ^^^^^^^^^^^^^^^^
    )
    ^
  File "/.../datachain/src/datachain/query/dataset.py", line 343, in process_udf_outputs
    for row in udf_output:
               ^^^^^^^^^^
  File "/.../datachain/src/datachain/lib/udf.py", line 477, in _process_row
    for result_obj in result_objs:
                      ^^^^^^^^^^^
  File "/.../datachain/src/datachain/lib/listing.py", line 56, in list_func
    for entries in iter_over_async(client.scandir(path.rstrip("/")), get_loop()):
                   ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../datachain/src/datachain/asyn.py", line 280, in iter_over_async
    done, obj = asyncio.run_coroutine_threadsafe(get_next(), loop).result()
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/.../lib/python3.13/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ~~~~~~~~~~~~~~~~~^^
  File "/.../lib/python3.13/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/.../datachain/src/datachain/asyn.py", line 273, in get_next
    obj = await ait.__anext__()
          ^^^^^^^^^^^^^^^^^^^^^
  File "/.../datachain/src/datachain/client/fsspec.py", line 247, in scandir
    await main_task
  File "/.../datachain/src/datachain/client/s3.py", line 133, in _fetch_default
    await self._fetch_flat(start_prefix, result_queue)
  File "/.../datachain/src/datachain/client/s3.py", line 124, in _fetch_flat
    await consumer
  File "/.../datachain/src/datachain/client/s3.py", line 98, in process_pages
    raise FileNotFoundError(f"Unable to resolve remote path: {prefix}")
FileNotFoundError: Unable to resolve remote path: 
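
Note that the message above ends with an empty prefix, i.e. the listing was for the bucket root. A minimal sketch of how zero results could be handled differently for the root than for a real prefix follows. The function name `process_pages` is borrowed from the traceback, but the signature and body here are hypothetical stand-ins that do not call S3, and this is not necessarily what the actual fix does.

```python
from typing import Iterable, Iterator


def process_pages(pages: Iterable[list[str]], prefix: str) -> Iterator[str]:
    """Yield object keys from listing pages.

    Hypothetical stand-in for the S3 client logic: `pages` is any iterable
    of key lists, and `prefix` is the path inside the bucket ("" = root).
    """
    found = False
    for page in pages:
        for key in page:
            found = True
            yield key
    # An empty prefix means the bucket root: zero keys is a valid, empty listing.
    # A non-empty prefix with zero keys still looks like a missing path.
    if not found and prefix:
        raise FileNotFoundError(f"Unable to resolve remote path: {prefix}")


# Empty bucket root: no error, just an empty listing.
print(list(process_pages([], "")))
```

With this shape, an empty bucket yields an empty index, while a mistyped prefix such as `no/such/dir` would still raise `FileNotFoundError`.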

Traceback (Studio)

Traceback (most recent call last):
  File "/.../site-packages/s3fs/core.py", line 114, in _error_wrapper
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../site-packages/aiobotocore/client.py", line 412, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: No AWSAccessKey was presented.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/.../site-packages/datachain/lib/listing.py", line 144, in _reraise_as_client_error
    yield
  File "/usr/local/lib/python3.12/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/.../site-packages/datachain/fs/utils.py", line 28, in isfile
    return not _isdir(fs, path)
               ^^^^^^^^^^^^^^^^
  File "/.../site-packages/datachain/fs/utils.py", line 10, in _isdir
    info = fs.info(path)
           ^^^^^^^^^^^^^
  File "/.../site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/.../site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
                ^^^^^^^^^^
  File "/.../site-packages/s3fs/core.py", line 1471, in _info
    out = await self._call_s3(
          ^^^^^^^^^^^^^^^^^^^^
  File "/.../site-packages/s3fs/core.py", line 371, in _call_s3
    return await _error_wrapper(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/.../site-packages/s3fs/core.py", line 146, in _error_wrapper
    raise err
PermissionError: No AWSAccessKey was presented.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 3, in <module>
  File "/.../site-packages/datachain/lib/dc/storage.py", line 150, in read_storage
    list_ds_name, list_uri, list_path, list_ds_exists = get_listing(
                                                        ^^^^^^^^^^^^
  File "/.../site-packages/datachain/lib/listing.py", line 173, in get_listing
    if not glob.has_magic(uri) and not uri.endswith("/") and isfile(client.fs, uri):
                                                             ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/contextlib.py", line 80, in inner
    with self._recreate_cm():
         ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/.../site-packages/datachain/lib/listing.py", line 146, in _reraise_as_client_error
    raise ClientError(message=str(e), error_code=getattr(e, "code", None)) from e
datachain.error.ClientError: No AWSAccessKey was presented.
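
For reference, the condition in `get_listing` shown in this traceback only reaches the `isfile` probe when the URI has no glob characters and no trailing slash. A small sketch of that short-circuit follows. The helper name `would_probe_isfile` is hypothetical, and using the standard-library `glob.has_magic` is an assumption about which `glob` module the real code imports.

```python
import glob


def would_probe_isfile(uri: str) -> bool:
    """Return True when the condition from the traceback would call isfile().

    Hypothetical helper: mirrors only the first two clauses of
    `not glob.has_magic(uri) and not uri.endswith("/") and isfile(...)`.
    """
    # glob.has_magic is the stdlib check for the characters *, ? and [
    return not glob.has_magic(uri) and not uri.endswith("/")


for uri in (
    "s3://example-empty-bucket/",  # trailing slash
    "s3://example-empty-bucket",   # no trailing slash
    "s3://example-empty-bucket/*.csv",  # glob pattern
):
    print(uri, would_probe_isfile(uri))
```

Under these assumptions, only the URI without a trailing slash would trigger the `fs.info` call that fails with the permission error.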

Version Info

0.18.4
Python 3.12.10

[^1]: This one is especially obscure, and I guess it's the result of attempting anonymous (?) access after the first attempt fails with an exception?

0x2b3bfa0, May 27 '25 00:05

Reproduced both locally and in Studio. Getting the same result in both:

FileNotFoundError: Unable to resolve remote path

Could it be because I don't use an OpenID-connected team (demo-1)? 🤔

shcheklein, May 27 '25 02:05

https://github.com/iterative/datachain/pull/1121/files - fixes the FileNotFoundError

@0x2b3bfa0 where / how did you run it to get the No AWSAccessKey was presented error?

shcheklein, May 28 '25 00:05

The first part of the fix is merged; I'm now looking into the "Unable to resolve remote path:" part.

shcheklein, May 28 '25 16:05

Closing this as we were not able to reproduce the last piece here after all the redeployments. @0x2b3bfa0 feel free to destroy the cluster if it's still running.

shcheklein, May 30 '25 20:05