fsspec fails trying to create a bucket when writing to S3 if the folder/prefix doesn't exist

Open juarezr opened this issue 3 years ago • 9 comments

Problem

By default, when writing to remote filesystems, fsspec sets the flag auto_mkdir=True to create the path hierarchy.

I found that #212 deprecates auto_mkdir=True for LocalFileSystem.

However, this behaviour causes unexpected trouble and should be disabled, IMHO.

For instance, when writing to S3 using fsspec and s3fs, if the folder doesn't exist yet, fsspec tries to create the bucket and fails. Passing auto_mkdir=False makes the write work as expected.
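A minimal sketch of that workaround, assuming fsspec.open forwards auto_mkdir to open_files as the traceback in this report suggests. The helper name write_lines and its opener parameter are hypothetical, added only so the call can be exercised without S3 credentials:

```python
def write_lines(url, lines, opener=None, **storage_options):
    """Write text lines to `url` without asking fsspec to create parent directories.

    `write_lines` and `opener` are hypothetical names used for illustration.
    """
    if opener is None:
        # Imported lazily so the helper can also be driven by a stub opener.
        import fsspec
        opener = fsspec.open

    # auto_mkdir=False skips the fs.makedirs() call on the parent path,
    # which is the call that ends up trying to create the bucket on S3.
    with opener(url, mode="w", auto_mkdir=False, **storage_options) as f:
        for line in lines:
            f.write(line)
```

Calling write_lines('s3://my-bucket/path/to/folder/that/not/exists/yet/test.txt', ['Some text.\n']) should then write the object directly, since S3 does not need the prefix to exist beforehand.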

Note that S3 creates missing folders (key prefixes) automatically when a file is written.
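To illustrate why no mkdir step is needed, here is a toy model of a flat key namespace. It is not s3fs or boto code; bucket, put_object and list_prefixes are made-up names. The point is only that "folders" appear as a side effect of writing a key that contains slashes:

```python
# Toy model: a bucket is a flat mapping from full object key to bytes.
bucket = {}

def put_object(key, body):
    """Store an object; no parent "folder" has to exist first."""
    bucket[key] = body

def list_prefixes(keys):
    """Derive the folder-like prefixes implied by a collection of keys."""
    prefixes = set()
    for key in keys:
        parts = key.split("/")[:-1]  # drop the file name
        for i in range(1, len(parts) + 1):
            prefixes.add("/".join(parts[:i]) + "/")
    return sorted(prefixes)

put_object("path/to/folder/test.txt", b"Some text.\n")
print(list_prefixes(bucket))
```

In this model the three prefixes exist only because the key was written, which is why an explicit makedirs call on the parent is redundant for this kind of store.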

Test Case

The following code reproduces the problem:

import fsspec

s3_path = 's3://my-bucket/path/to/folder/that/not/exists/yet/test.txt'

with fsspec.open(s3_path, mode='w', compression='infer', auto_mkdir=True) as f:
    f.write('Some text.\n')
    f.write('More text.\n')
# data is flushed and the file is closed when the with block exits

This raises an exception like the following:

Traceback (most recent call last):
  File "/venv/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 979, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/venv/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 113, in execute
    return_value = self.execute_callable()
  File "/venv/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 118, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/usr/local/airflow/dags/examples/s3sfs_debug.py", line 146, in _test_fsjuarz_auto_mkdir_true
    with fsspec.open(s3_path, mode='w', compression='infer', auto_mkdir=True) as fs:
  File "/venv/lib/python3.7/site-packages/fsspec/core.py", line 399, in open
    **kwargs
  File "/venv/lib/python3.7/site-packages/fsspec/core.py", line 254, in open_files
    [fs.makedirs(parent, exist_ok=True) for parent in parents]
  File "/venv/lib/python3.7/site-packages/fsspec/core.py", line 254, in <listcomp>
    [fs.makedirs(parent, exist_ok=True) for parent in parents]
  File "/venv/lib/python3.7/site-packages/s3fs/core.py", line 460, in makedirs
    self.mkdir(path, create_parents=True)
  File "/venv/lib/python3.7/site-packages/fsspec/asyn.py", line 100, in wrapper
    return maybe_sync(func, self, *args, **kwargs)
  File "/venv/lib/python3.7/site-packages/fsspec/asyn.py", line 80, in maybe_sync
    return sync(loop, func, *args, **kwargs)
  File "/venv/lib/python3.7/site-packages/fsspec/asyn.py", line 51, in sync
    raise exc.with_traceback(tb)
  File "/venv/lib/python3.7/site-packages/fsspec/asyn.py", line 35, in f
    result[0] = await future
  File "/venv/lib/python3.7/site-packages/s3fs/core.py", line 450, in _mkdir
    raise translate_boto_error(e) from e
FileExistsError: The requested bucket name is not available. The bucket namespace is shared by all users of the system. Please select a different name and try again.

This happens because auto_mkdir defaults to True in the signature of open_files in fsspec/core.py:


def open_files(
    urlpath,
    mode="rb",
    compression=None,
    encoding="utf8",
    errors=None,
    name_function=None,
    num=1,
    protocol=None,
    newline=None,
    auto_mkdir=True,
    **kwargs
):

juarezr avatar Sep 03 '20 22:09 juarezr