filesystem_spec
fsspec fails trying to create a bucket when writing to S3 when the folder/prefix doesn't exist
Problem
By default, when writing to remote filesystems, fsspec sets the flag auto_mkdir=True to create the path hierarchy.
I found that #212 deprecates auto_mkdir=True for LocalFileSystem.
However, this behaviour causes unexpected trouble and should be disabled, IMHO.
For instance, when writing to S3 using fsspec and s3fs, if the folder doesn't exist yet, it tries to create the bucket and fails. Passing auto_mkdir=False makes it work as expected.
Notice that S3 automatically creates folders when writing files if they are missing.
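Until the default changes, the workaround above can be wrapped in a small helper. This is only a sketch: open_for_write and its opener parameter are hypothetical names introduced here, not part of fsspec. The only fsspec call it relies on is fsspec.open with the auto_mkdir keyword.

```python
def open_for_write(path, mode="w", opener=None, **kwargs):
    """Open `path` for writing without asking fsspec to create parent dirs.

    `open_for_write` and `opener` are hypothetical names for this sketch.
    `opener` exists only so the helper can be exercised without S3 access;
    when omitted, `fsspec.open` is used.
    """
    if opener is None:
        import fsspec  # imported lazily so the sketch loads without fsspec

        opener = fsspec.open
    # Respect an explicit caller choice, otherwise turn auto_mkdir off.
    kwargs.setdefault("auto_mkdir", False)
    return opener(path, mode=mode, **kwargs)


# Usage (needs s3fs and valid credentials; the bucket name is a placeholder):
# with open_for_write("s3://my-bucket/path/to/new/prefix/test.txt") as f:
#     f.write("Some text.\n")
```

Using setdefault means a caller can still pass auto_mkdir=True explicitly, for example on a local filesystem where creating the directories is wanted.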
Test Case
The following code reproduces the problem:
import fsspec

s3_path = 's3://my-bucket/path/to/folder/that/not/exists/yet/test.txt'

# Text mode ('w') because str values are written; auto_mkdir defaults to True.
with fsspec.open(s3_path, mode='w', compression='infer') as f:
    f.write('Some text.\n')
    f.write('More text.\n')
# On leaving the with block, data is flushed and the file is closed.
This raises an exception like the following:
Traceback (most recent call last):
  File "/venv/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 979, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/venv/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 113, in execute
    return_value = self.execute_callable()
  File "/venv/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 118, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/usr/local/airflow/dags/examples/s3sfs_debug.py", line 146, in _test_fsjuarz_auto_mkdir_true
    with fsspec.open(s3_path, mode='w', compression='infer', auto_mkdir=True) as fs:
  File "/venv/lib/python3.7/site-packages/fsspec/core.py", line 399, in open
    **kwargs
  File "/venv/lib/python3.7/site-packages/fsspec/core.py", line 254, in open_files
    [fs.makedirs(parent, exist_ok=True) for parent in parents]
  File "/venv/lib/python3.7/site-packages/fsspec/core.py", line 254, in <listcomp>
    [fs.makedirs(parent, exist_ok=True) for parent in parents]
  File "/venv/lib/python3.7/site-packages/s3fs/core.py", line 460, in makedirs
    self.mkdir(path, create_parents=True)
  File "/venv/lib/python3.7/site-packages/fsspec/asyn.py", line 100, in wrapper
    return maybe_sync(func, self, *args, **kwargs)
  File "/venv/lib/python3.7/site-packages/fsspec/asyn.py", line 80, in maybe_sync
    return sync(loop, func, *args, **kwargs)
  File "/venv/lib/python3.7/site-packages/fsspec/asyn.py", line 51, in sync
    raise exc.with_traceback(tb)
  File "/venv/lib/python3.7/site-packages/fsspec/asyn.py", line 35, in f
    result[0] = await future
  File "/venv/lib/python3.7/site-packages/s3fs/core.py", line 450, in _mkdir
    raise translate_boto_error(e) from e
FileExistsError: The requested bucket name is not available. The bucket namespace is shared by all users of the system. Please select a different name and try again.
This happens because of the default auto_mkdir=True in the signature of open_files:
def open_files(
    urlpath,
    mode="rb",
    compression=None,
    encoding="utf8",
    errors=None,
    name_function=None,
    num=1,
    protocol=None,
    newline=None,
    auto_mkdir=True,
    **kwargs
):
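To make the failure path concrete, here is a simplified model of what the traceback shows open_files doing when auto_mkdir is true: it collects the parent of each target path and calls fs.makedirs(parent, exist_ok=True) on it. RecordingFS and ensure_parents are hypothetical names for this sketch, and using posixpath.dirname to compute the parent is an assumption; the real fsspec code may differ in detail.

```python
import posixpath


class RecordingFS:
    """Hypothetical stand-in for a filesystem; it only records makedirs calls."""

    def __init__(self):
        self.made = []

    def makedirs(self, path, exist_ok=False):
        self.made.append(path)


def ensure_parents(fs, paths, auto_mkdir=True):
    """Simplified model of the auto_mkdir step seen in the traceback."""
    if not auto_mkdir:
        return
    # Assumption: the parent is the path with its last component removed.
    parents = {posixpath.dirname(p) for p in paths}
    for parent in sorted(parents):
        fs.makedirs(parent, exist_ok=True)


fs = RecordingFS()
ensure_parents(fs, ["my-bucket/path/to/folder/test.txt"])
print(fs.made)  # the prefix that makedirs is asked to create
```

With a real S3 backend, that makedirs call is where the report says the bucket creation attempt is triggered. With auto_mkdir=False the call is never made and the write goes straight to the object key.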