filesystem_spec
FSSpec's get_mapper and other functions can produce nameless files for S3 Filesystems
Observed Behavior:
When fsspec's S3FS get_mapper function is called on an empty directory-like path in an S3 filesystem, it can produce a dictionary-like object that is non-empty and includes the key '' (i.e. the empty string). Its listing can also include entries that are in fact empty directories rather than files.
It appears similar behavior can happen with other functions like ls or walk.
Expected Behavior
- get_mapper should not produce empty-string filenames in directories, as users have to filter these out (they appear not to actually exist, and may correspond to versioning artifacts?).
- get_mapper should not produce keys that correspond to empty directories.
Both of these behaviors can currently lead to unusual interactions when using get_mapper. For example, when iterating over an FSMap and copying files from the remote system, trying to copy the '' file will produce an error.
Possible Fix
get_mapper for S3FS could include a filter that avoids adding leaf paths that are not files, tested using the isfile predicate.
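A minimal sketch of what such a filter might look like, written as a user-side workaround rather than a change to fsspec itself. The helper name file_keys, the sample keys, and the stand-in for fs.isfile are all hypothetical; how S3FS's isfile actually treats a zero-length placeholder object is not confirmed here.

```python
from typing import Callable, Iterable, List


def file_keys(keys: Iterable[str], root: str, isfile: Callable[[str], bool]) -> List[str]:
    """Keep only the mapper keys whose full path passes the isfile predicate."""
    kept = []
    for key in keys:
        if not key:
            # The empty key points back at the root itself; skip it outright.
            continue
        if isfile(f"{root.rstrip('/')}/{key}"):
            kept.append(key)
    return kept


# Stand-in for fs.isfile so the sketch runs without S3 access (hypothetical data).
known_files = {"bucket/data/a.csv", "bucket/data/sub/b.csv"}
keys = ["", "a.csv", "empty_dir", "sub/b.csv"]
print(file_keys(keys, "bucket/data", known_files.__contains__))
```

With a real mapper, something along the lines of file_keys(mapper, mapper.root, mapper.fs.isfile) should be equivalent, at the cost of one isfile call per key.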
It appears similar behavior can happen with other functions like ls or walk.
Can you please show this? Since these are lower-level methods, it should be easier to fix given such examples.
FYI: calling list or dict on a mapper should end up calling fs.find(...) https://github.com/fsspec/filesystem_spec/blob/master/fsspec/mapping.py#L178
@martindurant Here is what we are getting with ls, walk, and find.
So you do indeed have a file of that name, it seems. This was likely created by the console when you clicked "create directory" (S3 doesn't actually support directories; it uses zero-length files to pretend, but only in some contexts).
This is exactly what happened. We managed to reproduce this by creating a folder in the console: a file with an empty name appears inside it, to fake the folder's existence even though S3 has no notion of directories.
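To illustrate why such a placeholder shows up as the '' key, here is a simplified model of how a mapper might turn a full path into a key. It assumes the key is whatever remains after stripping the root and any leading slashes; key_for and the bucket paths are hypothetical names for illustration, not fsspec's API.

```python
def key_for(path: str, root: str) -> str:
    """Simplified model: strip the mapper root, then any leading slashes."""
    return path[len(root):].lstrip("/")


root = "bucket/folder"
# A console-created "directory" is assumed to be a zero-length object named "bucket/folder/".
print(repr(key_for("bucket/folder/", root)))       # placeholder object
print(repr(key_for("bucket/folder/a.csv", root)))  # regular object
```

Under this model the placeholder object is left with nothing after the root, which is the empty-string key reported above.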
The question is, should we ignore files with empty names and files ending with a /?
Maybe for the special case of the mapper we should indeed ignore paths ending in "/", since that is the special character we use to construct paths from keys; but find/walk/ls should definitely show these files that happen to have names (almost) the same as directories.
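A rough sketch of that mapper-only rule, assuming the mapper builds its keys from a flat list of paths such as the one find returns. The function name mapper_keys and the sample paths are hypothetical; ls, walk, and find themselves are left untouched.

```python
from typing import Iterable, Iterator


def mapper_keys(paths: Iterable[str], root: str) -> Iterator[str]:
    """Yield mapper keys for paths under root, skipping any path that ends in '/'."""
    prefix = root.rstrip("/") + "/"
    for path in paths:
        if path.endswith("/"):
            # Directory placeholder object: still visible to find/ls, hidden from the mapper.
            continue
        if not path.startswith(prefix):
            continue
        yield path[len(prefix):]


found = [
    "bucket/folder/",
    "bucket/folder/a.csv",
    "bucket/folder/sub/",
    "bucket/folder/sub/b.csv",
]
print(list(mapper_keys(found, "bucket/folder")))
```

Filtering on the name alone avoids an extra request per key, but it would also hide any real object whose key was deliberately given a trailing slash.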