
FSSpec's get_mapper and other functions can produce nameless files for S3 Filesystems

Open jaedoucette opened this issue 11 months ago • 6 comments

Observed Behavior

When get_mapper is called on an empty directory-like path in an S3 filesystem (fsspec's S3FS), it can return a dictionary-like object that is non-empty and includes the key '' (i.e. the empty string). Its listing can also include entries that are in fact empty directories rather than files:

image

It appears similar behavior can happen with other functions like ls or walk.

Expected Behavior

  1. get_mapper should not produce empty-string keys for a directory, since users then have to filter them out (these entries do not appear to exist as real files, and may correspond to versioning artifacts?).
  2. get_mapper should not produce keys that correspond to empty directories.

Both of these behaviors can currently lead to unusual interactions when using get_mapper. For example, when iterating over an FSMap and copying files from the remote system, trying to copy the '' file will produce an error.

Possible Fix

get_mapper for S3FS could include a filter that avoids adding leaf paths that are not files, tested using the isfile predicate.
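
A minimal sketch of such a filter, written as a user-side workaround rather than a change to fsspec itself. The helper name `file_keys` is hypothetical, and the sketch assumes the mapper exposes its filesystem and root path as `fs` and `root`. It makes one `isfile` check per key, which may be slow on large prefixes.

```python
def file_keys(keys, root, isfile):
    """Yield only the keys that name real files under ``root``.

    ``isfile`` is any callable that takes a full path and returns a bool,
    for example the ``isfile`` method of an fsspec filesystem.
    """
    base = root.rstrip("/")
    for key in keys:
        # Skip the empty key and directory-like keys without a remote call.
        if key == "" or key.endswith("/"):
            continue
        # Keep the key only if the full path is reported as a file.
        if isfile(f"{base}/{key}"):
            yield key
```

With a mapper `m`, this could be called as `list(file_keys(m, m.root, m.fs.isfile))`, assuming those two attributes are available.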

jaedoucette avatar Mar 21 '24 15:03 jaedoucette

It appears similar behavior can happen with other functions like ls or walk.

Can you please show this? Since these are lower-level methods, it should be easier to fix given such examples.

martindurant avatar Mar 21 '24 15:03 martindurant

FYI: calling list or dict on a mapper should end up calling fs.find(...) https://github.com/fsspec/filesystem_spec/blob/master/fsspec/mapping.py#L178

martindurant avatar Mar 21 '24 15:03 martindurant

@martindurant Here is what we are getting with ls, walk, and find

image

tdopierre avatar Mar 21 '24 16:03 tdopierre

So you do indeed have a file of that name, it seems. This was likely created by the console when you clicked "create directory" (S3 doesn't actually support directories; it uses zero-length files to pretend, but only in some contexts).
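
To illustrate how such a zero-length marker object could surface as the '' key: assuming the mapper derives each key by stripping the root prefix and any leading slashes from the paths that find returns, an object named `bucket/dir/` under the root `bucket/dir` maps to the empty string. The function below is a stand-in for that logic, not fsspec's actual code, and the paths are made up.

```python
def path_to_key(path, root):
    """Strip the root prefix and leading slashes (stand-in for the mapper's key logic)."""
    return path[len(root):].lstrip("/")


root = "bucket/dir"
# Hypothetical listing: the first entry is the zero-length "directory" marker.
found = ["bucket/dir/", "bucket/dir/data.csv"]
keys = [path_to_key(p, root) for p in found]
print(keys)  # ['', 'data.csv']
```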

martindurant avatar Mar 21 '24 16:03 martindurant

This is exactly what happened. We managed to reproduce this by creating a folder: a file with an empty name appears, faking the folder's existence even though S3 has no notion of folders.

The question is, should we ignore files with empty names and files ending with a /?

tdopierre avatar Mar 21 '24 17:03 tdopierre

Maybe for the special case of the mapper we should indeed ignore paths ending in "/", since that is the special character we use to construct paths from keys; but find/walk/ls should definitely show these files that happen to have names (almost) the same as directories.
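
One way this mapper-only rule might look, as a rough sketch: the listing from find is left untouched, and only the step that turns paths into keys drops anything ending in "/". The name `mapper_keys` and the example paths are hypothetical, and this is not the mapper's real implementation.

```python
def mapper_keys(found_paths, root):
    """Build mapper keys from find() results, dropping paths that end in "/"."""
    prefix_len = len(root.rstrip("/"))
    return [
        path[prefix_len:].lstrip("/")
        for path in found_paths
        if not path.endswith("/")  # directory markers never become keys
    ]


# find/walk/ls would still report every entry in this hypothetical listing.
found = ["bucket/dir/", "bucket/dir/a", "bucket/dir/sub/", "bucket/dir/sub/b"]
print(mapper_keys(found, "bucket/dir"))  # ['a', 'sub/b']
```

Because the check is purely string-based, it needs no extra requests to the store, unlike a per-key isfile test.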

martindurant avatar Mar 21 '24 17:03 martindurant