gcsfs

Pseudo folders break directory detection code in fs.info

Open dynamix opened this issue 4 years ago • 4 comments
Various tools and libraries create pseudo folders in GCS (Google Cloud Storage), for example TensorFlow or the "Create folder" function in the Google Storage Browser UI.

The code that identifies directories does not handle this properly: https://github.com/remerge/gcsfs/blob/be50a9a8d23a04529c40540a1cc123ce5b3512fc/gcsfs/core.py#L819-L820

The folder name is an exact match for the placeholder object, so on the first invocation it is assumed to be a file. This breaks other methods that rely on fs.info, such as isdir.

How to reproduce:

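import gcsfs
import tensorflow as tf

path = "YOUR-BUCKET/foo"

# Create a pseudo folder (placeholder object) for "foo" in the bucket.
tf.io.gfile.mkdir(f"gs://{path}")

fs = gcsfs.GCSFileSystem()
print(fs.isdir(path))  # => False (first call)
print(fs.isdir(path))  # => True (second call, same path)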

I'll create a PR tomorrow to fix this.

dynamix, Nov 27 '20 00:11

I would argue that path is indeed a file because ... it is a file, albeit empty. Directories are only ever implied by their contents, since their names are returned as "common prefixes", i.e., a different mechanism (and the names will not contain "/" at the end). What would happen if one of the folder placeholders were to contain data?

martindurant, Nov 27 '20 20:11

Yes, that is what I meant by pseudo folder. The issue here is that what info returns is not consistent, because it tries to support the idea of folders. In the example, isdir should be idempotent, but it returns False on the first invocation and True on the second.

Pseudo folders are usually indicated by zero-size objects whose names end with /. Detecting this would be my suggested fix: https://github.com/dask/gcsfs/pull/313
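To make the rule above concrete, here is a minimal sketch of the check, written as a standalone helper. It is not the code from the linked PR. The function names are hypothetical, and it assumes each listing entry is a dict with "name" and "size" keys, as fsspec-style info dicts usually are.

```python
def is_pseudo_folder(entry: dict) -> bool:
    """Return True if a listing entry looks like a folder placeholder.

    Assumption: `entry` has "name" and "size" keys. This helper is
    hypothetical and not part of gcsfs.
    """
    name = entry.get("name", "")
    size = entry.get("size")
    # A placeholder is a zero-byte object whose name ends with "/".
    return name.endswith("/") and size == 0


def classify(entry: dict) -> str:
    """Map an entry to "directory" or "file" using the placeholder rule."""
    return "directory" if is_pseudo_folder(entry) else "file"


examples = [
    {"name": "bucket/foo/", "size": 0},     # zero-byte placeholder
    {"name": "bucket/foo/", "size": 12},    # trailing slash, but holds data
    {"name": "bucket/foo.txt", "size": 0},  # empty regular object
]
labels = [classify(e) for e in examples]
print(labels)
```

Under this rule a placeholder that contains data is still reported as a file, which is one possible answer to the question raised earlier in the thread.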

dynamix, Nov 30 '20 19:11

> In the example, isdir should be idempotent, but it returns False on the first invocation and True on the second.

Totally agree that this is wrong, and there should be a test that ensures that repeated calls get the same result.
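As a rough sketch of what such a test could check: call the method several times with the same argument and require that every result equals the first. The helper names below are hypothetical, and ToyFS is a toy stand-in that imitates the reported behaviour. It is not gcsfs and does not reflect how gcsfs caches listings.

```python
from typing import Any, Callable, List


def call_repeatedly(fn: Callable[..., Any], *args: Any, repeats: int = 3) -> List[Any]:
    """Call fn(*args) `repeats` times and collect the results in order."""
    return [fn(*args) for _ in range(repeats)]


def is_idempotent(fn: Callable[..., Any], *args: Any, repeats: int = 3) -> bool:
    """Return True if every repeated call gives the same result as the first."""
    results = call_repeatedly(fn, *args, repeats=repeats)
    return all(r == results[0] for r in results)


class ToyFS:
    """Hypothetical stand-in (NOT gcsfs): the answer changes after the first call."""

    def __init__(self) -> None:
        self._seen = set()

    def isdir(self, path: str) -> bool:
        if path in self._seen:
            return True
        self._seen.add(path)
        return False


class StableFS:
    """Hypothetical stand-in whose answer depends only on its fixed contents."""

    def __init__(self, dirs) -> None:
        self._dirs = set(dirs)

    def isdir(self, path: str) -> bool:
        return path in self._dirs


flaky_results = call_repeatedly(ToyFS().isdir, "bucket/foo")
stable_results = call_repeatedly(StableFS({"bucket/foo"}).isdir, "bucket/foo")
print(flaky_results, stable_results)
```

In a real test suite the same is_idempotent check would be pointed at fs.isdir on a path that holds a placeholder object.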

martindurant, Nov 30 '20 19:11

I also agree consistency is the important part.

isidentical, Mar 01 '21 08:03

Just wanted to check if there is any news on this issue?

bilelomrani1, Jul 10 '23 20:07

This is likely fixed in the current release. Are you still experiencing a failing case, @bilelomrani1?

martindurant, Jul 11 '23 13:07

I apologize for the confusion caused by my previous comment. After further investigation, I realized that the issue I encountered was not related to what I initially thought. Instead, the problem stemmed from the fact that fs.makedirs behaves as a no-op with gcsfs, which is documented behavior.

bilelomrani1, Jul 15 '23 02:07

no worries :)

martindurant, Jul 15 '23 02:07