gcsfs
Pseudo folders break directory detection code in fs.info
Various tools and libraries create pseudo folders in Google Cloud Storage (GCS), for example TensorFlow or the "Create folder" function in the Google Storage Browser UI.
The code that identifies directories does not handle this properly: https://github.com/remerge/gcsfs/blob/be50a9a8d23a04529c40540a1cc123ce5b3512fc/gcsfs/core.py#L819-L820
The folder name will be an exact match and assumed to be a file on the first invocation. This breaks various other methods like isdir.
How to reproduce:
import gcsfs
import tensorflow as tf
path = "YOUR-BUCKET/foo"
tf.io.gfile.mkdir(f"gs://{path}")
fs = gcsfs.GCSFileSystem()
print(fs.isdir(path))  # => False (first call: the placeholder is treated as a file)
print(fs.isdir(path))  # => True (second call: same path, different answer)
I'll create a PR tomorrow to fix this.
I would argue that path is indeed a file because ... it is a file, albeit empty. Directories are only ever implied by their contents, since their names are returned as "common prefixes", i.e., a different mechanism (and the names will not contain "/" at the end).
What would happen if one of the folder placeholders were to contain data?
Yes, that is what I meant by pseudo folder. The issue here is that what info returns is not consistent, because it tries to support the idea of folders. In the example, isdir should be idempotent, but it returns False on the first invocation and True on the second.
Pseudo folders are usually indicated by zero-size objects whose names end with /. Detecting this would be my suggested fix: https://github.com/dask/gcsfs/pull/313
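To make the suggested check concrete, here is a minimal sketch of the detection rule described above. It is not the code from the linked PR. The helper name `is_pseudo_folder` and the entry shape (a dict with "name" and "size" keys) are assumptions made for illustration only.

```python
def is_pseudo_folder(entry):
    """Return True if a listing entry looks like a folder placeholder.

    Hypothetical helper: `entry` is assumed to be a dict with "name" and
    "size" keys. This shape is an assumption for the sketch, not a
    confirmed gcsfs data structure.
    """
    name = entry.get("name", "")
    size = int(entry.get("size", 0))
    # A placeholder is a zero-size object whose name ends with "/".
    return name.endswith("/") and size == 0


# Placeholder created by a "Create folder" style operation.
print(is_pseudo_folder({"name": "YOUR-BUCKET/foo/", "size": 0}))
# Ordinary object with content.
print(is_pseudo_folder({"name": "YOUR-BUCKET/foo/data.csv", "size": 12}))
# Name ends with "/" but the object holds data, so this rule rejects it.
print(is_pseudo_folder({"name": "YOUR-BUCKET/foo/", "size": 5}))
```

Under this rule, a placeholder that contains data is not classified as a pseudo folder. Whether that is the right behaviour is exactly the open question raised earlier in the thread.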
In the example, isdir should be idempotent, but it returns False on the first invocation and True on the second.
Totally agree that this is wrong, and there should be a test that ensures that repeated calls get the same result.
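A test along these lines could look roughly like the sketch below. It runs without GCS access: `FlakyFS` is a made-up stand-in that imitates the reported behaviour, and `assert_idempotent` is a hypothetical helper, so neither is part of gcsfs.

```python
def assert_idempotent(fn, *args, repeats=3):
    """Call fn(*args) several times and fail if the results differ."""
    results = [fn(*args) for _ in range(repeats)]
    assert all(r == results[0] for r in results), f"inconsistent results: {results}"
    return results[0]


class FlakyFS:
    """Hypothetical stand-in that mimics the reported behaviour; not gcsfs."""

    def __init__(self):
        self._seen = set()

    def isdir(self, path):
        # The first call for a path answers False, later calls answer True.
        first_call = path not in self._seen
        self._seen.add(path)
        return not first_call


try:
    assert_idempotent(FlakyFS().isdir, "YOUR-BUCKET/foo")
except AssertionError as exc:
    # The helper flags the first-call/second-call mismatch.
    print("caught:", exc)
```

In a real test suite, the same helper could be given `fs.isdir` from a `GCSFileSystem` instance and a path in a test bucket, assuming suitable credentials and fixtures are available.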
I also agree consistency is the important part.
Just wanted to check if there is any news on this issue?
This is likely fixed in the current release. Are you still experiencing a failing case, @bilelomrani1?
I apologize for the confusion caused by my previous comment. After further investigation, I realized that the issue I encountered was not related to what I initially thought. Instead, I discovered that the problem stemmed from the fact that fs.makedirs behaves as a no-op with gcsfs, which is a documented behavior.
no worries :)