s3fs
Be less strict in non-strict mode
Motivation:
The S3FS backend does not work well if a bucket contains a file structure without proper directory markers, even in non-strict mode. This patch skips a few directory checks when strict=False (in the default strict mode the old behaviour should be preserved) and makes my integration tests pass.
It does the same as #51, but in a few more places.
It should fix #55, #52 and #57 in non-strict mode.
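To illustrate the difference the description refers to, here is a minimal sketch of a marker-based directory check next to a prefix-based one. It is not the actual S3FS code: the key listing and the function names `isdir_strict` and `isdir_relaxed` are hypothetical, and it assumes directory markers are empty objects whose key ends with a slash.

```python
# Hypothetical flat key listing: S3 stores keys only, not real directories.
KEYS = {
    "data/2020/report.csv",  # no "data/" or "data/2020/" marker objects exist
    "logs/",                 # explicit (assumed) directory marker object
    "logs/app.log",
}


def isdir_strict(keys, path):
    """Treat a path as a directory only if a marker object 'path/' exists."""
    return path.rstrip("/") + "/" in keys


def isdir_relaxed(keys, path):
    """Treat a path as a directory if any key lives under the 'path/' prefix."""
    prefix = path.rstrip("/") + "/"
    return any(key.startswith(prefix) for key in keys)
```

Under the marker-based check, `data/2020` is not a directory even though it contains a file, which is the kind of failure a bucket without markers would trigger.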
@willmcgugan could you take a look, please?
Also, because we don't care about directory markers in non-strict mode, I am thinking about replacing the `makedir` / `makedirs` implementations with a 'do nothing' implementation. This way we can write code that works across many filesystem protocols without creating unnecessary directory markers on S3. For example, we can precede the creation of a file with `makedirs(path, recreate=True)`: it will create the required directories on sftp / file / ftp filesystems and do nothing on s3 in non-strict mode. What do you think about that?
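A rough sketch of the pattern proposed above. `MarkerlessStore` and `write_with_parents` are hypothetical names used only for illustration; the stand-in class mimics just the two PyFilesystem-style methods the pattern needs and is not the S3FS implementation.

```python
import posixpath


class MarkerlessStore:
    """Hypothetical stand-in for an S3 filesystem in non-strict mode."""

    def __init__(self):
        self.objects = {}  # key -> bytes, a flat namespace like a bucket

    def makedirs(self, path, recreate=False):
        # Proposed behaviour: create no marker objects at all.
        return None

    def writebytes(self, path, data):
        self.objects[path.lstrip("/")] = data


def write_with_parents(filesystem, path, data):
    """Ensure parent directories exist, then write the file."""
    # On sftp / file / ftp backends this call would create real directories.
    filesystem.makedirs(posixpath.dirname(path), recreate=True)
    filesystem.writebytes(path, data)


store = MarkerlessStore()
write_with_parents(store, "/reports/2020/summary.txt", b"ok")
```

After the call, the stand-in holds only the file object itself, with no extra entries for `reports/` or `reports/2020/`.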
@mrk-its I don't see the benefit in that, apart from avoiding the extra work of creating directories. Unless there is some major bottleneck there, I would prefer that directories created in non-strict mode were still there when opened in strict mode.
@willmcgugan On my production S3 buckets I simply do not have these directory markers at all. Instead I see a lot of empty files with the suffix `_$folder$`, which is another way of marking directories, used by Apache Hadoop. I don't see any benefit in having yet another kind of placeholder file, especially in non-strict mode. But I can probably live with them (simply ignore my latest comment).
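For readers unfamiliar with the Hadoop-style markers mentioned here, a small sketch of how a flat key listing could be split into directories and files by that suffix. The function name `split_markers` and the sample keys are hypothetical, and this is not something S3FS is stated to do.

```python
FOLDER_SUFFIX = "_$folder$"


def split_markers(keys):
    """Return (directories, files) from a flat listing of keys."""
    dirs, files = set(), set()
    for key in keys:
        if key.endswith(FOLDER_SUFFIX):
            # An empty 'name_$folder$' object marks the directory 'name'.
            dirs.add(key[: -len(FOLDER_SUFFIX)])
        else:
            files.add(key)
    return dirs, files


dirs, files = split_markers(
    ["data_$folder$", "data/part-0000", "data/2020_$folder$"]
)
```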
What about changes in this PR?
I also have this issue, I am working with a shared bucket where creating extra meta information objects in the bucket would be an unfortunate complication, and one certainly not to be followed by other folks accessing it via the CLI tools.
@willmcgugan I understand that you don't see the benefit in that, apart from avoiding the extra work of creating directories.
However, you don't always control the S3 bucket that you connect to. You may only have read access to that bucket.
What is the downside of merging one of these PRs, #60 or #51?
@nivm I have a backlog of PRs and issues to look through, but fundamentally the problem is satisfying everyone's use case. It may not even be possible, given how S3 isn't quite a real filesystem.