filesystem_spec
CachedFileSystem is special in URL chaining
I'm trying to implement a file system that always replaces the 'a' open mode with 'w'. The purpose is to support TensorBoard on filesystems (local mounts of remote network FS, which is LocalFileSystem in fsspec) that don't support the append writing mode. The new TruncateFileSystem works like CachedFileSystem, but it modifies the open parameter while forwarding all other calls to the underlying file system.
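To make the idea concrete, here is a minimal sketch of the wrapping pattern using only the standard library. It is not my actual implementation and does not use fsspec: `LocalStandIn` is a hypothetical stand-in for the underlying file system, and the attribute name `fs` for the wrapped object is an assumption made for this example.

```python
import os
import tempfile


class LocalStandIn:
    """Hypothetical stand-in for the wrapped file system (NOT fsspec's LocalFileSystem)."""

    def open(self, path, mode="rb", **kwargs):
        return open(path, mode, **kwargs)

    def exists(self, path):
        return os.path.exists(path)


class TruncateFileSystem:
    """Sketch: rewrite append modes to write modes, forward everything else."""

    def __init__(self, fs):
        self.fs = fs  # the underlying file system (assumed attribute name)

    def open(self, path, mode="rb", **kwargs):
        # "a" -> "w", "ab" -> "wb"; all other modes pass through unchanged.
        if "a" in mode:
            mode = mode.replace("a", "w")
        return self.fs.open(path, mode, **kwargs)

    def __getattr__(self, name):
        # Only reached for attributes not defined on the wrapper itself,
        # so every call other than open() goes to the underlying file system.
        return getattr(self.fs, name)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "events.log")
    fs = TruncateFileSystem(LocalStandIn())
    with fs.open(path, "w") as f:
        f.write("first")
    with fs.open(path, "a") as f:  # silently opened with "w" instead
        f.write("second")
    with fs.open(path, "r") as f:
        content = f.read()
    forwarded = fs.exists(path)  # handled by LocalStandIn via __getattr__
```

With this wrapper the second `open` truncates the file instead of appending to it, which is the trade-off I accept in order to run on mounts without append support.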
However, when I tried to use url_to_fs with my file system, I noticed the following behavior:
I believe this behavior occurs due to the special handling in the following code:
https://github.com/fsspec/filesystem_spec/blob/90c7cd9e6c939fc37341fd793831a399753ebfd9/fsspec/core.py#L361-L362
This makes it difficult for third-party implementations to achieve the same URL chaining behavior as the cached file system.
Here’s my idea: There are two types of URL chaining. One type uses a file from the underlying file system (e.g., zip), while the other type uses the entire file system from another implementation (e.g., cached and my truncate). These two styles and their implementations should be clearly documented (currently, the URL chaining documentation does not describe this behavior for the cached file system). Furthermore, third-party implementations should be able to achieve the same behavior without needing to use CachedFileSystem as a base class.
One possible solution is to create a new marker class, ChainedFileSystem, that inherits from AbstractFileSystem but introduces no new implementation. Then, make CachedFileSystem inherit from ChainedFileSystem.
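A rough sketch of what the marker-class check could look like. Everything here is a stand-in defined for illustration: these classes are not imported from fsspec, the bodies are deliberately empty, and the helper name `wraps_whole_filesystem` is hypothetical. It only shows how an `issubclass` test against the marker could replace a test against one concrete class.

```python
class AbstractFileSystem:
    """Stand-in for fsspec's base class (illustration only)."""


class ChainedFileSystem(AbstractFileSystem):
    """Proposed marker: the implementation wraps an entire target file system.

    Intentionally adds no behavior of its own.
    """


class CachedFileSystem(ChainedFileSystem):
    """Stand-in for the cached file system, now deriving from the marker."""


class TruncateFileSystem(ChainedFileSystem):
    """Third-party implementation opting in without subclassing the cache."""


class ZipFileSystem(AbstractFileSystem):
    """File-based chaining: consumes a file from the underlying file system."""


def wraps_whole_filesystem(cls):
    # Hypothetical helper: the chaining code would branch on the marker
    # instead of on one specific class.
    return issubclass(cls, ChainedFileSystem)


results = {
    cls.__name__: wraps_whole_filesystem(cls)
    for cls in (CachedFileSystem, TruncateFileSystem, ZipFileSystem)
}
```

Here `results` marks the cached and truncate stand-ins as filesystem-based and the zip stand-in as file-based. A class attribute would work just as well as a marker class; the point is only that the decision no longer depends on a single hard-coded class.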
This issue is similar to #1722 -- there should be a better way of handling the special cases like http/https and filecache.
I agree, a class hierarchy (maybe with a mixin) or a class attribute is a better way to define this kind of behaviour. I would happily consider a PR.
issubclass(cls, CachingFileSystem)
This already does some of this work, but you are right that "file-based" and "filesystem-based" chainable filesystems are conceptually different. What is special about the caching filesystems is that they only handle access to file bytes and pass all other calls on to the next filesystem.
I need this feature, so I have taken a first pass at it in #1929.