kotlinx-io
Files and FileSystems support
The ability to work with files and filesystems is one of the crucial features that a programming language could provide through its standard library. Even if an application is not built around reading and writing data from storage devices, it still may need access to a filesystem to, for example, read configuration files or write logs.
A constant stream of GitHub issues and feedback on kotlinx-io suggests a demand for a fully featured multiplatform file and filesystem API.
The API exposed through the kotlinx.io.files package was created to provide some partial file support quickly; it is neither well designed nor does it cover basic user needs. I'm proposing to review it and to extend or redesign it if necessary.
For the purposes of this proposal, file and filesystem-related features could be split into two coarse categories: basic and extended. Basic features include the minimal necessary features required for working with files and filesystems, something one may expect from any FS-related API. Extended features include everything else.
Basic features
Below is the list of features to be considered as basic file and filesystem features.
Paths
- concatenation/joining;
- segmentation (get a parent path, get file name, file extension);
- resolution (convert relative path to absolute);
- relativization (make a path relative to a base path).
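To make the four path operations above concrete, here is a minimal sketch. It uses the JVM-only kotlin.io.path extensions over java.nio.file.Path as a stand-in, since the multiplatform API is still to be designed; the helper name sourceFile is made up for illustration.

```kotlin
import java.nio.file.Path
import kotlin.io.path.Path
import kotlin.io.path.div
import kotlin.io.path.extension
import kotlin.io.path.name
import kotlin.io.path.relativeTo

// Joining: append child segments to a base path (hypothetical helper).
fun sourceFile(base: Path): Path = base / "src" / "Main.kt"

fun main() {
    val base = Path("projects", "app")
    val file = sourceFile(base)

    println(file.parent)            // segmentation: the parent path
    println(file.name)              // segmentation: Main.kt
    println(file.extension)         // segmentation: kt
    println(file.toAbsolutePath())  // resolution against the current working directory
    println(file.relativeTo(base))  // relativization against the base path
}
```

The separator in the printed paths depends on the host OS, which is one of the details a multiplatform design would have to pin down.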
FileSystem features
- create/delete files;
- create/delete directories;
- list the content of a directory;
- files/directory renaming/moving (both atomic and non-atomic);
- files copying;
- directory tree traversal;
- file/directory metadata querying and update (atime/mtime/size/etc., touch);
- basic permissions support (query and update);
- symbolic links support (create, resolve);
- query and change the current working dir;
- temporary file/directory creation.
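As a rough illustration of several items from this list (temporary directory creation, file and directory creation, atomic rename, directory listing, deletion), the sketch below again leans on JVM-only APIs (java.nio.file.Files and kotlin.io.path). The function name createRenameList is hypothetical, and nothing here is the proposed kotlinx-io API.

```kotlin
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.StandardCopyOption
import kotlin.io.path.createDirectory
import kotlin.io.path.createTempDirectory
import kotlin.io.path.deleteExisting
import kotlin.io.path.div
import kotlin.io.path.listDirectoryEntries
import kotlin.io.path.name
import kotlin.io.path.writeText

// Creates a directory with one file, renames the file atomically,
// and returns the entry names found in the directory afterwards.
fun createRenameList(root: Path): List<String> {
    val dir = (root / "logs").createDirectory()        // create a directory
    val draft = dir / "draft.log"
    draft.writeText("started\n")                        // create a file
    val target = dir / "app.log"
    // Atomic rename; throws AtomicMoveNotSupportedException if the filesystem cannot do it.
    Files.move(draft, target, StandardCopyOption.ATOMIC_MOVE)
    return dir.listDirectoryEntries().map { it.name }   // list the directory content
}

fun main() {
    val root = createTempDirectory("fs-sketch")         // temporary directory creation
    println(createRenameList(root))                     // [app.log]
    // Clean up: delete the file first, then the now-empty directories.
    (root / "logs" / "app.log").deleteExisting()
    (root / "logs").deleteExisting()
    root.deleteExisting()
}
```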
Files
- read (including reading from an arbitrary offset/position);
- write (including writing at an arbitrary offset/position);
- truncate/resize.
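The three file-level operations can be sketched with java.nio.channels.FileChannel, which is JVM-only and serves purely as an illustration of positional reads and writes and truncation. The function name patchAndTruncate is an assumption made for this example.

```kotlin
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.Path
import java.nio.file.StandardOpenOption.CREATE
import java.nio.file.StandardOpenOption.READ
import java.nio.file.StandardOpenOption.WRITE
import kotlin.io.path.createTempFile
import kotlin.io.path.deleteExisting

// Writes "hello world", overwrites one byte at offset 6,
// truncates the file to 9 bytes, and reads the result back.
fun patchAndTruncate(file: Path): String =
    FileChannel.open(file, CREATE, READ, WRITE).use { channel ->
        channel.write(ByteBuffer.wrap("hello world".encodeToByteArray()), 0) // write at offset 0
        channel.write(ByteBuffer.wrap("W".encodeToByteArray()), 6)           // write at an arbitrary offset
        channel.truncate(9)                                                  // resize
        val buffer = ByteBuffer.allocate(channel.size().toInt())
        // Positional read; loop because one call may return fewer bytes than requested.
        var position = 0L
        while (buffer.hasRemaining()) {
            val read = channel.read(buffer, position)
            if (read < 0) break
            position += read
        }
        buffer.array().decodeToString(0, buffer.position())
    }

fun main() {
    val file = createTempFile("fs-sketch", ".txt")
    println(patchAndTruncate(file)) // hello Wor
    file.deleteExisting()
}
```

Neither the positional read nor the positional write moves a shared cursor, which is the property that makes random access easy to reason about.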
FileSystems
- default system filesystem;
- filesystem aimed for testing.
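One possible reading of "filesystem aimed for testing" is an in-memory fake behind the same interface as the system filesystem. The sketch below assumes a deliberately tiny, hypothetical interface (TinyFileSystem, InMemoryFileSystem and loadConfig are all invented names) and is not the kotlinx-io API.

```kotlin
// Hypothetical minimal interface; not the kotlinx-io API.
interface TinyFileSystem {
    fun write(path: String, data: ByteArray)
    fun read(path: String): ByteArray
    fun exists(path: String): Boolean
    fun delete(path: String)
}

// In-memory implementation intended for tests: no disk access, no cleanup needed.
class InMemoryFileSystem : TinyFileSystem {
    private val files = mutableMapOf<String, ByteArray>()

    override fun write(path: String, data: ByteArray) { files[path] = data.copyOf() }
    override fun read(path: String): ByteArray =
        files[path]?.copyOf() ?: throw NoSuchElementException("No such file: $path")
    override fun exists(path: String): Boolean = path in files
    override fun delete(path: String) { files.remove(path) }
}

// Code under test depends only on the interface, so a test can pass in the fake.
fun loadConfig(fs: TinyFileSystem, path: String): Map<String, String> =
    if (!fs.exists(path)) emptyMap()
    else fs.read(path).decodeToString().lines()
        .filter { '=' in it }
        .associate { it.substringBefore('=').trim() to it.substringAfter('=').trim() }

fun main() {
    val fs = InMemoryFileSystem()
    fs.write("/etc/app.conf", "level = debug\nname=demo".encodeToByteArray())
    println(loadConfig(fs, "/etc/app.conf")) // {level=debug, name=demo}
}
```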
Most of the features listed above are present in the vast majority of modern filesystems. However, implementation details for some of them (like file metadata or permissions) vary significantly not only between different operating systems but also between filesystems on the same OS. The way these features will be supported should be decided during the design phase.
Extended features
Extended features include mainly OS- or FS-specific features, niche functionality, or something that is hard to implement reliably. The list is non-exhaustive.
Paths
- the ability to manipulate paths that belong to a filesystem other than the host's (e.g. explicitly processing Windows paths on a Unix host; see Dart paths package docs for an example).
FileSystem features
- special files support (pipes, fifo, locks, shmem, etc.);
- watching FS updates;
- extended permissions support;
- hard links support;
- globs support (find all paths matching a glob like **/*.txt);
- partitions/mount points/volumes support (query name, root dir, size, capacity, etc.).
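For the glob item in this list, the JVM already offers a reference point: java.nio.file.FileSystem.getPathMatcher. The sketch below walks a directory tree and matches each path, relative to the root, against a glob such as **/*.txt. The function name findByGlob is hypothetical, and a multiplatform implementation would need its own matcher.

```kotlin
import java.nio.file.FileSystems
import java.nio.file.Files
import java.nio.file.Path
import kotlin.io.path.createDirectories
import kotlin.io.path.createTempDirectory
import kotlin.io.path.div
import kotlin.io.path.writeText

// Returns the paths under root, relative to root, that match the given glob.
fun findByGlob(root: Path, glob: String): List<Path> {
    val matcher = FileSystems.getDefault().getPathMatcher("glob:$glob")
    return Files.walk(root).use { stream ->
        stream.iterator().asSequence()
            .map { root.relativize(it) }
            .filter { matcher.matches(it) }
            .sorted()
            .toList()
    }
}

fun main() {
    val root = createTempDirectory("glob-sketch")
    (root / "a.txt").writeText("top level")
    (root / "docs").createDirectories()
    (root / "docs" / "b.txt").writeText("nested")
    (root / "docs" / "c.md").writeText("not a match")
    // With the JDK matcher, **/*.txt requires at least one directory level,
    // so the top-level a.txt is not reported.
    println(findByGlob(root, "**/*.txt"))
    root.toFile().deleteRecursively()
}
```

Details like whether ** may match zero directories differ between glob dialects, which is part of why this feature is hard to implement reliably.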
FileSystems
- archive file systems: Zip, Tar, etc.
Files
- sendfile/splice support;
- memory mapped files support.
Nice to have features
Features that fit neither of the two previous categories but would be nice to have:
- allow wrapping java.nio.file.FileSystem on JVM to reuse existing third-party filesystem implementations, like S3-FS.
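A wrapper of this kind could be a thin adapter. The sketch below assumes a hypothetical three-method interface (SimpleFileSystem) standing in for whatever the redesigned API ends up being, and delegates to any java.nio.file.FileSystem instance; NioFileSystemAdapter is likewise an invented name.

```kotlin
import java.nio.file.FileSystem
import java.nio.file.FileSystems
import java.nio.file.Files

// Hypothetical multiplatform-facing interface; the names are illustrative only.
interface SimpleFileSystem {
    fun exists(path: String): Boolean
    fun readText(path: String): String
    fun writeText(path: String, text: String)
}

// JVM-only adapter that delegates to any java.nio.file.FileSystem
// (the default one, a zip filesystem, or a third-party provider).
class NioFileSystemAdapter(private val delegate: FileSystem) : SimpleFileSystem {
    override fun exists(path: String): Boolean = Files.exists(delegate.getPath(path))

    override fun readText(path: String): String =
        Files.readAllBytes(delegate.getPath(path)).decodeToString()

    override fun writeText(path: String, text: String) {
        Files.write(delegate.getPath(path), text.encodeToByteArray())
    }
}

fun main() {
    val fs: SimpleFileSystem = NioFileSystemAdapter(FileSystems.getDefault())
    val tmp = Files.createTempFile("adapter-sketch", ".txt")
    fs.writeText(tmp.toString(), "hello")
    println(fs.readText(tmp.toString())) // hello
    Files.delete(tmp)
}
```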
The plan
The overall plan is to review the existing file and filesystem API, redesign it if needed, and then concentrate on supporting all the basic features. The list of basic features is subject to change, further subdivision and prioritization.
There are no particular plans regarding features considered extended, but the proposed design should be flexible enough to allow their support in the future.
Do you have plans to support non-UTF-8 encodings for file paths? This is crucial for some non-English systems, such as those using the GBK encoding on Windows. Currently, Okio does not support this (in its Native part), which makes it very inconvenient to use.
@willflier that's a great question, thanks! Speaking of different encodings support in general, there were no plans to start working on it in the near future. However, I will definitely check what could be done with the filename/path encoding on Windows.
Yet another option to keep in mind is support for the family of openat-based filesystem operations (openat, linkat, renameat, mkdirat, etc.).
Hi, as mentioned in https://github.com/Kotlin/kotlinx-io/issues/163#issuecomment-2002355007 I have also been working on a Kotlin file system API, and now I have a proposed API surface at https://github.com/zhanghai/MaterialFiles/tree/filesystem/app/src/main/java/me/zhanghai/kotlin/filesystem. Since this issue is also about a better and more powerful file system API, I hope my design and the rationale behind it might be helpful:
The proposed API surface is designed with my experience/reading in:
- Java NIO File API and implementing an Android file manager based on it, including a custom default FileSystemProvider based on Linux syscalls (due to missing Java 8 desugaring a few years ago)
- SFTP/SMB/WebDAV/DocumentsProvider/libarchive protocols/APIs and implementing FileSystemProviders for them.
- Okio APIs
- Windows FS APIs (fileapi.h, ntifs.h)
- NodeJs/libuv FS APIs
- Web FileSystemHandle APIs
It appears similar to the Java NIO File API, but with a number of (opinionated) choices:
- Async:
  - All I/O operations are suspend functions, since Web APIs are async-only (except for the Web Worker-only sync variant, which they somewhat regretted: https://github.com/w3ctag/design-reviews/issues/772#issuecomment-1557687555) and a blocking API isn't ideal with NodeJS either (https://github.com/square/okio/pull/841). So if we need to choose one, I believe we should go with async.
  - The performance concerns I've read about (e.g. https://github.com/square/okio/issues/814) are mostly about the state machine (a big switch) and the creation of continuation objects. Those may be reasonable to worry about when handling network requests at high QPS, but it seems to me they won't be too significant compared to the time spent in disk I/O.
- Path:
  - Path is an independent data class like in Okio, instead of one type per file system provider like in Java.
  - Path still has the scheme and URI concepts like in Java, to be extensible and allow different types of paths.
  - Path is based on byte strings instead of strings, so that we can correctly represent and work with non-Unicode paths.
  - Path does not internally store the name separator (it holds a list of name segments), and it is up to the file system to convert it to an actual underlying representation (and potentially cache that). (This may also be changed if we find a performance issue since it's an impl detail.)
  - Path has a root URI and the URI representation of any path is created by replacing the path component of the root URI with the name segments of that path.
  - Path is simplified conceptually to always have a single root path because paths need to be convertible to URIs and (absolute) URIs must have absolute paths anyway.
  - Path intentionally doesn't support platform-dependent relative paths like "C:foo".
- FileSystemProvider:
  - File system provider is now merely a provider for creating new FileSystem instances and isn't responsible for file operations. Different implementations can have their own way to allow different options when creating file systems, e.g. how to retrieve credentials given a particular SFTP path.
  - FileSystemRegistry now stores all the file systems.
- FileSystem:
  - All path-based file system operations happen on FileSystem instead of FileSystemProvider.
  - File systems are identified by their root URI.
  - There is only one root directory and it is always the path created from the root URI.
  - All file system operations are provided as extension functions on Path as well.
- FileMetadata(View):
  - This class replaces BasicFileAttributes(View).
  - There is no longer a concept of attribute view names, and the file system always returns one file metadata instance for a file. It may implement a more specific interface like PosixFileMetadata to offer more information specific to a platform.
  - FileMetadataView is Closeable so that it may hold on to a certain file descriptor. This is more efficient for remote file systems like SFTP and SMB, and also allows a potential API to use an existing FileHandle/FileDescriptor to open a FileMetadataView.
- FileContent:
  - This class replaces FileChannel and SeekableByteChannel as well as Okio FileHandle. It is not named FileHandle because it's meant to represent only the content and not a generic file descriptor on POSIX or file handle on Windows.
  - It doesn't have a concept of position, and always allows random access, similar to Okio. This simplifies locking and aligns much better with remote file systems like SMB/SFTP etc.
  - It provides openSource() and openSink() utilities to help callers who want sequential access.
- DirectoryStream:
  - This class returns DirectoryEntry objects upon read().
  - DirectoryEntry contains name instead of Path instances so that the class may work with an existing FileHandle/FileDescriptor in the future (possible with fdopendir/NtQueryDirectoryFile). Helpers like FileSystem.readDirectory(): List<Path> can simplify cases where only the paths are needed.
  - Additional options like READ_METADATA can be passed into FileSystem.openDirectoryStream() so that DirectoryEntry will contain a non-null metadata field. This is designed for remote file system protocols like SFTP and SMB where metadata can be queried when listing a directory for better performance.
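To illustrate the Path model described in the list above (a root URI plus a list of name segments, with no stored separator), here is a simplified sketch. SketchPath is a hypothetical name, segments are plain Strings here for brevity whereas the proposal uses byte strings, and none of this is taken from the linked code.

```kotlin
import java.net.URI

// Hypothetical, simplified model of the described Path: a root URI plus name segments.
// Segments are Strings here for brevity; the proposal stores byte strings.
data class SketchPath(val rootUri: URI, val segments: List<String>) {
    // No separator is stored; a child path is simply one more segment.
    fun resolve(name: String): SketchPath = copy(segments = segments + name)

    // The root (no segments) has no parent.
    val parent: SketchPath?
        get() = if (segments.isEmpty()) null else copy(segments = segments.dropLast(1))

    // URI form: the root URI with its path component replaced by the joined segments.
    fun toUri(): URI =
        URI(rootUri.scheme, rootUri.authority, "/" + segments.joinToString("/"), null, null)
}

fun main() {
    val root = SketchPath(URI("sftp://user@example.com/"), emptyList())
    val file = root.resolve("home").resolve("notes.txt")
    println(file.toUri())          // sftp://user@example.com/home/notes.txt
    println(file.parent?.toUri())  // sftp://user@example.com/home
}
```

Because equality is structural, two paths are equal only when both the root URI and the segments match, which fits the idea that a file system is identified by its root URI.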
Things like kdoc, watch, walking directory tree, cross-provider copy/move, progress listener option during copy/move, a 100% compliant multi-platform URI parser implementation that supports byte strings, are not there yet but shouldn't affect the design in a significant way either.
I made some updates to the code above and finally got it published as a separate personal project https://github.com/zhanghai/filesystem-kt . You can check out its API reference here, e.g. Path.
Notable changes include:
- Relative paths are now represented by a separate RelativePath class to reduce confusion, e.g. in resolve and relativize APIs when mixing absolute and relative paths. Paths are now always absolute and FileSystem only works with these absolute Paths. To be fair, not many file systems support current working directories except for the local file system, and JVM didn't have an actual runtime-modifiable CWD anyway, plus a mutable CWD doesn't work well with multi-threading, so I believe requiring Paths for file system operations is reasonable. RelativePaths can still be used for creating/manipulating paths.
- FileSystemProvider is removed and people should use FileSystemRegistry directly. This gives developers back the freedom to decide when/how file systems are created and destroyed, since automatically creating file system instances right before a file operation may not be ideal/possible.
- A JVM implementation with PlatformFileSystem is added, while a non-JVM variant without a PlatformFileSystem is also available.
- KDoc was added for some of the classes (notably Path), while the rest of the docs is still WIP.
- Watching file changes, walking directory tree and cross-provider copy/move are still TBD.
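The split between absolute and relative paths can be expressed in the type system, so that resolve and relativize cannot be called with the wrong kind of path. The sketch below is a hypothetical illustration of that idea: the absolute type is named AbsolutePath here to keep the two roles obvious, and the bodies are not taken from filesystem-kt.

```kotlin
// Hypothetical sketch of the absolute/relative split; bodies are illustrative only.
data class RelativePath(val segments: List<String>)

data class AbsolutePath(val segments: List<String>) {
    // Absolute + relative always yields an absolute path.
    fun resolve(relative: RelativePath): AbsolutePath =
        AbsolutePath(segments + relative.segments)

    // Absolute vs. absolute always yields a relative path:
    // step up with ".." from this path to the common ancestor, then down to other.
    fun relativize(other: AbsolutePath): RelativePath {
        val common = segments.zip(other.segments).takeWhile { (a, b) -> a == b }.size
        val ups = List(segments.size - common) { ".." }
        return RelativePath(ups + other.segments.drop(common))
    }
}

fun main() {
    val docs = AbsolutePath(listOf("home", "user", "docs"))
    val song = AbsolutePath(listOf("home", "user", "music", "a.mp3"))
    println(docs.relativize(song).segments)                          // [.., music, a.mp3]
    println(docs.resolve(RelativePath(listOf("b.txt"))).segments)    // [home, user, docs, b.txt]
}
```

With this shape, passing a relative path where a file system expects an absolute one is a compile-time error rather than a runtime surprise.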
I should note again that this isn't an official Google effort, even though I'm an employee. It is just a personal project drawn from my experience working on zhanghai/MaterialFiles, released in the hope that it may help with the API design here.