proposal: io/fs: add writable interfaces
Go 1.16 introduced the embed and io/fs packages. The current implementation is a minimal readable filesystem interface.
This is a wonderful first step towards standardizing filesystem operations and providing the community with a lot of flexibility in how we implement filesystems. I would like us to take the next step and define a writable file system interface.
(FYI: I was surprised to not find an open issue for this, but maybe I missed something. Feel free to close if that's the case!)
Problem
fs.FS can't be modified after it's defined.
func Write(fs fs.FS) {
// We can't write to FS.
fs
}
Optional interfaces could be defined in user-land:
func Write(fs fs.FS) {
// We can't rely on vfs.Writable being implemented across community packages.
writable, ok := fs.(vfs.Writable)
}
But it suffers from the same issues that the readable filesystem interface aimed to solve: standardizing the interface across the ecosystem.
Use Cases
I'll list a few use cases that I've come across since Go 1.16, but I'm sure the community has many more:
- A virtual filesystem that you can write to over time. Useful for file bundlers and databases that work in-memory and flush to the operating system's filesystem at the end.
- Write files to cloud storages like Google Cloud Storage or Amazon S3.
We've already started to see this pop up in the community around io/fs to address the problem in user-land:
- https://github.com/psanford/memfs: defines fs.MkdirAll and fs.WriteFile
- https://github.com/vedranvuk/fsex: defines file.Write, file.Seek and file.Close.
A quick search on GitHub will yield more community libraries: https://github.com/search?q=%22io%2Ffs%22+language%3Ago. For many of these implementations, you can imagine a useful writable implementation.
Of course, there are many other file system libraries that came before io/fs that define writable interfaces like afero and billy.
Proposal
I don't feel qualified to define an interface, I know people have thought about this much harder than I have. What I would love to see from a community member's perspective is the following:
package fs
func WriteFile(fs FS, name string, data []byte, perm FileMode) error
func MkdirAll(fs FS, path string, perm FileMode) error
Nice to Have: Be able to define if a filesystem is readable, writeable or read-writable.
func Open(fs fs.FS) (*DB, error) // Readable
func Open(fs fs.WFS) (*DB, error) // Writable
func Open(fs fs.RWFS) (*DB, error) // Read-Writable
Thanks for your consideration!
I'm currently experimenting with writing a file format encoding/decoding/mutating package that is intended to work with files that aren't guaranteed to easily fit in memory.
I would like to implement it in terms of fs.FS, which would make it so the library doesn't have to care whether these files actually exist on the local filesystem, in memory, or stored somewhere else. In point of fact, this is intended to be an archival format that distributes the contents of a single archive over a configurable number of actual files, and these files might be distributed geographically across different regions for redundancy.
This codec package doesn't want to care about supporting all the different places that the underlying files could be stored. It just wants to take in an fs.FS and a list of paths.
Additionally, to make this package more testable, using fs.FS would make it trivial to write tests without having to actually read and write files on disk.
Unfortunately, since fs.FS is read-only, I'm sitting here thinking up complicated ways that I could support fs.FS and somehow still support actually writing and mutating files on disk.
I've got a similar issue with my p9 package. It attempts to implement a 9P client and server in a high-level way, similar to net/http, and while I'd like to rework the file abstraction to use fs.FS, it currently would result in odd things due in part to Open() returning an fs.File, which is then read-only by design. For now, what I'm leaning towards is just having a function that takes my package's filesystem type and returns an fs.FS that abstracts it away, but the read-only problem will still be there.
Maybe something like this could work?
type RWFS interface {
FS
WriteFS
}
type WriteFS interface {
// Create creates a new file with the given name.
Create(string) (WFile, error)
// Modify modifies an existing file with the given name.
Modify(string) (WFile, error)
}
type WFile interface {
Write([]byte) (int, error)
Close() error
// Maybe also some kind of WStat() method?
}
Then the returned types would only have to expose either reading or writing methods, and the interface would just handle it transparently.
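A minimal in-memory sketch of how the interfaces above might be satisfied. The `memFS` and `memFile` types are hypothetical, and the semantics are assumptions the comment does not specify: `Create` replaces any existing contents, `Modify` appends to them, and nothing is committed until `Close`.

```go
package main

import (
	"bytes"
	"fmt"
	"io/fs"
)

type WFile interface {
	Write([]byte) (int, error)
	Close() error
}

type WriteFS interface {
	Create(string) (WFile, error)
	Modify(string) (WFile, error)
}

// memFS is a hypothetical in-memory store keyed by file name.
type memFS struct {
	files map[string][]byte
}

// memFile buffers writes and commits them to its memFS on Close.
type memFile struct {
	bytes.Buffer
	fsys *memFS
	name string
}

func (f *memFile) Close() error {
	f.fsys.files[f.name] = append([]byte(nil), f.Bytes()...)
	return nil
}

// Create returns an empty file; existing contents are replaced on Close.
func (m *memFS) Create(name string) (WFile, error) {
	if !fs.ValidPath(name) {
		return nil, &fs.PathError{Op: "create", Path: name, Err: fs.ErrInvalid}
	}
	return &memFile{fsys: m, name: name}, nil
}

// Modify opens an existing file; writes are appended (an assumed semantic).
func (m *memFS) Modify(name string) (WFile, error) {
	old, ok := m.files[name]
	if !ok {
		return nil, &fs.PathError{Op: "modify", Path: name, Err: fs.ErrNotExist}
	}
	f := &memFile{fsys: m, name: name}
	f.Write(old) // seed the buffer with the current contents
	return f, nil
}

func main() {
	m := &memFS{files: map[string][]byte{}}
	var fsys WriteFS = m

	f, _ := fsys.Create("a.txt")
	f.Write([]byte("hello"))
	f.Close()

	g, _ := fsys.Modify("a.txt")
	g.Write([]byte(" world"))
	g.Close()

	fmt.Println(string(m.files["a.txt"]))
}
```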
Personally, I think that it would be a lot better if there was some way to abstract away the specific requirement of an fs.File as the returned type so that either Open(string) (*os.File, error) or Open(string) (SomeCustomFileType, error) would satisfy the interface, but that would require language changes and that seems like overkill. It could be partially done with generics, such as with type FS[F File] interface { ... }, but it has some odd potential complications, and it wouldn't be fully backwards compatible at this point.
To me, this looks like the minimum requirement:
package fs
type WFile interface {
Stat() (FileInfo, error)
Write(p []byte) (n int, err error)
Close() error
}
type WriteFS interface {
OpenFile(name string, flag int, perm FileMode) (WFile, error)
}
And another (ugh) for making dirs:
type MkDirFS interface {
MkDir(name string, perm FileMode) error
}
And some helper functions for convenience:
func Create(fsys WriteFS, name string) (WFile, error) {
// Use fsys.OpenFile ...
}
func WriteFile(fsys WriteFS, name string, data []byte, perm FileMode) error {
// Use fsys.OpenFile, Write, and Close ...
}
func MkDirAll(fsys MkDirFS, path string, perm FileMode) error {
// Use fsys.MkDir to do the work.
// Also requires either Stat or Open to check for parents.
// I'm not sure how to structure that either/or requirement.
}
I think that we should lean more heavily on io and os, rather than making top-level WFile types. In particular, I think we should basically just define some top-level functions that can fall back all the way to Open, and not have a WFile interface at all.
Summary
- ErrUnsupported for when implementations are not available
- Optional FS methods:
  - WriteFile(name, data, perm) (error)
  - OpenFile(name, perm) (File, error)
  - Create(name) (File, error)
- Optional File methods:
  - Write(data) (int, error)
  - Truncate(size) error for use when FS.Create is being emulated
    - Only used if the file has nonzero size in Create
  - Chmod(FileMode) error for use when FS.OpenFile or FS.WriteFile are being emulated
    - Only used if the mode does not match after Open
- Helpers
  - Create(fs, name): try CreateFS, then Open+Stat+Truncate
  - OpenFile(fs, name, perm): try OpenFileFS, then Open+Stat+ChmodFile
  - WriteFile(fs, name, data, perm): try WriteFileFS, then OpenFile()+writeContents
  - Write(File, data) (int, error): calls Write or returns ErrUnsupported
Detail
See sketch of an implementation here:
https://gist.github.com/kylelemons/21539a152e9af1dd79c3775ca94efb60#file-io_fs_write_sketch-go
This style of implementation appeals to me because:
- You can check for mutability of a file with a familiar type: io.WriteCloser (or just io.Writer, but File requires Close)
- File mutability, metadata mutability, and FS mutability are all orthogonal
  - Mutable file trees can store immutable files
  - Immutable file trees can store mutable files
  - Not all files in a FS need to have the same mutability constraints
  - Subs could be mutable even though the FS is not
- This keeps the "primary" top-level interfaces the same: FS and File
- Truncate is not required if Create is never used on nonzero-length files
- Chmod is not required if the permissions are correct by default (e.g. by exposing a FixedMode from your fs package)
I think the same patterns can be used to implement Mkdir and MkdirAll on a filesystem as well.
I think that also having a Remove method that can work with a fs.FS interface would be a great addition (similar to the current os.Remove function).
Just a small contribution as it feels to me in the scope of this proposal, and since no one mentioned file removal 🙂
@kylelemons just to be consistent with fs.ReadDirFile, the only File extension at the moment, and iofs draft/The File interface:
type ReadDirFile interface {
File
ReadDir(n int) ([]DirEntry, error)
}
type WriteFile interface {
File
Write(p []byte) (n int, err error)
}
There is a comment from Russ Cox about this exact change as well.
There's an interesting package https://pkg.go.dev/github.com/hack-pad/hackpadfs that's got some interfaces defined for a writable version of io/fs... The interfaces "feel" pretty much like they should, and are effectively the same as much as what's been discussed here already. I'm in no way associated with that project, just thought it was some interesting work that might inform how writable file interfaces fit into a future release.
Looks like this proposal has the Proposal label but isn't part of the Review Meeting https://github.com/golang/go/issues/33502
@rsc can you move it into https://golang.org/s/proposal-status#active column please?
This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group
I haven't thought as in depth as some of the other folks here, but I came up with a similar set of interfaces and shortcuts for an FS backed by both the local filesystem and backed by the dropbox API. Here's the API (obviously not all of this is so general purpose:)
type CreateFS interface {
fs.FS
Create(string) (FileWriter, error)
}
type FileWriter interface {
fs.File
Write([]byte) (int, error)
}
type RemoveFS interface {
fs.FS
Remove(string) error
}
type WatchFS interface {
fs.FS
Watch(context.Context, string) (chan []fsnotify.Event, error)
}
type WriteFileFS interface {
fs.FS
WriteFile(string, []byte, fs.FileMode) error
}
func OpenDirFS(d string) fs.FS
func Remove(fss fs.FS, path string) error
func Watch(ctx context.Context, fss fs.FS, dir string) (chan []fsnotify.Event, error)
func WriteFile(fss fs.FS, name string, contents []byte, mode fs.FileMode) (err error)
If we add a WriteFile, I suggest that it should deviate from os.WriteFile to fix what can otherwise be a subtle concurrency bug.
os.WriteFile truncates the existing file before writing, which causes otherwise-idempotent writes to race. I suggest that fs.WriteFile should instead truncate after writing, so that an idempotent file can be rewritten arbitrarily many times without corrupting its contents.
Hello everyone!
Recently I encountered a similar use case where I wanted to dependency inject a filesystem, and my consumer is required to create, modify or remove files on the fsys.
I came up with the following header interface that closely follows the os stdlib:
type FileSystem interface {
Stat(name string) (fs.FileInfo, error)
OpenFile(name string, flag int, perm fs.FileMode) (File, error)
Mkdir(name string, perm fs.FileMode) error
Remove(name string) error
}
type File interface {
fs.File
fs.ReadDirFile
io.Writer
io.Seeker
}
I made an interface testing suite to cover roughly the 127 most common filesystem interactions from a behavioural point of view. Then I made two suppliers for this interface:
- filesystems.Local
  - local file system with optional jail root path.
  - it uses the os package functions
- filesystems.Memory
  - in-memory file system variant that could be used for testing purposes.
The same interface testing suite tests both.
You can use them as a drop-in replacement for places where you had to use the os package.
The filesystems package also supplies similar functions as the os such as Open, Create, ReadDir, and WalkDir.
I plan to use and maintain this until an easy to use replacement comes out in the std library. I just wanted to share this with you all, hoping it helps someone out.
Cheers!
I'd write like this:
type WritableFS interface {
fs.FS
OpenFile(name string, flag int, perm fs.FileMode) (WritableFile, error)
}
type WritableFile interface {
fs.File
io.Writer
}
func Create(fsys WritableFS, name string) (WritableFile, error)
func WriteFile(fsys WritableFS, name string, data []byte, perm fs.FileMode) error
...
On top of a +1 for having the need of a writable fs interface, I'd like to enter into the conversation that files are not inherently readable or seekable. For example, files may be opened with fs.ModeAppend, or os.Stdin.
That is, when designing the interface, I suggest keeping the writer interface disjoint from the reader interface, and instead having a third interface to union them.
Any writable file implementation could return an error for read/seek operations...but it'd be nice to express that they're not necessary
That is, when designing the interface, I suggest keeping the writer interface disjoint from the reader interface, and instead having a third interface to union them.
@chrisguiney there are already io.Writer, io.Seeker, and io.WriteSeeker interfaces that should meet this need. IMHO using those interfaces best expresses the ability to write and/or seek.
That is, when designing the interface, I suggest keeping the writer interface disjoint from the reader interface, and instead having a third interface to union them.
@chrisguiney there are already io.Writer, io.Seeker, and io.WriteSeeker interfaces that should meet this need. IMHO using those interfaces best expresses the ability to write and/or seek.
These interfaces are for writable "things" like files, but the requirement in this issue is for a file-system-level interface, not for a "file"-level interface, so this does not seem to answer the need, does it?
Personally, I don't think there is a practical way to support a writable fs.FS. Or it's at least extremely non-trivial.
When it comes to reading files, OS-dependent semantic differences can mostly be papered over by handwavingly assuming the fs.FS is immutable. Once you actively encourage writing to the underlying filesystem, though, you open an uncontainable can of worms of semantic deviations between different operating systems and even filesystems on the same OS.
Questions which immediately arise are
- What happens if a file is opened for reading and for writing simultaneously? Windows (AFAIK) refuses this, returning an error, while most unix-like OSes will allow it. If we say it is forbidden, how do we forbid it under Linux? If we say it is allowed, how do we allow it under Windows? If we shrug and pass the buck, how are programmers supposed to deal with it?
- What kind of atomicity guarantees are given? When a power loss happens (or the process is killed by a signal) during a Write, what are the allowed states for the resulting file to be in? This is entirely unspecified, even within a single OS. It heavily depends on the FS and the settings it's mounted with.
- Similarly, what about Flush? Again, in practice it is mostly unspecified what this actually does (in the presence of a crash) but it's most likely going to need to be supported.
- How do renames behave if the target file exists, or source and target are the same file, or one is a symlink to/below the other?
- Speaking of symlinks, what about them? What about unix permission bits and what on systems lacking those? What about extended attributes? What about SELinux etc.?
- What about flock?
- How do you handle differences in allowed file names on different OSes?
- An interface like WriteFile might be reasonable to support, but even that is difficult to do cross-platform and even then, requires different APIs than just a simple "dump this please" to be reliable.
As far as I can tell, there just is no sound way to build an abstraction over filesystems that allows writing. The best advice I've heard so far on this topic is "if you care about data being written, use a database".
Of course, it is possible to just provide some API and tell the programmers that they can't actually rely on the data actually being written, so it shouldn't be used in any production use case. But that just feels irresponsible.
@Merovius isn’t the os package such an abstraction? Would you consider that irresponsible/not safe for prod?
Yeah, I’m not convinced by any of that @Merovius.
The write interface doesn’t need to support every imaginable operation to be useful, and of the operations it would support, Go is well known for taking a pragmatic approach, even when it leads to correctness issues in some circumstances, and this pattern is all over the file interfaces Go already provides. Windows doesn’t support Unix permission bits, so Go just… does whatever it feels like on Windows.
Maybe we shouldn’t use Go in production? I disagree.
A lot of your questions would be up to the implementation to decide. If the implementation is transparently delegating to a real OS filesystem, the answers would mirror what we currently see in the os package… it would be nothing shocking at all. If the implementation is more abstract, it can and will do whatever seems reasonable, and people will report bugs against that package’s repo if they disagree. That’s how all interfaces work.
@Merovius Perhaps the way to think about this is to concentrate on the cases that do not involve the operating system. After all, we already have a way to write operating system files. What may possibly be helpful is a way to view something like a zip file as a file system that we can write to. That would let code write out a tree of files as it sees fit, with the zip file system arranging for everything to flow out correctly at the end. And then the same code would work for tar.
If we take this approach, then I think the next step to ask is: do we need to support a read/write file system? Or should the file system be write only? Because I think a lot of the concerns go away with a write only file system.
So is there a use case for a read/write file system interface?
I think it's generally fine to gloss over the semantics of how a write will behave on any given os/fs/hardware. Programs already vary in behavior by using *os.File and related os functions.
The real power of a standardized interface is being able to easily swap out usages with implementations that do have well-defined semantics. The biggest use case I have: testing. I need to know how my program behaves if opening a file fails, or hangs indefinitely. Perhaps testing makes for a simplistic example, but those are more well-defined semantics than what you'll get from the os package. If I can test how my program behaves under any arbitrary condition or state the fs may present at runtime, it's actually easier to make a system that behaves consistently across different platforms.
As far as I can tell, there just is no sound way to build an abstraction over filesystems that allows writing. The best advice I've heard so far on this topic is "if you care about data being written, use a database".
While I've also shared that advice, I'd like to point out two things:
- it's not such a helpful thing to tell someone writing the database itself.
- even databases get it wrong (because fsync is that undefined)
@hherman1 Yes, I believe the os package is such an abstraction and yes, it has many of the same problems, stemming from "writing files is subtle". See for example @bcmills comment here. Its saving grace, if any, is that it's a relatively thin abstraction. So it's possible to use it and stay relatively close to the OS (e.g. by using OpenFile in conjunction with x/sys/unix for flags). If we're trying to be as broad in applications as io/fs, the abstraction needs to be thicker, hide more of the differences and restrict itself further to the least common denominator.
So maybe that's surprising to people (and your question was rhetorical), but I do more and more think that using os to write files is questionable in production use. I think it's okay for some use cases, e.g. if you don't do it a lot or if it happens in an interactive program (so it's at least obvious and can be verified if something goes wrong). Otherwise, yes, I think people should not use os directly, but should use wrappers like renameio or an embedded DB.
@ianlancetaylor I believe people will invariably expect os.DirFS to support a read/write interface.
@chrisguiney
- it's not such a helpful thing to tell someone writing the database itself.
- even databases get it wrong (because fsync is that undefined)
For 1: Those people shouldn't use io/fs or even os, but likely use x/sys/unix especially because they have to be careful. And 2: Yes. I was alluding to that above. I think that makes it a worse idea to try and do it yourself, not a better.
In any case. I think it's probably possible to create a good abstraction. Just that it's subtle and it didn't seem that anyone has considered the subtlety involved.
@Merovius no my question wasn’t rhetorical, I meant it. Wasn’t sure how to make that clear here.
AFAICS point 1 is not correct: Windows supports opening a file for read and write simultaneously by passing GENERIC_READ | GENERIC_WRITE to CreateFileA.
@fgm The way I read those docs, that depends what is used for dwShareMode, also based on what other processes have opened those files. But in any case, that list was not meant as a specific list of questions to answer, but more as a list of examples of the kinds of questions that arise when you try to create a cross-platform API for writable files. I'm sure there are answers to more than one of them. I'm less optimistic that more won't crop up over time or that we won't discover new problems over time by using whatever API we box ourselves into by adding it to the stdlib.
But maybe I'm wrong. I just wanted to express my concerns. And make clear that the problem isn't as easy as adding a Write method to fs.File.
As mentioned by @hairyhenderson, this issue is less about how to have a writable fs.File and more on how to have an fs.FS equivalent that allows creating such files.
I think an approach similar to gocloud.dev/blob would be a good match here:
- fs.FS is read-only and returns fs.File which is also read-only. File systems that allow reading should implement this.
- fs.WriteFS would be write-only and return probably an io.WriteCloser or some fs.WritableFile that aggregates multiple interfaces
If necessary, there could also be an fs.ReadWriteFS for file systems that do allow reading and writing simultaneously, returning an interface that aggregates both the return types above.
Example (I didn't give much thought to the names):
type WritableFile interface {
io.WriteCloser
// Maybe other methods and interfaces
}
type WriteFS interface {
OpenWritable(name string) (WritableFile, error)
}
type ReadWriteFile interface {
WritableFile
File
}
type ReadWriteFS interface {
OpenReadWrite(name string) (ReadWriteFile, error)
}
Note that in this case a file system can implement fs.FS and fs.WriteFS while not implementing fs.ReadWriteFS if a file can only be opened in one of the modes at a time.
This also fits with the existing io interfaces (Reader, Writer, ReadWriter, etc.)
OK, I'll probably get flamed on this, but what's so wrong about something like the PHP stream wrapper feature, adapted to Go syntax, obviously? https://www.php.net/manual/en/class.streamwrapper.php
@fgm I think that is basically what we are talking about here.
@Merovius
I believe people will invariably expect os.DirFS to support a read/write interface.
Fair enough, but that is on the server end. I still wonder what clients would want a read/write interface. I think that thinking of some might help us better understand where the problems might be.
Fair enough, but that is on the server end. I still wonder what clients would want a read/write interface. I think that thinking of some might help us better understand where the problems might be.
I don't understand. By server/client are you referring to the implementor/consumer of the interface?