
Expose the telldir/seekdir cookies in the FS API.

Jille opened this issue 2 years ago.

Also use interfaces for ReadDir(Plus)EntryList rather than structs to allow for easier testing.

Exposing the cookies gives users full control, which allows them to write POSIX-correct implementations. Additionally, it makes it possible to forward the cookies to an underlying implementation.
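A minimal sketch of the idea, assuming a simplified entry type. `DirEntry`, `DirEntryList`, `Entry`, `fill` and `fakeList` are hypothetical names, not go-fuse's actual API. The file system supplies the cookie for each entry (deliberately non-sequential here), and because the list is an interface, a unit test can pass in a fake instead of a real kernel reply buffer.

```go
package main

import "fmt"

// DirEntry is a hypothetical, simplified directory entry.
type DirEntry struct {
	Name string
	Ino  uint64
}

// DirEntryList is a hypothetical interface: the file system supplies the
// cookie for every entry instead of the library numbering entries itself.
// The cookie is the offset the kernel passes back to resume after that entry.
type DirEntryList interface {
	// Add returns false once the reply buffer is full.
	Add(e DirEntry, cookie uint64) bool
}

// Entry pairs a directory entry with the cookie chosen by the backend.
type Entry struct {
	DirEntry
	Cookie uint64
}

// fill forwards backend entries and their cookies unchanged and reports
// how many entries fit into out.
func fill(backend []Entry, out DirEntryList) int {
	for i, e := range backend {
		if !out.Add(e.DirEntry, e.Cookie) {
			return i
		}
	}
	return len(backend)
}

// fakeList is a test double standing in for the kernel reply buffer; the
// interface makes this substitution possible without mounting anything.
type fakeList struct {
	capacity int
	names    []string
	cookies  []uint64
}

func (l *fakeList) Add(e DirEntry, cookie uint64) bool {
	if len(l.names) >= l.capacity {
		return false
	}
	l.names = append(l.names, e.Name)
	l.cookies = append(l.cookies, cookie)
	return true
}

func main() {
	backend := []Entry{
		{DirEntry{"a", 10}, 7},
		{DirEntry{"b", 11}, 3}, // cookies need not be sequential
		{DirEntry{"c", 12}, 42},
	}
	out := &fakeList{capacity: 2}
	n := fill(backend, out)
	fmt.Println(n, out.names, out.cookies) // 2 [a b] [7 3]
}
```

With cookies passed through unchanged, whatever the kernel later hands back as the resume offset is a value the file system chose itself, so it can be forwarded to an underlying implementation as-is.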

Jille, May 14 '22

Filed in Gerrit on request of Han-Wen.

@SnoozeThis https://review.gerrithub.io/c/hanwen/go-fuse/+/539827

Jille, Jun 14 '22

(https://snoozeth.is/vo9NsR5tKY0) I will wait until https://review.gerrithub.io/c/539827 is merged and then add a comment.

SnoozeThis, Jun 14 '22

An error occurred while snoozing: Change was abandoned

SnoozeThis, Apr 07 '23

It would have been handy to have a change along these lines merged, as it is a necessity if you want to expose a file system that already has its own POSIXly correct readdir() implementation under the hood. For example, if you were to write a userspace NFSv4 client that is exposed through FUSE, you could call directly into NFSv4's stateless READDIR operation, which takes these cookies as well.

Dealing with go-fuse handing out its own sequential numbering scheme is only a nuisance in that case. This is because NFSv4 READDIR doesn't require cookies to be sequentially numbered, let alone monotonically increasing.
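To illustrate the nuisance, here is a hedged sketch of what an adapter has to do when the library insists on its own sequential offsets while the backend resumes from opaque cookies. `backendReadDir` is a hypothetical stand-in for an NFSv4-style stateless READDIR and `seqAdapter` is a hypothetical translation layer; neither is go-fuse or NFS client code.

```go
package main

import "fmt"

// backendEntry comes from a hypothetical stateless, NFSv4-style READDIR.
// Its cookie is opaque: not sequential, not necessarily increasing.
type backendEntry struct {
	Name   string
	Cookie uint64
}

// backendReadDir returns up to limit entries following the entry with the
// given cookie (0 means "start of directory"), or nil for an unknown cookie.
func backendReadDir(dir []backendEntry, cookie uint64, limit int) []backendEntry {
	start := 0
	if cookie != 0 {
		start = -1
		for i, e := range dir {
			if e.Cookie == cookie {
				start = i + 1
				break
			}
		}
		if start < 0 {
			return nil
		}
	}
	end := start + limit
	if end > len(dir) {
		end = len(dir)
	}
	return dir[start:end]
}

// seqAdapter mimics a layer that insists on its own offsets 1, 2, 3, ...
// It must remember which backend cookie each offset stands for.
type seqAdapter struct {
	dir    []backendEntry
	offToC map[uint64]uint64 // sequential offset -> backend cookie
}

// readDir resumes after sequential offset off (0 = start of directory).
func (a *seqAdapter) readDir(off uint64, limit int) (names []string, offs []uint64) {
	cookie, ok := a.offToC[off]
	if off != 0 && !ok {
		return nil, nil // this offset was never handed out
	}
	for i, e := range backendReadDir(a.dir, cookie, limit) {
		next := off + uint64(i) + 1
		a.offToC[next] = e.Cookie // extra state, only needed for translation
		names = append(names, e.Name)
		offs = append(offs, next)
	}
	return names, offs
}

func main() {
	dir := []backendEntry{{"a", 0x9f3}, {"b", 0x11}, {"c", 0x7c0}}
	a := &seqAdapter{dir: dir, offToC: map[uint64]uint64{}}
	n1, o1 := a.readDir(0, 2)
	n2, o2 := a.readDir(o1[len(o1)-1], 2)
	fmt.Println(n1, o1, n2, o2) // [a b] [1 2] [c] [3]
}
```

The `offToC` table grows with every entry handed out and has to live somewhere, for example alongside an open directory handle. If the library exposed cookies directly, the backend's cookies could be returned as offsets and the table would disappear.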

EdSchouten, Apr 09 '23

Can you specify precisely what you mean by "POSIXly correct readdir() implementation"?

Regarding "Dealing with go-fuse handing out its own sequential numbering scheme is only a nuisance in that case.": should I interpret "only a nuisance" as "not a showstopper" or as "a showstopper"?

I'm puzzled why you'd expose NFS through FUSE. There are NFS clients written in Go, and the kernel has one too. In fact, I'm strongly considering adding an NFSv4 server to go-fuse, because Apple is making osxfuse go away.

hanwen, Apr 10 '23

Exposing NFS through FUSE was merely an example. Let me describe concretely why I carry a change along these lines as a local patch in https://github.com/buildbarn/bb-remote-execution.

We have a virtual file system for facilitating builds. It's a single implementation of files, directories, [...], but we can expose it through different means. For FUSE, we have an implementation of fuse.RawFileSystem that translates FUSE operations into calls against VFS objects. Similarly, we have an NFSv4 server that does the same thing. Every method exposed by our VFS API needs to work well for both protocols. For example, files and directories are all capable of reporting NFSv4-style change IDs, but those end up being ignored by the FUSE backend.

In the case of READDIR, this means that it needs to work in a fully stateless manner. NFSv4 doesn't have the equivalent of OPENDIR and CLOSEDIR. NFSv4 clients may send requests with arbitrary cookies obtained by previous calls, and those should continue to work regardless of how much time has passed. For immutable/static directories this is easy, as we can just number entries contiguously. For mutable directories this is harder, especially if you want to guarantee that entries are not under- or over-reported when the containing directory is mutated in parallel. We solve this by letting the directory itself track a cookie for each entry, and letting READDIR report entries in insertion order.
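A minimal in-memory sketch of the scheme described above, assuming a single mutex is acceptable for concurrency. `Directory`, `Add`, `Remove` and `ReadDir` are illustrative names; this is not Buildbarn's actual implementation.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

type entry struct {
	name   string
	cookie uint64
}

// Directory assigns each entry a strictly increasing cookie on insertion
// and lists entries in insertion (cookie) order.
type Directory struct {
	mu         sync.Mutex
	nextCookie uint64
	byName     map[string]uint64 // name -> cookie
	order      []entry           // sorted by cookie, i.e. insertion order
}

func NewDirectory() *Directory {
	// Cookie 0 is reserved to mean "start of directory".
	return &Directory{nextCookie: 1, byName: map[string]uint64{}}
}

// Add inserts name with a fresh cookie; existing names are left untouched.
func (d *Directory) Add(name string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if _, ok := d.byName[name]; ok {
		return
	}
	c := d.nextCookie
	d.nextCookie++
	d.byName[name] = c
	d.order = append(d.order, entry{name, c})
}

// Remove deletes name; the cookies of all other entries stay the same.
func (d *Directory) Remove(name string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	c, ok := d.byName[name]
	if !ok {
		return
	}
	delete(d.byName, name)
	i := sort.Search(len(d.order), func(i int) bool { return d.order[i].cookie >= c })
	d.order = append(d.order[:i], d.order[i+1:]...)
}

// ReadDir is stateless: it returns up to limit entries whose cookie is
// greater than after. Any previously returned cookie remains a valid
// resume point, no matter how the directory changed in between.
func (d *Directory) ReadDir(after uint64, limit int) []entry {
	d.mu.Lock()
	defer d.mu.Unlock()
	i := sort.Search(len(d.order), func(i int) bool { return d.order[i].cookie > after })
	end := i + limit
	if end > len(d.order) {
		end = len(d.order)
	}
	return append([]entry(nil), d.order[i:end]...)
}

func main() {
	d := NewDirectory()
	for _, n := range []string{"a", "b", "c"} {
		d.Add(n)
	}
	page := d.ReadDir(0, 2) // a (cookie 1), b (cookie 2)
	d.Remove("a")           // mutate between the two calls
	d.Add("d")              // receives cookie 4
	rest := d.ReadDir(page[len(page)-1].cookie, 10)
	fmt.Println(page, rest) // [{a 1} {b 2}] [{c 3} {d 4}]
}
```

Because cookies are never reused or renumbered, a listing resumed from any previously returned cookie reports each surviving entry exactly once, and entries created in the meantime always receive larger cookies, so they are picked up too.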

The approach above would work fine for both FUSE and NFSv4. Unfortunately, go-fuse doesn't allow us to attach the cookies that we compute ourselves to the directory entries, as it assigns its own. We could work around this with an approach similar to fs and nodefs, where a full directory listing is computed as part of the initial READDIR, but this is highly undesirable, because it would cause noticeable discrepancies between the behaviour of our FUSE and NFSv4 backends. We don't want this, as it makes it harder for our users to switch back and forth.
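For comparison, a minimal sketch of the snapshot approach that the comment attributes to fs and nodefs; `snapshotStream` is a hypothetical name and the code is not taken from go-fuse. It shows the discrepancy in miniature: an entry created after the initial READDIR never shows up in this stream, whereas a stateless, cookie-based listing served to NFSv4 clients would report it.

```go
package main

import "fmt"

// snapshotStream captures the full listing on the initial READDIR
// (offset 0); later calls index into that private copy by offset.
type snapshotStream struct {
	source func() []string // returns the directory's current contents
	names  []string        // snapshot taken at offset 0
}

func (s *snapshotStream) readDir(off, limit int) []string {
	if off == 0 {
		s.names = append([]string(nil), s.source()...)
	}
	if off >= len(s.names) {
		return nil
	}
	end := off + limit
	if end > len(s.names) {
		end = len(s.names)
	}
	return s.names[off:end]
}

func main() {
	dir := []string{"a", "b"}
	s := &snapshotStream{source: func() []string { return dir }}
	first := s.readDir(0, 1)
	dir = append(dir, "c") // created while the listing is in progress
	rest := s.readDir(1, 10)
	fmt.Println(first, rest, dir) // [a] [b] [a b c]
}
```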

FYI: These are the local changes we currently carry around: https://github.com/buildbarn/bb-remote-execution/tree/master/patches/com_github_hanwen_go_fuse_v2

EdSchouten, Apr 10 '23

"NFSv4 doesn't have the equivalent of OPENDIR and CLOSEDIR. NFSv4 clients may send requests with arbitrary cookies obtained by previous calls, [...]"

I can't remember where this was, but someone once said to me that NFS truly does borderline insane things.

This likely means a new API surface for Readdir/Opendir and friends. Before I can introduce that, I have to understand the full problem. This means I have to do two things:

  • study the NFS spec in more detail
  • understand how traditional filesystems use directory offsets (if a traditional file system is exported over NFS, the same restrictions would apply to the readdir offset, right?). Do all modern file systems implement essentially what you propose?

hanwen, Apr 11 '23

"study the NFS spec in more detail"

In that case, I would recommend looking at RFC 7530, which is the latest version of the NFSv4.0 spec. NFSv4.1 and v4.2 also exist, but in the context of Buildbarn I have only implemented NFSv4.0, as that's what macOS implements.

https://datatracker.ietf.org/doc/html/rfc7530#section-16.24

Note that the spec does provide a facility for reporting that client-provided cookies are invalid (returning NFS4ERR_NOT_SAME), which requires the caller to rewind and restart the iteration. Unfortunately, POSIX readdir() defines no errno value that this could be mapped to. Therefore, no userspace applications actually make use of this mechanism. In practice, NFSv4 servers only return this error if there is absolutely no meaningful way in which previously obtained cookies can be reused.

"understand how traditional filesystems use directory offsets (if a traditional file system is exported over NFS, the same restrictions would apply to the readdir offset, right?). Do all modern file systems implement essentially what you propose?"

I am not aware of other file systems that use exactly the same logic (i.e., numbering directory entries sequentially on insertion and reporting them in insertion order). I do know that the robustness of readdir() over NFS tends to vary. I just found this LWN article, which discusses ext4 and XFS: https://lwn.net/Articles/544520/. XFS apparently does a better job than ext4.

EdSchouten, Apr 13 '23

Ah! btrfs does the same thing as Buildbarn, it seems: https://github.com/torvalds/linux/blob/285063049a65251aada1c34664de692dd083aa03/fs/btrfs/inode.c#L5956-L5958

	 * New directory entries are assigned a strictly increasing
	 * offset.  This means that new entries created during readdir
	 * are *guaranteed* to be seen in the future by that readdir.
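For contrast with the btrfs comment above, a hypothetical sketch of a directory whose readdir position is derived from the entry name. Sorting by name stands in for any name-derived offset, such as a hash; it is not how any particular file system is implemented. An entry created mid-listing whose key falls before the resume point is skipped, which is exactly what strictly increasing insertion offsets rule out.

```go
package main

import (
	"fmt"
	"sort"
)

// keyedDir orders entries by a key derived from the name (here the name
// itself, standing in for a hash) and resumes listings by that key.
type keyedDir struct{ names []string }

func (d *keyedDir) add(n string) {
	d.names = append(d.names, n)
	sort.Strings(d.names)
}

// readAfter returns the entries whose key sorts strictly after key.
func (d *keyedDir) readAfter(key string) []string {
	i := sort.SearchStrings(d.names, key)
	if i < len(d.names) && d.names[i] == key {
		i++
	}
	return append([]string(nil), d.names[i:]...)
}

func main() {
	d := &keyedDir{}
	d.add("m")
	d.add("z")
	// A reader has consumed "m" and will resume after it.
	d.add("a") // created mid-listing; its key sorts before the resume point
	fmt.Println(d.readAfter("m")) // [z]: "a" is never reported to this reader
}
```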

EdSchouten, Apr 13 '23

I've opened https://github.com/hanwen/go-fuse/issues/460 to discuss this more in depth.

I am in favor of doing this, but it needs a v3 of the API, so I want to move carefully to avoid churn.

hanwen, Apr 14 '23