severe file corruption when using libfuse3 with sshfs (probably not only with sshfs?) - possible fix
Arch Linux SSHFS version 3.7.3 FUSE library version 3.16.2 using FUSE kernel interface version 7.38 fusermount3 version: 3.16.2
Steps to reproduce:
- mount directory on remote server with sshfs
- open ssh terminal to remote server
- in the remote terminal - go to the mounted directory, create a text file, put some text in it
- over sshfs - do an md5sum of the file
- in the remote terminal - edit the file, ADD SOME CHARACTERS so the lenght increases
- over sshfs - do second md5sum of the file
- over sshfs - do a third md5sum of the file
second and third md5sum of the file are different the file is not edited between them
there is also another bug - sometimes the second md5sum is THE SAME as first, even though the file is edited between taking md5sums. this happens even though "-o auto_cache,ac_attr_timeout=0" is given, and happens even with 0 timeout for detecting changes. happens less frequently, but if you do ~10 consecutive test (possibly with re-mounting between them) - this bug will also happen
I am not familiar with libfuse3 codebase, but is it possible the bug is due this fragment in fuse.c open_auto_cache function:
struct timespec now;
curr_time(&now);
if (diff_timespec(&now, &node->stat_updated) >
f->conf.ac_attr_timeout) {
struct stat stbuf;
int err;
pthread_mutex_unlock(&f->lock);
err = fuse_fs_getattr(f->fs, path, &stbuf, fi);
pthread_mutex_lock(&f->lock);
if (!err)
update_stat(node, &stbuf);
else
node->cache_valid = 0;
}
it seems this code gets updated file attributes from remote side, but does not invalidate the cache if the attributes and/or filesize have changed.
Is it possible this fragment needs to be changed to something like this:
static void open_auto_cache(struct fuse *f, fuse_ino_t ino, const char *path,
struct fuse_file_info *fi) {
struct node *node;
pthread_mutex_lock(&f->lock);
node = get_node(f, ino);
if (node->cache_valid) {
struct timespec now;
curr_time(&now);
if (diff_timespec(&now, &node->stat_updated) > f->conf.ac_attr_timeout) {
struct stat stbuf;
int err;
pthread_mutex_unlock(&f->lock);
err = fuse_fs_getattr(f->fs, path, &stbuf, fi);
pthread_mutex_lock(&f->lock);
if (!err) {
if (node->stat.st_size != stbuf.st_size ||
node->stat.st_mtime != stbuf.st_mtime ||
node->stat.st_mode != stbuf.st_mode ||
node->stat.st_uid != stbuf.st_uid ||
node->stat.st_gid != stbuf.st_gid) {
node->cache_valid = 0;
} else {
update_stat(node, &stbuf);
node->cache_valid = 1;
}
} else {
node->cache_valid = 0;
}
}
}
fi->keep_cache = node->cache_valid ? 1 : 0;
pthread_mutex_unlock(&f->lock);
}
I did further investigation and was able to triage the bug better.
Here is more easily reproducible and more informative test.
Lets increase dcache_stat_timeout, because it masks the bug. Corruption happens when copying changed files before dcache_stat_timeout expiration.
lets mount sshfs with the following command line: -о auto_cache,ac_attr_timeout=0 -o dcache_stat_timeout=60
Here is the test:
- make text file on the server, add some text in it (vim /sshfs-test.txt)
- use ssh to mount "/" on the server to a directory on local computer
- open ssh terminal to the server
- via sshfs mount: cat sshfs-test.txt (cat1)
- in the terminal - vim /sshfs-test.txt, add some text so the file size increases
- via sshfs mount: cat sshfs-test.txt (cat2)
- via sshfs mount: cat sshfs-test.txt (cat3)
The result is:
- part of new content of test file appears on cat2, BUT
- the full size of the new content appears on cat3
This means than first read of the file after the change returns corrupted content - file is cut to the old size, but has a new content. In my use case - this caused a lot of PDF files changed by a script and then copied over sshfs to be severely corrupted - unopenable.
There is also a second bug - related to the first.
If you repeat the test, but mount sshfs with a slightly different command line: -o kernel_cache,auto_cache,ac_attr_timeout=0 -o dcache_stat_timeout=60
Then the result is:
- on cat2 - old content of the file appears with old size
- on cat3 - new content of the file appears with new size
This seems related to cache handling and so the bug is probably in fuse3 and most probably due to improper cache and/or stat attributes cache expiration in auto_cache open.
node->cache_valid = 0 is set in open_auto_cache() -> update_stat() . You mean that checking for mtime is not sufficient?
I can try to reproduce your issue, but I don't promise that I will do very timely.
node->cache_valid = 0is set inopen_auto_cache() -> update_stat(). You mean that checking for mtime is not sufficient? I can try to reproduce your issue, but I don't promise that I will do very timely.
Yes, I see.
It seems new file content is updated in the cache on first read, but new (increased) file size is not yet updated - and thus first read returns corrupted file with new content cut to the old size
and then on second read new (increased) file size is delivered to the cache and file is OK
Thank you very much for your work in this project :)
So that basically renders it to be neither bug in libfuse, nor sshfs and not kernel either - everything works as designed. Just your user expectation does not match. What you actually need is atomic-open, which updates/sets kernel file attributes on file-open. You can search for my name and "atomic-open" - you will see that I'm actually working on it. In a scenario like yours, it is typical to set timeouts to 0 and use FOPEN_DIRECT_IO. But that triggers lots of FUSE_GETATTR - reducing this is my primary goal with atomic-open. Although, maybe better to add in a more simple version into fuse (kernel) first and then add the other more complicated pieces (like avoiding lookup-revalidate) later.
Anyway, I think we should close this issue here as there is just no bug. You will need to be patient to get atomic-open in kernel, libfuse (simple in low-level, I'm not sure yet about high-level (maybe the simplified version that doesn't suppress initial getattr is easier) and then needs to be added to sshfs. From what I checked last time (about 2 years ago), the sftp protocol doesn't support open-getattr, unfortunately - a pity, would reduce latencies.
And the workaround is to either set low getattr timeouts or to use direct-io (FOPEN_DIRECT_IO)
Thank you for taking the time to elaborate on this issue so thoroughly.
I am confused, though : setting the dcache_stat_timeout to 0 succesfully prevents the issue (i.e. when stat cache expires - fuse/sshfs fetches increased file size correctly).
Why, then, handling open with auto_cache=on and ac_attr_timeout=0 cannot do the same ?
Thanks again
My problem is that dcache_stat_timeout and ac_attr_timeout are not libfuse options - I would need to look through sshfs code to see what it is doing. Right now I don't manage to find the time for that...
What do you mean they are not libfuse options?
According to "man mount.fuse3" ac_attr_timeout (and autocache=on) are libfuse options
Looking at the sources - variables holding their values are defined in include/fuse.h and used in many places across libfuse sources.
dcache_stat_timeout maps to libfuse attr_timeout, they are the same. they set the value of attr_timeout defined in fuse.h
handling the cache expiration after attr_timeout works correctly
handling the cache expiration in first file open after ac_attr_timeout has a bug
I am really sorry, but - what am I getting wrong ? What do you mean by "not libfuse options" ?
I am really confused, can you please clarify ?
Ah sorry, ac_attr_timeout is indeed a libfuse option. So ac_attr_timeout is open_auto_cache specific and all it can do is to set/unset fi->keep_cache (FOPEN_KEEP_CACHE) - by no means it can update kernel file attributes. Well, we could send a notification ioctl, but would be hacky from my point of view.
dcache_stat_timeout maps to libfuse attr_timeout, they are the same. they set the value > of attr_timeout defined in fuse.h
So there we go - setting attr timeout to 0 requires kernel side to reload file attributes on every read beyond EOF
kernel side
static ssize_t fuse_cache_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
struct inode *inode = iocb->ki_filp->f_mapping->host;
struct fuse_conn *fc = get_fuse_conn(inode);
/*
* In auto invalidate mode, always update attributes on read.
* Otherwise, only update if we attempt to read past EOF (to ensure
* i_size is up to date).
*/
if (fc->auto_inval_data ||
(iocb->ki_pos + iov_iter_count(to) > i_size_read(inode))) {
int err;
err = fuse_update_attributes(inode, iocb->ki_filp, STATX_SIZE);
if (err)
return err;
}
return generic_file_read_iter(iocb, to);
}
...
static int fuse_update_get_attr(struct inode *inode, struct file *file, struct kstat *stat, u32 request_mask, unsigned int flags) ... else sync = time_before64(fi->i_time, get_jiffies_64());
if (sync) {
forget_all_cached_acls(inode);
/* Try statx if BTIME is requested */
if (!fc->no_statx && (request_mask & ~STATX_BASIC_STATS)) {
err = fuse_do_statx(inode, file, stat);
if (err == -ENOSYS) {
fc->no_statx = 1;
err = 0;
goto retry;
}
} else {
err = fuse_do_getattr(inode, stat, file);
}
The timeout is stored in fi->i_time
Well, a new feature would be to add support for fuse_lowlevel_notify_* to high level and then to call something like fuse_highlevel_notify_expire_entry in open_auto_cache() when it set node->cache_valid = 1, but that would be just a workaround.
Yes, thank you, I now see what you mean.
So then - what would be the best solution in order for libfuse to be able to:
- have reasonable attr cache for the purpose of directory browsing
- and update attrbutes (and file size) before every file open ?
I have no problem setting attr_timeout to 0, but then there is no attr cache for the purpose of directory browsing
ls -la /usr/sbin over sshfs takes 8 seconds, and if the directory has many symlinks - 20 seconds
gvfs sftp somehow handles this and there is no delay, but if the directory is mounted with sshfs - browsing the directories with file managers becomes even slower - because on top of "ls -la" time - file managers are doing a lot of "magic byte" lookups (because of the bloated freedesktop.org mime types database in /usr/share/mime/packages/freedesktop.xml)
So, in currect state users are left with two choices:
- either accept 20+ seconds for opening directories containing more than 130 files, or
- set the attr_cache to value greater than zero - and deal with file corruption (or remembering to copy/open every file twice, because first copy gets corrupted if file is changed on server and file size increases)
In your opinion - what can be done to avoid this ?
Do you have any idea ?
Once again - I appreciate your time and effort working on this project.
As I wrote above, adding attr invalidation to high level would work and atomic-open will eventually do the right job. Adding readdir-plus with offset=0 to sshfs might be another option - entries would be read into a list (long time) and then kernel would fetch from that list (rather) fast. But I have not looked into details to be sure that that approach would work. For sure it would have other bad side effects (because kernel would get attributes that are possibly already outdated).