restic fails on repo mounted via CIFS/samba on Linux using go 1.14 build
This issue is a summary of https://forum.restic.net/t/prune-fails-on-cifs-repo-using-go-1-14-build/2579 , intended as a reference for the underlying problem.
tl;dr Workaround: Setting the environment variable GODEBUG to asyncpreemptoff=1 restores the pre-Go 1.14 behavior and fixes the problem.
Output of restic version
restic 0.9.6 (v0.9.6-137-gc542a509) compiled with go1.14 on linux/amd64
Linux Kernel version: 5.5.9
How did you run restic exactly?
restic prune -r /path/to/repository/on/CIFS/share
Relevant log excerpt:
Load(<data/216ea5f2d2>, 591, 4564169) returned error, retrying after 720.254544ms: open /mnt/nas/redacted/reponame/data/21/216ea5f2d21b458a7b913609ddef2a6ac4788b4bad5481b2916558d2ce1bef04: interrupted system call
Prune failed in the end
Further relevant log excerpts:
Load(<data/2e9db0642e>, 591, 4758136) returned error, retrying after 552.330144ms: read /mnt/nas/redacted/reponame/data/2e/2e9db0642e0fb67b959aa1d91c0d70daa8331ad246c5eeb8582ba2a14f24680f: interrupted system call
List(data) returned error, retrying after 282.818509ms: lstat /mnt/nas/redacted/reponame/data/64: interrupted system call
List(data) returned error, retrying after 492.389441ms: readdirent: interrupted system call
Save(<data/f0f5102554>) returned error, retrying after 552.330144ms: chmod /mnt/nas/redacted/reponame/data/f0/f0f51025542c0287943ef3816e642586be46ae10dc9efbcfa7b305d9e093dbd4: interrupted system call
What backend/server/service did you use to store the repository?
Local backend stored on a CIFS share
Expected behavior
No warnings, prune should complete.
Actual behavior
Prune failed.
Steps to reproduce the behavior
Build restic using Go 1.14 and store the backup repository on a CIFS share.
Do you have any idea what may have caused this?
This issue is a side effect of asynchronous preemption in Go 1.14. The release notes state the following:
This means that programs that use packages like syscall or golang.org/x/sys/unix will see more slow system calls fail with EINTR errors. Those programs will have to handle those errors in some way, most likely looping to try the system call again.
Go configures signal handlers to restart syscalls if possible. The standard library also retries syscalls when necessary. That is, there should only be issues when directly calling low-level syscalls, and in that case one should just implement the retry properly. However, restic only uses Go standard library functions, which should already handle EINTR where necessary.
The first prune error message points to an os.Open call (via fs.Open) in the Load function of the local backend, so it looks like a Go standard library call is failing. However, the manpage for signal (man 7 signal) states that the underlying open syscall is always restarted when SA_RESTART is used, as Go does. So this seems to be a bug in the Linux kernel. Wrapping the call to fs.Open in a loop that repeats it as long as EINTR is returned fixes that one call, but fixing all problematic calls this way would mean adding lots of ugly loops and playing whack-a-mole.
The manpages of lstat, readdir and chmod don't even list EINTR as a possible errno.
Do you have an idea how to solve the issue?
Setting the environment variable GODEBUG to asyncpreemptoff=1 restores the pre Go 1.14 behavior and fixes the problem.
Go relies on the assumption that the kernel properly restarts syscalls when told to do so. As the latter is obviously not the case, the proper fix would be to submit a bug report to the Linux kernel.
A short-term solution would be to add a note to the restic documentation that mentions the compatibility problem with CIFS mounts.
Hi, I can confirm this; I was unable to make a 400 GB backup on a Samba share. It's reproducible (I tried three times). When I do a check it finds errors; if I do a rebuild-index after the check, it generates pages of errors and the repository becomes corrupted. I thought it was my Samba configuration or the destination disk, so I copied large amounts of data without any problems. Now I use rest-server and it works, so there is a problem around Samba. Note that I tried with and without export GODEBUG=asyncpreemptoff=1; without it, restic crashes after a second. I also don't see any errors in journalctl on either server or client.
In that configuration the data and restic are on a local disk and the restic destination repository is on the Samba share disk, something like "Samba write".
go version go1.15.2 linux/amd64 restic 0.10.0 compiled with go1.15.2 on linux/amd64
I did another backup from a music server (Bluesound) where the data is remote and restic and the repo are local, something like "Samba read". Without GODEBUG=asyncpreemptoff=1 I got a few warnings, but the repo check was OK. With GODEBUG=asyncpreemptoff=1 I had no problems.
restic 0.10.0 compiled with go1.15.2 on linux/arm64 go version go1.15.2 linux/arm64
Reported by @Mikescher in #2968
Summary
After upgrading to restic 0.10.0 I get read errors when backing up a mounted samba share. After downgrading to 0.9.6 the errors are once again gone.
Output of restic version
restic 0.10.0 compiled with go1.15.2 on linux/amd64 (before)
restic 0.9.6 compiled with go1.13.4 on linux/amd64 (after downgrade)
How did you run restic exactly?
I have a local restic repository and a mounted samba/cifs share (from my NAS) that I want to backup. I simply run
restic -r {local_repo} backup /mnt/nas_hektor --cleanup-cache --exclude-file {exclusions}.txt
Output:
repository ac08473e opened successfully, password is correct
error: NodeFromFileInfo: Listxattr: xattr.list /mnt/nas_hektor/{...} : interrupted system call
error: Readdirnames /mnt/nas_hektor/{...} failed: readdirent: no such file or directory
error: Readdirnames /mnt/nas_hektor/{...} failed: readdirent: no such file or directory
error: Readdirnames /mnt/nas_hektor/{...} failed: readdirent: no such file or directory
error: Readdirnames /mnt/nas_hektor/{...} failed: readdirent: no such file or directory
error: Readdirnames /mnt/nas_hektor/{...} failed: readdirent: no such file or directory
error: lstat /mnt/nas_hektor/{...}: interrupted system call
error: Readdirnames /mnt/nas_hektor/{...} failed: readdirent: no such file or directory
error: Readdirnames /mnt/nas_hektor/{...} failed: readdirent: no such file or directory
error: NodeFromFileInfo: Listxattr: xattr.list /mnt/nas_hektor/{...} : interrupted system call
error: Readdirnames /mnt/nas_hektor/{...} failed: readdirent: no such file or directory
error: NodeFromFileInfo: Listxattr: xattr.list /mnt/nas_hektor/{...} : interrupted system call
error: Readdirnames /mnt/nas_hektor/{...} failed: readdirent: no such file or directory
error: NodeFromFileInfo: Listxattr: xattr.list /mnt/nas_hektor/{...} : interrupted system call
error: Readdirnames /mnt/nas_hektor/{...} failed: readdirent: no such file or directory
error: Readdirnames /mnt/nas_hektor/{...} failed: readdirent: no such file or directory
error: Readdirnames /mnt/nas_hektor/{...} failed: readdirent: no such file or directory
error: Readdirnames /mnt/nas_hektor/{...} failed: readdirent: no such file or directory
error: Readdirnames /mnt/nas_hektor/{...} failed: readdirent: no such file or directory
error: Readdirnames /mnt/nas_hektor/{...} failed: readdirent: no such file or directory
error: NodeFromFileInfo: Listxattr: xattr.list /mnt/nas_hektor/{...} : interrupted system call
error: NodeFromFileInfo: Listxattr: xattr.list /mnt/nas_hektor/{...} : interrupted system call
Files: 834 new, 0 changed, 17209 unmodified
Dirs: 46 new, 1191 changed, 0 unmodified
Added to the repo: 251.044 GiB
processed 18043 files, 8.624 TiB in 3:40:12
snapshot db1a5445 saved
Warning: failed to read all source data during backup
The /var/log/syslog is sprinkled with many of the following (seemingly relevant) errors:
Oct 2 01:42:04 omv-{name} kernel: [21401.666794] CIFS VFS: Send error in read = -4
Oct 2 01:05:45 omv-{name} kernel: [19223.167367] CIFS VFS: \\{nas}.fritz.box\Hektor Close interrupted close
What backend/server/service did you use to store the repository?
The repository is simply stored on the local filesystem, only the actual to-be-backed-up data is mounted via cifs
Expected behavior
Well, no errors..
Actual behavior
restic displays errors while running, ends with an error message, and actually skips these files in the snapshot. Calling restic again backs the files up without problems (I can also stat/list the files/directories in my terminal without problems). It seems to fail at random; after three invocations of restic backup I had all files backed up (after that I downgraded for the other backups I wanted to do).
Steps to reproduce the behavior
Not really sure if it's reproducible for everyone who backs up (a lot of) data from a CIFS mountpoint. I tried to eliminate other problem sources (see below), but I don't know how much it depends on my setup/network/whatever.
Do you have any idea what may have caused this?
No idea. I can say that I tried a lot of other things before downgrading restic, because it really sounds like a Samba problem: I did a full software update on my machine, restarted my Synology NAS, restarted my machine multiple times, tried tweaking the Samba mount options, etc. But because a simple restic downgrade eliminated all problems, it seems that restic at least started triggering some CIFS bug in 0.10.0.
Btw both machines are in the same local network, connected by a Gbit switch, and stand only a meter apart, so the network should be very stable.
Do you have an idea how to solve the issue?
Nope, sorry.
Did restic help you today? Did it make you happy in any way?
Well not happy but I'm still convinced that you simply managed to trigger some stupid bug in mount.cifs and because I know how annoying such stuff is I feel kinda sorry to report it 😨
Analysis by @greatroar in #2968:
Oct 2 01:42:04 omv-{name} kernel: [21401.666794] CIFS VFS: Send error in read = -4
That's an EINTR.
This might be the new async preemption in the Go 1.14 runtime. The best way to tell would be to compile restic 0.10.0 with Go 1.13.4 and try again. But before you go installing compilers, maybe try setting ~GOMAXPROCS=1~ GODEBUG=asyncpreemptoff=1:
$ export GODEBUG=asyncpreemptoff=1
$ restic backup /mnt/nas_hektor --cleanup-cache --exclude-file exclusions.txt
Related: golang/go#39237, golang/go#40870 (especially this comment).
I think we ~should~ could put an EINTR loop around xattr, or maybe get the maintainers of pkg/xattr to do that. As for the Readdirnames failures, the Go people have decided that's a kernel bug that they're not going to work around, unless someone shows that it isn't.
The consensus among maintainers (expressed at #3061) is to wait until Go 1.16, which will include a thorough fix in the stdlib. If that's right, may I suggest adding a summary of/link to the workaround at the top and pinning this issue? It comes up regularly and will continue to do so until ca. February.
I'll pin this issue for now
Sorry if this isn't related to the same underlying Go issues, but I'm seeing an error when I try to back up under Windows 10 20H2 with restic 0.11:
repository 94fe90ee opened successfully, password is correct
creating VSS snapshot for [c:\]
error: failed to create snapshot for [c:\]: VSS error: async operation pending
Restic continues working and backs up all files that it can access, but obviously those in use are inaccessible because it doesn't believe it has a working shadow copy. However, the shadow copy was created by the OS and is available while restic runs.
@JPPhoto Please open a separate issue and fill out the issue form. This issue is exclusively about a Linux specific issue with Samba.
using GODEBUG and related nouser_xattr from https://github.com/restic/restic/issues/1800 helped me fix the problems in my script (https://forum.restic.net/t/error-lstat-errno-524-during-backup/3272)
The GODEBUG workaround helped me fix my backup. I use restic to back up my NAS to object storage, and this problem kept me from making backups for months.
export GODEBUG=asyncpreemptoff=1
Go 1.16 has been released in the meantime. Could anyone try whether this finally solves the compatibility problems with CIFS on Linux?
Unfortunately, the problem is not solved. In fact it seems (if anything) to have been replaced by something worse.
Using
restic 0.12.0 (v0.12.0-11-g814a399e4) compiled with go1.16 on linux/amd64
and with a clean copy of a repo similar to the one in my original forum post, I first ran restic forget --prune --max-unused=0. This appeared to complete normally and there were no errors logged. Paranoid person that I am, however, I then ran restic check on the pruned repo. This produced a series of errors ending with
error: error loading index 55d2df1b: load <index/55d2df1b4a>: invalid data returned
error: error loading index 594fd4a0: load <index/594fd4a025>: invalid data returned
Fatal: LoadIndex returned errors
So I tried restic rebuild-index (which in the old days of the Go 1.14 problems had effectively repaired the repo). This produced multiple errors of the form
reindexing pack file ced69db1d699800b56f73678b8974c8487f114ae02f220c08561ecc9cf8cb38f with unexpected size 0 instead of 4250647
reindexing pack file 73c818e87931931b1d4721a99e57148c54560ff44d420956e1da9f04be73bf3d with unexpected size 4250647 instead of 420237
A second attempt to rebuild the index resulted in multiple lines of the form
adding pack file to index ced69db1d699800b56f73678b8974c8487f114ae02f220c08561ecc9cf8cb38f
adding pack file to index 3c350cd1f0d6b938793bdd57a4180264f8b00e642783c10a633e630e05dd240a
adding pack file to index d34953733e994019a21c73ad699cc7950af2c91f9d5d4a05ef726f63b658aa44
There was no explicit indication that either rebuild-index had failed but a restic check produced several hundred error messages ending with
tree d13ef09a: file "b98af7035914f872_0" blob e7b7edb0ef1394d41bc61c9e485e77192ff326500556c36ae0613ff30c9976bf not found in index
tree d13ef09a: file "e1e489cb2437551c_0" blob 16630545bb3558e2229696fa5a98c17f8ffe9599e317dde28674481bc3739a2c not found in index
[0:07] 100.00% 37 / 37 snapshots
Fatal: repository contains errors
All of the above was with GODEBUG unset. However, at no time did I see any of the "interrupted system call" messages previously characteristic of the issue.
At this point I gave up. I thought at first that (since I had to reestablish my CIFS configuration to do this test) I might have fouled up access permissions or something similar. So (again starting with a clean copy of the repo, obviously) I repeated the above sequence of events using
restic 0.9.6 compiled with go1.14.4 on linux/amd64
and GODEBUG=asyncpreemptoff=1 as before. Worked perfectly (no errors).
Finally, just to take my Go 1.16 restic build out of the equation, I reran the original sequence of commands on a clean copy of the repo with the filesystem mounted over NFS (instead of CIFS). Again, everything worked perfectly.
Possibly the original problem is indeed fixed but the fix introduced another issue?
Unfortunately, the problem is not solved. In fact it seems (if anything) to have been replaced by something worse.
That seems indeed to be the case. I've encountered a similar (?) issue with Go 1.16.1 on Ubuntu 20.04 running via WSL (Linux kernel 4.19.128) with a local Samba share (a plain Linux system/VM should work just as well). For testing I created a new repository and backed up a few gigabytes of data. For me the issues so far were slow operations (at least I had that impression) and the silent corruption of a pack file! The pack file ended up with a few 64 KiB ranges that just contain null bytes instead of the actual data.
So this looks like either a Go or a kernel problem. So far I haven't had time to assemble a more compact test case for reproducing it.
So far I haven't had time to assemble a more compact test case for reproducing it.
I have a small test repo (a few text files, a few snapshots, <1.5Mb) that I created for a different purpose. I would be glad to do some testing if that would be helpful. Alternatively, possibly I could provide a copy of the repo to you...it doesn't contain any sensitive information.
I have a small test repo (a few text files, a few snapshots, <1.5Mb) that I created for a different purpose. I would be glad to do some testing if that would be helpful.
Thanks for the offer, however, I had a different type of test case in mind: I wanted to simplify the way to create a damaged repository / files. So that ideally should just involve running a small go program and not restic as a whole. Such a small test case would then allow digging into where exactly things go wrong.
Having just run this over an SMB share, it worked fine for me (though a small 6 GB backup). Is this still an issue, and if so, is there anything I can do to help with testing?
For me everything works fine on a default Debian 10 installation with:
restic 0.12.0 compiled with go1.16.3 on linux/amd64
I backed up my data to a fresh repo on an SMB share and ran the following commands without any issues:
restic -r /mnt/backup/repo forget --keep-last 1 --prune
restic -r /mnt/backup/repo check
I even did a test restore without any issues.
I'm running restic on a Mac, but the backup is mounted via an SMB share from a NAS running Linux. Does this issue apply to my situation?
@csss1234 No. This issue only applies when the host running restic uses Linux.
@underhillian Which Kernel version are you using?
Did anyone else notice problems with restic + CIFS + go 1.16?
We could add a check to prevent using a repository stored on a CIFS mount on Linux. That would basically enforce the remark that currently exists in the documentation. The big downside of this workaround is that we'd have no clue at all whether that option will still be necessary in the future or not.
diff --git a/internal/backend/local/local.go b/internal/backend/local/local.go
index 0410e51b..d58d4880 100644
--- a/internal/backend/local/local.go
+++ b/internal/backend/local/local.go
@@ -5,6 +5,8 @@ import (
 	"io"
 	"os"
 	"path/filepath"
+	"runtime"
+	"strings"
 	"syscall"

 	"github.com/restic/restic/internal/errors"
@@ -31,6 +33,24 @@ const defaultLayout = "default"
 // Open opens the local backend as specified by config.
 func Open(ctx context.Context, cfg Config) (*Local, error) {
 	debug.Log("open local backend at %v (layout %q)", cfg.Path, cfg.Layout)
+
+	if runtime.GOOS == "linux" {
+		dbg, _ := os.LookupEnv("GODEBUG")
+		hasAsyncPreempt := !strings.Contains(dbg, "asyncpreemptoff=1")
+		if hasAsyncPreempt {
+			var stat syscall.Statfs_t
+			err := syscall.Statfs(cfg.Path, &stat)
+			if err != nil {
+				return nil, err
+			}
+			const CIFS_MAGIC_NUMBER = 0xff534d42
+			if stat.Type == CIFS_MAGIC_NUMBER {
+				return nil, errors.Fatal("Storing a repository on CIFS requires disabling " +
+					"asynchronous preemption by setting the environment variable GODEBUG to 'asyncpreemptoff=1'.")
+			}
+		}
+	}
+
 	l, err := backend.ParseLayout(ctx, &backend.LocalFilesystem{}, cfg.Layout, defaultLayout, cfg.Path)
 	if err != nil {
 		return nil, err
Which Kernel version are you using?
I use Arch Linux (rolling updates) and update regularly so my kernel is almost never more than a couple of weeks behind linux-stable.
As best I can tell from my trusty restic backups :wink:, at the time of my last test as reported above (Mar 07) I was running 5.11.2 (although it might have been a minor release or two later than that; I can't be more precise at this point).
I just upgraded a server at home from ubuntu LTS from 18.04 to 20.04 and now I can't run backups anymore on files that are on my NAS from my cifs mounted path.
I also upgraded restic before trying so I don't know which upgrade caused the problems.
This is all over the restic backup output now:
I don't have the repository stored on the CIFS-mounted volume; I am only trying to back up files that are mounted on a CIFS filesystem.
error: NodeFromFileInfo: Listxattr: xattr.list ...redacted... : input/output error
If I run stat on one of the failed files I don't get any errors.
$ stat "...redacted..."
File: ...redacted...
Size: 5493072 Blocks: 10736 IO Block: 1048576 regular file
Device: 35h/53d Inode: 399240 Links: 1
Access: (0755/-rwxr-xr-x) Uid: ( 1000/ thomasf) Gid: ( 0/ root)
Access: 2021-06-04 15:31:45.395788000 +0000
Modify: 2008-01-27 19:17:46.000000000 +0000
Change: 2008-01-27 19:17:46.000000000 +0000
Birth: -
My kernel is
Linux gems 5.4.0-80-generic #90-Ubuntu SMP Fri Jul 9 22:49:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
I also upgraded restic before trying so I don't know which upgrade caused the problems.
Please specify the restic version. Lots of Linux distros ship outdated restic versions; your best course of action is to simply use the official restic binaries published here on GitHub.
0.12.1 via the self-update subcommand
Hmm, when I write a test specifically targeting the same file that fails with an i/o error inside restic, it does not cause an i/o error. I might have missed something.
restic debug log:
2021/08/08 08:56:38 archiver/archiver.go:353 archiver.(*Archiver).Save 56 /media/dubcube/homes/thomasf/Pictures/_VERY_RANDOM/1999/08/17/jw41.jpg target "/media/dubcube/homes/thomasf/Pictures/_VERY_RANDOM/1999/08/17/jw41.jpg", previous <nil>
2021/08/08 08:56:38 archiver/archiver.go:384 archiver.(*Archiver).Save 56 /media/dubcube/homes/thomasf/Pictures/_VERY_RANDOM/1999/08/17/jw41.jpg regular file
2021/08/08 08:56:38 restic/node.go:622 restic.(*Node).fillExtendedAttributes 61 fillExtendedAttributes(/media/dubcube/homes/thomasf/Pictures/_VERY_RANDOM/1999/08/17/jw40.jpg) [] Listxattr: xattr.list /media/dubcube/homes/thomasf/Pictures/_VERY_RANDOM/1999/08/17/jw40.jpg : input/output error
2021/08/08 08:56:38 archiver/file_saver.go:145 archiver.(*FileSaver).saveFile 60 /media/dubcube/homes/thomasf/Pictures/_VERY_RANDOM/1999/08/17/jw41.jpg
2021/08/08 08:56:38 archiver/archiver.go:492 archiver.(*Archiver).Save 56 return after 0.003
simple test:

package restic

import (
	"os"
	"syscall"
	"testing"

	"github.com/davecgh/go-spew/spew"
	"github.com/restic/restic/internal/restic"
)

const (
	fn = "/media/dubcube/homes/thomasf/Pictures/_VERY_RANDOM/1999/08/17/jw41.jpg"
)

func TestNodeFromFileInfo(t *testing.T) {
	fi, err := os.Stat(fn)
	if err != nil {
		t.Fatal(err)
	}
	stat, ok := toStatT(fi.Sys())
	t.Log(ok, stat)
	if !ok {
		t.Fail()
	}
	node, err := restic.NodeFromFileInfo(fn, fi)
	if err != nil {
		t.Fatal(err)
	}
	t.Log(spew.Sdump(node))
}

type statT syscall.Stat_t

func toStatT(i interface{}) (*statT, bool) {
	s, ok := i.(*syscall.Stat_t)
	if ok && s != nil {
		return (*statT)(s), true
	}
	return nil, false
}
test output:
$ ssh gems ./foo.test -test.v
=== RUN TestNodeFromFileInfo
random_test.go:24: true &{49 18317912 1 33261 1000 0 0 0 34898 1048576 72 {1622819926 309775400} {1283468376 0} {1283468376 0} [0 0 0]}
random_test.go:33: (*restic.Node)(0xc0000d4640)(-rwxr-xr-x 1000 0 34898 2010-09-02 22:59:36 +0000 UTC jw41.jpg)
--- PASS: TestNodeFromFileInfo (0.02s)
PASS
Is the same file always affected, or does the error affect different files each time? The testcase probably only works in the former case.
Another major difference from the situation in restic is that the overall load is completely different: restic usually keeps the system quite busy, whereas the testcase is a single command that doesn't put any noteworthy stress on the system. As we know that the CIFS problems are triggered by Go's async preemption feature, your testcase would have to run the problematic syscall at the moment the preemption signal arrives. For that, the testcase would at least have to repeat the operation over and over again (probably on different files).
Please test whether setting GODEBUG=asyncpreemptoff=1 solves the problem for you.
GODEBUG=asyncpreemptoff=1 does not seem to help.
It looks like it's different files every run.
It is a pretty slow home server (Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz) so it is probably more vulnerable to timing than the average modern machine.
I also tried setting all the archiver concurrencies in the code to 1 and adding a few time.Sleep(200ms) calls here and there around the areas where the errors appear. With the sleeps, the error rate maybe went down a bit, but it was far from disappearing.
I will try to set up a smaller test repo where I don't have tens of thousands of files for every run and see if I can narrow anything down.
I have tried a bunch of different cifs mount settings now without any luck.
A flag that enabled you to just skip extended attributes would also work (at least for me). It is only the listxattr syscall that fails, and I don't think I use extended attributes on my CIFS mounts at all.
I just did this to see what would happen and now I can at least run my backups without many thousands of errors
func (node *Node) fillExtendedAttributes(path string) error {
	return nil
}
So this is probably not the same fault that this issue is really about, but it is somewhat related. The syscalls do not, as far as I can understand, time out at all; they actually fail with an i/o error, which is what strace also says.
Update:
After changing restic to not make the listxattr syscall, I could run my ~60k files / ~3 TB backups from the CIFS mount, with or without GODEBUG=asyncpreemptoff=1, without any unexpected errors.
Did you try the nouser_xattr mount option?
It is a pretty slow home server (Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz) so it is probably more vulnerable to timing than the average modern machine.
The symptoms sound like a serious bug in either the CIFS server or client. Or maybe it is also a network problem. Although the CPU of the server isn't the fastest, it's also not the slowest one either.
Did you try the nouser_xattr mount option?
I missed that one, and it also seems to work; I have been running through a few tens of thousands of files already and it should definitely have failed by now.
It is a pretty slow home server (Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz) so it is probably more vulnerable to timing than the average modern machine.
The symptoms sound like a serious bug in either the CIFS server or client. Or maybe it is also a network problem. Although the CPU of the server isn't the fastest, it's also not the slowest one either.
I ran a few network load tests and even upgraded the switch firmware, so it's probably not that. It's probably some interaction between the Samba/kernel version of my QNAP NAS and the CIFS client in the latest Ubuntu LTS kernel.
In any case, nouser_xattr seems to work, so it's all good for now.
After many months of using nouser_xattr, this problem still occurs from time to time. Same dataset, roughly the same changes every day; most days there are no errors, some days there are. I still cannot see a pattern in why this is happening :-(