zos
Flistd: failed to list current possible mounts
Happens on more than one node on mainnet.
2020-10-26 17:36:43 | [+] flistd: 2020-10-26T16:36:43Z error failed to cleanup stall mounts error="failed to list current possible mounts: failed to list g8ufs process pids: /var/cache/modules/flistd/pid: open /var/cache/modules/flistd/run: no such file or directory"
2020-10-26 17:36:43 | [+] flistd: 2020-10-26T16:36:43Z error failed to cleanup stall mounts error="failed to list current possible mounts: failed to list g8ufs process pids: /var/cache/modules/flistd/pid: open /var/cache/modules/flistd/run: no such file or directory"
Related to nodes without disks. We should investigate whether it is actually possible to boot a node without disks. If so, we can try to prevent that, because we need a disk to store the cache on.
When storaged fails to find a disk usable for the cache, it starts in degraded mode. We need storaged to always be able to start somehow, because it is a dependency of networkd, and we need the network to be able to report the disk failure to the farmer.
Maybe what we should do now is return an error from all the zbus methods exposed by storaged, so that nothing can be created through storaged while it is in a degraded state?
As far as we remember, we set a system-wide flag once storaged has booted in degraded mode, which other daemons are able to check. The point is that some daemons (like networking) will still be able to boot in degraded mode.
Other daemons should block and remain inaccessible until the issue is fixed.
@muhamadazmy what do you suggest as a fix?
We might have to introduce code similar to this:
https://github.com/threefoldtech/zos/blob/master/cmds/provisiond/main.go#L60
in all modules that depend on the availability of storage, or that misbehave if the cache does not exist.