
Flistd: failed to list current possible mounts

Open DylanVerstraete opened this issue 5 years ago • 5 comments

Happens to more than 1 node on mainnet

2020-10-26 17:36:43 | [+] flistd: 2020-10-26T16:36:43Z error failed to cleanup stall mounts error="failed to list current possible mounts: failed to list g8ufs process pids: /var/cache/modules/flistd/pid: open /var/cache/modules/flistd/run: no such file or directory"
2020-10-26 17:36:43 | [+] flistd: 2020-10-26T16:36:43Z error failed to cleanup stall mounts error="failed to list current possible mounts: failed to list g8ufs process pids: /var/cache/modules/flistd/pid: open /var/cache/modules/flistd/run: no such file or directory"

DylanVerstraete avatar Oct 26 '20 16:10 DylanVerstraete

Related to nodes without disks. We should investigate whether it's actually possible to boot a node without disks. If so, we can try to prevent that, because we need a disk to store the cache on.

DylanVerstraete avatar Oct 27 '20 10:10 DylanVerstraete

When storaged fails to find a disk usable for cache, it starts in degraded mode. We need storaged to always be able to start somehow, because it is a dependency of networkd and we need the network to be able to report the disk failure to the farmer.

Now maybe what we should do is return an error from all the zbus methods exposed by storaged, so we do not allow anything to be created by storaged while it is in degraded state?

zaibon avatar Oct 27 '20 10:10 zaibon

As far as we remember, we set a system-wide flag once storaged has booted in degraded mode, which other daemons are able to check. The point is that some daemons (like networking) will still be able to boot in degraded mode.

Other daemons should block and stay inaccessible until the issue is fixed.

muhamadazmy avatar Oct 27 '20 11:10 muhamadazmy

@muhamadazmy what do you suggest as a fix?

DylanVerstraete avatar Oct 27 '20 15:10 DylanVerstraete

We might have to introduce code similar to this:

https://github.com/threefoldtech/zos/blob/master/cmds/provisiond/main.go#L60

In all modules that depend on the availability of storage, or that misbehave if the cache does not exist.

muhamadazmy avatar Oct 28 '20 13:10 muhamadazmy