zos Flistd: failed to list current possible mounts

Open DylanVerstraete opened this issue 5 years ago • 5 comments

Happens to more than 1 node on mainnet

2020-10-26 17:36:43 | [+] flistd: 2020-10-26T16:36:43Z error failed to cleanup stall mounts error="failed to list current possible mounts: failed to list g8ufs process pids: /var/cache/modules/flistd/pid: open /var/cache/modules/flistd/run: no such file or directory"
2020-10-26 17:36:43 | [+] flistd: 2020-10-26T16:36:43Z error failed to cleanup stall mounts error="failed to list current possible mounts: failed to list g8ufs process pids: /var/cache/modules/flistd/pid: open /var/cache/modules/flistd/run: no such file or directory"

Oct 26 '20 16:10 DylanVerstraete

Related to nodes without disks. We should investigate whether it's actually possible to boot a node without disks. If so, we can try to prevent that, because we need a disk to store the cache on.

Oct 27 '20 10:10 DylanVerstraete

When storaged fails to find a disk usable for cache, it starts in degraded mode. We need storaged to always be able to start somehow, because it is a dependency of networkd and we need the network to be able to report the disk failure to the farmer.

Now maybe what we should do is return an error from all the zbus methods exposed by storaged, so we do not really allow anything to be created by storaged while in degraded state?
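A rough sketch of that idea, in case it helps the discussion. Everything here is hypothetical: `Storage`, `ErrDegraded`, `CreateVolume` and `Volumes` are stand-in names, not the actual storaged types or zbus interface. The only point is that every exposed method goes through the same guard first.

```go
package main

import "errors"

// ErrDegraded is a hypothetical sentinel error returned by every method
// while no usable cache disk has been found.
var ErrDegraded = errors.New("storaged is in degraded mode: no usable cache disk")

// Storage is a hypothetical stand-in for the object storaged exposes over zbus.
type Storage struct {
	degraded bool
	volumes  map[string]uint64
}

// NewStorage builds the stand-in module; degraded would be set at boot
// when no cache disk is found.
func NewStorage(degraded bool) *Storage {
	return &Storage{degraded: degraded, volumes: make(map[string]uint64)}
}

// guard is called first by every exposed method.
func (s *Storage) guard() error {
	if s.degraded {
		return ErrDegraded
	}
	return nil
}

// CreateVolume records a volume, or refuses while degraded.
func (s *Storage) CreateVolume(name string, size uint64) error {
	if err := s.guard(); err != nil {
		return err
	}
	s.volumes[name] = size
	return nil
}

// Volumes lists known volume names, or refuses while degraded.
func (s *Storage) Volumes() ([]string, error) {
	if err := s.guard(); err != nil {
		return nil, err
	}
	names := make([]string, 0, len(s.volumes))
	for name := range s.volumes {
		names = append(names, name)
	}
	return names, nil
}
```

With this shape storaged still starts and stays reachable, so networkd can come up, but callers get a clear error instead of a missing-directory failure further down the line.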

Oct 27 '20 10:10 zaibon

As far as we remember, we set a system-wide flag once storaged has booted in degraded mode, which other daemons are able to check. The point is that some daemons (like networking) will still be able to boot in degraded mode.

Other daemons should block and stay inaccessible until the issue is fixed.

Oct 27 '20 11:10 muhamadazmy

@muhamadazmy what do you suggest as a fix?

Oct 27 '20 15:10 DylanVerstraete

We might have to introduce code similar to this:

https://github.com/threefoldtech/zos/blob/master/cmds/provisiond/main.go#L60

in all modules that depend on the availability of storage, or that misbehave if the cache does not exist.

Oct 28 '20 13:10 muhamadazmy
