DeepSea
Missing sync on manual upgrade
Description of Issue/Question
We have multiple cases of alternate upgrade paths. The end result is confusion when the step after upgrading the DeepSea rpm fails with a message such as "foo.bar not found". Telling admins to read the documentation seems insufficient.
I suggest adding the following to the rpm postinstall script:
TARGET=$(awk '/^deepsea_minions/ {print $2}' /srv/pillar/ceph/deepsea_minions.sls)
salt $TARGET saltutil.sync_all 2>/dev/null || :
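One detail worth checking in the snippet above: if the pillar line is quoted, for example `deepsea_minions: 'G@deepsea:*'` (an assumption about the file layout), then `print $2` returns the value with its quotes still attached, and the unquoted `$TARGET` is also open to shell globbing. A minimal sketch of a more defensive parse follows; `deepsea_target` is a hypothetical helper name, not something DeepSea ships.

```shell
# Hypothetical helper: print the deepsea_minions target from a pillar file,
# with any surrounding quotes stripped.
# Assumes a line of the form:  deepsea_minions: 'G@deepsea:*'
deepsea_target() {
    sed -n 's/^deepsea_minions:[[:space:]]*//p' "$1" 2>/dev/null \
        | head -n 1 \
        | tr -d "\"'"
}
```

Usage would look like `TARGET=$(deepsea_target /srv/pillar/ceph/deepsea_minions.sls)`, followed by `salt -C "$TARGET" saltutil.sync_all`. The `-C` (compound matcher) flag is assumed here because a value such as `G@deepsea:*` is a compound target; a plain glob also works with `-C`.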
If the salt command fails for any reason, then the admin is in no worse a predicament. However, many would have an experience that matches expectations. (i.e. upgraded software is installed where required. Salt is a distributed system and zypper is not.)
I assume you mean the rpm post-install script? What would happen during the "zypper up" in case one or multiple minions are not reachable?
Then the admin would be in the same predicament that they currently have. The current situation is:
1. rpm -ivh deepsea or zypper in deepsea
2. Run Stage 0 or maintenance.upgrade (but the admin has some reason not to do this)
   OR
   Run salt '*' saltutil.sync_all
If minions are down, then you will get an error. If you choose to ignore that error, start the downed minions and then try to execute software that is not available, you will get the error "foo.bar not found".
If the rpm postinstall at least tried, and given that Salt clusters are generally healthy (i.e. all minions are present), then the behavior would match the expectation (i.e. a new DeepSea is installed and the software is available across the Salt cluster). If minions are down when the package is installed, nothing changes from the predicament above.
In either case, an admin who gets the error message "foo.bar not found" would be getting it for more legitimate or obvious reasons (e.g. "Oh, I didn't know minionX was down"). The resolution is still the same: run the sync command against that minion.
The difference is that with a salt / deepsea command, an error from unreachable minions is currently expected.
When installing RPMs it is not expected (in case the post-script is used and no proper timeout handling is implemented).
So having anything in rpm pre- or post-scripts that relies on other servers in the infrastructure might not be the best way to go. Also keep in mind that there are upgrades performed with the server down, where even services like the salt-master are not running (e.g. SLES 12 -> SLES 15).
With the number of complaints that I have directly received, I dispute that users expect it.
I wouldn't want an rpm install to pause from a downed minion, but backgrounding the sync might be suitable.
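As a rough sketch of what a non-blocking sync in the postinstall could look like: `salt --async` publishes the job and returns without waiting for minions, and coreutils `timeout` bounds the case where the master itself does not answer. The function name `sync_minions_async`, the 30 second limit, and the `-C` compound matcher are assumptions for illustration, not part of DeepSea.

```shell
# Sketch of a best-effort, non-blocking sync for a package postinstall script.
# Always returns 0 so the package installation is never failed by Salt.
sync_minions_async() {
    target="$1"

    # Nothing to do without a target or without a salt client on this host.
    [ -n "$target" ] || return 0
    command -v salt >/dev/null 2>&1 || return 0

    # --async: publish the job and return immediately, no waiting on minions.
    # timeout 30: give up if the salt-master does not respond (assumed limit).
    timeout 30 salt --async -C "$target" saltutil.sync_all >/dev/null 2>&1 || :
}
```

Minions that are down at install time still miss the job, so the manual `saltutil.sync_all` remains the fallback for them.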
Could we add some sort of minion schedule so that all minions refresh their locally cached modules automatically on a regular basis? I see this as a common problem in Salt, so is there any Salt best practice for updating modules on minions whenever a module gets modified on the master? (-> ensure that the module is the latest version before any subsequent state calls into it)
Trying to tackle this with https://github.com/SUSE/DeepSea/pull/1195
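For illustration, Salt's `schedule.add` execution function can register a recurring job on the minions. The sketch below is only one possible shape of such a schedule, not necessarily what the linked pull request does; the job name `deepsea_sync_all`, the one hour default interval, and the wrapper function are hypothetical.

```shell
# Sketch: register a recurring saltutil.sync_all job on the targeted minions.
# $1 = compound target, $2 = interval in seconds (defaults to 3600).
schedule_module_sync() {
    target="$1"
    interval="${2:-3600}"

    # schedule.add <job name> function=<module.function> seconds=<interval>
    salt -C "$target" schedule.add deepsea_sync_all \
        function='saltutil.sync_all' seconds="$interval"
}
```

A periodic sync narrows the window in which a minion runs stale modules but does not close it, so a state that depends on a freshly changed module would still need an explicit sync first.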