Missing sync on manual upgrade

Open swiftgist opened this issue 6 years ago • 6 comments

Description of Issue/Question

We have multiple cases of alternate upgrade paths. The end result is confusion when the step after upgrading the DeepSea rpm fails with a message such as "foo.bar not found". Telling admins to read the documentation seems insufficient.

I suggest adding the following to the rpm postinstall script:

TARGET=$(awk '/^deepsea_minions/ {print $2}' /srv/pillar/ceph/deepsea_minions.sls)
salt $TARGET saltutil.sync_all 2>/dev/null || :

If the salt command fails for any reason, then the admin is in no worse a predicament. However, many would have an experience that matches expectations. (i.e. upgraded software is installed where required. Salt is a distributed system and zypper is not.)

swiftgist avatar Mar 09 '18 12:03 swiftgist

I assume you mean the rpm post-install script? What would happen during the "zypper up" in case one or multiple minions are not reachable?

Martin-Weiss avatar Mar 09 '18 13:03 Martin-Weiss

Then the admin would be in the same predicament that they currently have. The current situation is

rpm -ivh deepsea or zypper in deepsea

Run Stage 0 or maintenance.upgrade (but the admin has some reason not to do this)

OR

Run salt '*' saltutil.sync_all

If minions are down, then you will get an error. If you choose to ignore that error, start the downed minions, and then try to execute software that is not available, you will get the error "foo.bar not found".

If the rpm postinstall at least tried the sync, and given that Salt clusters are generally healthy (i.e. all minions are present), then the behavior would match the expectation (i.e. a new DeepSea is installed and the software is available across the Salt cluster). If the minions are down when the package is installed, nothing changes from the predicament above.

In either case, an admin would then only see the error message "foo.bar not found" for more legitimate or obvious reasons (e.g. "Oh, I didn't know minionX was down"). The resolution is still the same: run the sync command against that minion.
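As a minimal sketch of that resolution, the sync can be pointed at just the minion that was down. The helper name `resync_minion` and the `SALT` override are hypothetical and only there so the command can be dry-run; `saltutil.sync_all` is the same call used elsewhere in this thread.

```shell
# Sketch only: re-sync modules on one minion once it is back up.
# SALT is overridable (e.g. SALT=echo) so the command can be inspected
# without a running salt-master; resync_minion is a hypothetical name.
SALT="${SALT:-salt}"

resync_minion() {
    # $1 is the minion ID, e.g. minionX
    "$SALT" "$1" saltutil.sync_all
}

# Usage: resync_minion minionX
```

Running `salt-run manage.down` on the master beforehand is one way to see which minions are currently not responding.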

swiftgist avatar Mar 09 '18 14:03 swiftgist

The difference is that with the salt / deepsea command this is currently expected.

When installing RPMs, it is not expected (in case the post-script is used and no proper timeout handling is implemented).

So having anything in rpm pre- or post-scripts that relies on other servers in the infrastructure might not be the best way to go - also keep in mind that there are down-server upgrades where even services like the salt-master are not running (e.g. SLES 12 -> SLES 15).

Martin-Weiss avatar Mar 09 '18 14:03 Martin-Weiss

With the number of complaints that I have directly received, I dispute that users expect it.

I wouldn't want an rpm install to pause from a downed minion, but backgrounding the sync might be suitable.
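A rough sketch of what a backgrounded sync could look like follows. The SLS path and the awk pattern are taken from the proposal earlier in this thread; the helper names, the `SALT` and `SLS` overrides, and the handling of the value as a quoted compound target (hence `tr` and `-C`) are assumptions, not something DeepSea is known to ship.

```shell
# Sketch only. Path and awk pattern come from the proposal in this thread;
# helper names are hypothetical, and treating the value as a quoted
# compound target (hence tr and -C) is an assumption.
SALT="${SALT:-salt}"
SLS="${SLS:-/srv/pillar/ceph/deepsea_minions.sls}"

# Print the deepsea_minions value with surrounding quotes removed
deepsea_target() {
    awk '/^deepsea_minions/ {print $2}' "$1" | tr -d "\"'"
}

# Start the sync without making the package install wait on minions
background_sync() {
    [ -r "$SLS" ] || return 0
    target=$(deepsea_target "$SLS")
    [ -n "$target" ] || return 0
    # --async publishes the job and returns instead of waiting for replies;
    # the trailing & keeps a slow or absent salt-master from blocking too
    "$SALT" --async -C "$target" saltutil.sync_all >/dev/null 2>&1 &
}

# In a postinstall scriptlet: background_sync || :
```

Because the job is fire-and-forget, a minion that is down at install time still needs a manual sync later, which is the same situation as today.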

swiftgist avatar Mar 22 '18 10:03 swiftgist

Could we add some sort of minion schedule so that all minions refresh their locally cached modules automatically on a regular basis? I see this as a common problem in Salt, so is there a Salt best practice for updating modules on the minions whenever a module gets modified on the master? (That is, ensuring the module is at the latest version before any subsequent state makes a call into it.)
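One way such a schedule might be set up is with Salt's `schedule.add` execution function, sketched below. The job name `deepsea_sync_all`, the helper name, the one-hour default, and the `SALT` override are all hypothetical choices for illustration.

```shell
# Sketch only: register a recurring saltutil.sync_all job on the targeted
# minions. Job name, helper name and default interval are hypothetical.
SALT="${SALT:-salt}"

add_sync_schedule() {
    target="$1"        # minion target, e.g. '*'
    hours="${2:-1}"    # interval in hours between syncs
    "$SALT" "$target" schedule.add deepsea_sync_all \
        function=saltutil.sync_all hours="$hours"
}

# Usage: add_sync_schedule '*' 1
```

A periodic job only narrows the window in which a minion runs stale modules; it does not guarantee the latest version is present before a given state runs.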

Martin-Weiss avatar Mar 22 '18 10:03 Martin-Weiss

Trying to tackle this with https://github.com/SUSE/DeepSea/pull/1195

jschmid1 avatar Aug 29 '18 10:08 jschmid1