Tentacle upgrade should block machines instead of tasks

Open tothegills opened this issue 7 years ago • 12 comments

Currently, a Tentacle upgrade blocks tasks queued behind it from running so that they don't attempt to run on a restarting Tentacle. The blocking is somewhat selective but clumsy. If there is a problem with the upgrade (for example, updating Calamari gets stuck), the Tentacle upgrade task can block other tasks indefinitely.

We have a script isolation mutex, but it does not help Tentacle upgrade because each step of the upgrade takes out the mutex separately. Another task can acquire the mutex while Tentacle upgrade is between steps and then run on a restarting Tentacle.

I think this issue can be resolved by wrapping the entire Tentacle upgrade process in the script isolation mutex. Each script step will need to be run without acquiring the mutex, which I believe is different to anything we do currently. With this approach, if a machine is blocking the Tentacle upgrade from progressing, only tasks that involve that particular machine will be blocked rather than the entire task queue.
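
To make the proposal concrete, here is a minimal sketch in Python. It is not Octopus code: `MachineLocks`, `run_step`, `upgrade_tentacle`, `deploy`, and the step names are all hypothetical, and a plain per-machine `threading.Lock` stands in for the script isolation mutex. It only illustrates the idea of holding one machine's mutex across the whole upgrade while the individual steps run without acquiring it.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class MachineLocks:
    """Hypothetical per-machine mutex registry (stand-in for the script isolation mutex)."""

    def __init__(self):
        self._guard = threading.Lock()              # protects the registry itself
        self._locks = defaultdict(threading.Lock)   # one mutex per machine id

    @contextmanager
    def hold(self, machine_id):
        with self._guard:
            lock = self._locks[machine_id]
        with lock:
            yield


def run_step(machine_id, step, log):
    # Runs one script step WITHOUT acquiring the mutex; the caller already holds it.
    log.append((machine_id, step))


def upgrade_tentacle(locks, machine_id, log):
    # The mutex is held across every step, so no other task can slip in
    # between steps and run on a restarting Tentacle.
    with locks.hold(machine_id):
        for step in ("update-calamari", "install-tentacle", "restart"):  # illustrative names
            run_step(machine_id, step, log)


def deploy(locks, machine_id, log):
    # An ordinary task waits only for the mutex of the machine it targets.
    with locks.hold(machine_id):
        run_step(machine_id, "deploy", log)
```

Under these assumptions, a stuck upgrade on one machine delays only tasks that call `deploy` for that same machine; tasks targeting other machines acquire a different lock and proceed.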

tothegills avatar May 31 '17 02:05 tothegills

I ran into this issue today. We had a Tentacle upgrade hang and block deployments to our auto scaling infrastructure. Hopefully it was a one-time occurrence, but ideally it would never be an issue.

brianfeucht avatar Jan 17 '18 00:01 brianfeucht

Another report: https://help.octopus.com/t/upgrade-all-tentacles-prevents-any-deployment-task-to-be-executed/19766

hnrkndrssn avatar Apr 23 '18 06:04 hnrkndrssn

Another report (private link): https://secure.helpscout.net/conversation/605611169/28330?folderId=557077

hnrkndrssn avatar Jun 22 '18 01:06 hnrkndrssn

Another report (private link): https://secure.helpscout.net/conversation/760047726/38240?folderId=2271904

DevOpsDerek avatar Feb 06 '19 06:02 DevOpsDerek

Hi, we raised this on Thursday, June 21, 2018 9:44 AM after upgrading to 2018.6.5

We are now on LTS (2018.10.0) and this is still occurring.

On any version upgrade that requires a Tentacle upgrade, our deployment lead time is pushed out considerably because we need to plan the upgrade out of hours. We have a large infrastructure, so upgrading 10K Tentacles in one server task is going to slow down our pipeline somewhat if all other tasks are queued behind the upgrade.

DaveNorton avatar Feb 14 '19 13:02 DaveNorton

Another report of this issue: https://octopus.zendesk.com/agent/tickets/67509

danefalvo avatar Apr 13 '21 03:04 danefalvo

Note that Tentacle upgrades are not required to deploy. We support deploying to Tentacle 3.0.

I suggest working around this problem by scheduling an "outage" window for upgrading the Tentacles and kicking the upgrade off then. Perhaps do it in batches. I realize this may be annoying with large installs, so we will keep this issue open.

droyad avatar Apr 13 '21 06:04 droyad

Another report (private link): https://octopus.zendesk.com/agent/tickets/84752

paraicoceallaigh avatar Mar 04 '22 00:03 paraicoceallaigh

Another report (Internal ticket) - https://octopus.zendesk.com/agent/tickets/113079

Clare-Octopus avatar Mar 24 '23 09:03 Clare-Octopus