cf-ops-automation

Prevent bosh deployment updates concurrently with their bosh director

Open gberche-orange opened this issue 6 years ago • 1 comment

Is your feature request related to a problem? Please describe.

Changes to a deployer that may trigger downtime (say, a bosh director stemcell upgrade) can cause bosh deployments managed by that deployer to fail. Updates to a deployer should therefore not run concurrently with updates to its managed deployments.

This may also apply to future K8S deployments.

Currently, release authors automating such deployer updates must either:

  • split the change into distinct releases, deployed in sequence by operators
  • instruct operators to pause deployment pipelines

Describe the solution you'd like

  • prevent deployment updates from running concurrently with an update to their deployer (e.g. while the bosh director "bosh-master" is being updated, its deployments, such as "cf", should not be updated)
    • potentially leverage lock resources such as
      • https://github.com/cloudfoundry-community/locker-resource (shared web service with file-based persistence on the persistent disk)
      • https://github.com/concourse/pool-resource (git-based persistence)
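
As an illustration of the pool-resource option, here is a minimal sketch of two jobs, possibly living in different pipelines, that contend for the same lock. The git repository URL, the credential variable, the job names and the task file paths are all hypothetical; only the `pool` resource type and its `acquire` / `release` params come from the pool-resource itself. The lock repository is assumed to contain a `bosh-master/` directory with `claimed/` and `unclaimed/` subdirectories and a single lock file in `unclaimed/`.

```yaml
# Declared identically in every pipeline that must take part in the locking.
resources:
- name: bosh-master-lock
  type: pool
  source:
    uri: git@github.com:example-org/locks.git   # hypothetical lock repository
    branch: main
    pool: bosh-master                           # directory holding the lock files
    private_key: ((locks-repo-private-key))     # hypothetical credential name

jobs:
# Job updating the deployer (the bosh director).
- name: update-bosh-master
  plan:
  - put: bosh-master-lock
    params: {acquire: true}          # blocks until the lock is free
  - task: update-director
    file: scripts/update-director.yml   # hypothetical task definition
  ensure:                            # release the lock whether the job passes or fails
    put: bosh-master-lock
    params: {release: bosh-master-lock}

# Job updating a deployment managed by that director.
- name: deploy-cf
  plan:
  - put: bosh-master-lock
    params: {acquire: true}
  - task: deploy
    file: scripts/deploy-cf.yml         # hypothetical task definition
  ensure:
    put: bosh-master-lock
    params: {release: bosh-master-lock}
```

Note that a pool holding a single lock acts as a plain mutex: it would also serialize the managed deployments against each other, which is stricter than what this issue asks for. Letting deployments run in parallel while still excluding the director update would need a more elaborate scheme.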

Describe alternatives you've considered

Concourse serial-groups seem limited to a single pipeline, whereas we need to serialize jobs across pipelines, so the following wouldn't be of much help:

  • A given deployment (e.g. "master-depls/bosh-ops/deployment-dependencies.yml", see sample deployment-dependencies.yml) would be able to configure a tag (e.g. "ops-depls") that coa would assign as a concourse serial-group
  • Root deployments would by default have all their jobs run with a tag matching their name (e.g. "ops-depls") assigned as a concourse serial-group
    • A given root deployment can be overridden to use a distinct tag (e.g. by setting a distinct tag in "master-depls/concourse-ops/pipelines/credentials-ops-depls-pipeline.yml")
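
For reference, this is roughly what the generated jobs would look like under this alternative. It is a sketch only: the job names and task file paths are hypothetical, and `serial_groups` is the Concourse job field the tag would be mapped to.

```yaml
jobs:
- name: deploy-bosh-ops
  serial_groups: [ops-depls]       # tag taken from deployment-dependencies.yml (assumed mapping)
  plan:
  - task: deploy
    file: scripts/deploy-bosh-ops.yml   # hypothetical task definition

- name: deploy-cf
  serial_groups: [ops-depls]       # default tag matching the root deployment name
  plan:
  - task: deploy
    file: scripts/deploy-cf.yml         # hypothetical task definition
```

Both jobs never run at the same time, but only because they sit in the same pipeline; a job carrying the same tag in another pipeline would not be affected.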

gberche-orange, Aug 07 '18 09:08

Might be managed with bosh-level HA / zero-downtime updates.

poblin-orange, Jun 05 '19 09:06