
Race Condition Issue: Assign Order of Execution to Certain Components.

Open acarlstein opened this issue 1 year ago • 6 comments

Summary

There are occasions where the order of deployment of components (of kinds such as Job, CloudFunctionsFunction, etc.) matters. Find a way to indicate in the kustomization.yaml file in which order certain components should be deployed.

Description

Let's assume you're trying to deploy a Cloud Function using the CRD CloudFunctionsFunction of apiVersion cloudfunctions.cnrm.cloud.google.com/v1beta1.

This component requires that either:

  • You provide a URL to a repository where the code resides,
  • or you provide a URL to a storage bucket where the code resides inside a ZIP file.

Regrettably, the repository where the code resides isn't accessible to CloudFunctionsFunction; therefore, you can only follow the "ZIP file" approach. This increases complexity because we want to keep everything in one place.
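For illustration, a minimal sketch of the resource and its two source options; the field names (sourceArchiveUrl, sourceRepository) follow the Cloud Functions v1 API as exposed by Config Connector, and all other values (name, region, runtime, bucket, repo URL) are placeholders:

# Illustrative only; values are placeholders.
apiVersion: cloudfunctions.cnrm.cloud.google.com/v1beta1
kind: CloudFunctionsFunction
metadata:
  name: my-function
spec:
  region: us-central1        # placeholder
  runtime: python310         # placeholder
  entryPoint: main
  # Option 1: source lives in a repository (not reachable in our case)
  # sourceRepository:
  #   url: https://source.developers.google.com/projects/my-project/repos/my-repo
  # Option 2: source lives as a ZIP file in a storage bucket
  sourceArchiveUrl: gs://my-bucket/function-source.zip
  httpsTrigger: {}           # one possible trigger; shown only for completeness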

We tried a solution by:

  1. Storing the code inside a ConfigMap of apiVersion v1
  2. Using a Job of apiVersion batch/v1 to (1) copy the code into a ZIP file and (2) save the ZIP file to a storage bucket.
  3. Having the CloudFunctionsFunction use the ZIP file from the storage bucket (a condensed sketch follows below).
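A condensed sketch of that workaround, assuming a hypothetical uploader image that bundles zip and gsutil; all names and the bucket are placeholders, and the real Job would also need credentials for the bucket:

# Sketch of the attempted workaround; names, image, and bucket are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: function-source
data:
  main.py: |
    def main(request):
        return "ok"
---
apiVersion: batch/v1
kind: Job
metadata:
  name: zip-and-upload-source
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: uploader
        image: google/cloud-sdk:slim   # assumed to provide zip and gsutil
        command: ["/bin/sh", "-c"]
        args:
        - |
          cd /src && zip -r /tmp/function-source.zip . \
            && gsutil cp /tmp/function-source.zip gs://my-bucket/function-source.zip
        volumeMounts:
        - name: source
          mountPath: /src
      volumes:
      - name: source
        configMap:
          name: function-source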

The Problem

The problem is the order of execution. Kustomize sometimes deploys the CloudFunctionsFunction before the Job has zipped and stored the code from the ConfigMap. The CloudFunctionsFunction will deploy "successfully" but fail to run because the ZIP file is missing, and it does not retry fetching it. This is a race condition.

Proposed Solutions

The following are some solutions to this issue:

  1. Following the example of systems such as Terraform and Blueprints, introduce a dependsOn argument.
  2. Allow annotations that indicate which components should run first, for example in job.yaml: Order: "1" (a sketch follows below).
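Neither mechanism exists in Kustomize today; the following is purely a sketch of what such an annotation could look like, with an invented annotation name:

apiVersion: batch/v1
kind: Job
metadata:
  name: zip-and-upload-source
  annotations:
    # Hypothetical annotation, for illustration only; Kustomize does not support it.
    kustomize.config.k8s.io/deploy-order: "1"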

Definition of Done

Provide a mechanism that allows indicating the order of deployment of all or certain components, ensuring that some components are deployed before others.

acarlstein avatar Dec 04 '24 15:12 acarlstein

This issue is currently awaiting triage.

SIG CLI takes a lead on issue triage for this repo, but any Kubernetes member can accept issues by applying the triage/accepted label.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Dec 04 '24 15:12 k8s-ci-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Mar 04 '25 15:03 k8s-triage-robot

/remove-lifecycle stale

acarlstein avatar Mar 04 '25 19:03 acarlstein

Honestly, I don’t think this is doable with just Kustomize. It’s great for templating and patching, but managing deployment order isn’t really its thing. If you really need to enforce order, you might look at using Helm, which has hooks for ordering, or Argo CD, where you can set up sync waves to control what gets deployed when. A native dependsOn solution in Kustomize seems out of scope.
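For reference, both tools express ordering through annotations on the manifests themselves; a minimal sketch, with placeholder resource names:

# Argo CD: resources in lower sync waves are applied before higher ones.
apiVersion: batch/v1
kind: Job
metadata:
  name: zip-and-upload-source
  annotations:
    argocd.argoproj.io/sync-wave: "0"
---
# Helm: hook annotations run a resource in a given phase; weights order hooks within it.
apiVersion: batch/v1
kind: Job
metadata:
  name: zip-and-upload-source
  annotations:
    "helm.sh/hook": pre-install
    "helm.sh/hook-weight": "-5"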

isarns avatar Mar 04 '25 20:03 isarns

I generally agree with @isarns. It seems more like a job for an operator or a scheduling plugin of some sort. Some slightly out-of-the-box suggestions that might suit use cases similar to yours:

Job creates depending resource

  1. kustomize creates a configMap with a CloudFunctionsFunction template
  2. kustomize creates a Job specifying serviceAccountName and the configMap
    1. the Job uploads the zip to GCS
    2. the Job creates the CloudFunctionsFunction with the appropriate URL
  3. the CloudFunctionsFunction is successfully created, referencing the zip from the previous step (sketch below)
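A rough sketch of that flow, assuming the Job's service account is bound to RBAC that lets it create CloudFunctionsFunction objects, and a hypothetical image that bundles gsutil and kubectl; all names are placeholders:

apiVersion: batch/v1
kind: Job
metadata:
  name: upload-and-create-function
spec:
  template:
    spec:
      serviceAccountName: function-deployer   # needs RBAC to create CloudFunctionsFunction
      restartPolicy: Never
      containers:
      - name: deploy
        image: google/cloud-sdk:slim   # assumed to provide gsutil and kubectl
        command: ["/bin/sh", "-c"]
        args:
        - |
          # (zip creation as in the earlier sketch, elided here)
          gsutil cp /tmp/function-source.zip gs://my-bucket/function-source.zip \
            && kubectl apply -f /templates/cloudfunctionsfunction.yaml
        volumeMounts:
        - name: function-template
          mountPath: /templates
      volumes:
      - name: function-template
        configMap:
          name: cloudfunctionsfunction-template   # the template configMap from step 1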

Job creates required resource

This is unlikely to work for CloudFunctionsFunction because the zip ref is immutable, but it may work for similar uses.

  1. kustomize creates Job specifying serviceAccountName
  2. kustomize creates <DependingResource> with a field like ...configMapRef set to name: archive-url
  3. <DependingResource> status/condition is updated to a pending state because configMap archive-url is not found
  4. Job uploads zip to GCS
  5. Job creates configMap archive-url containing the GCS URL (minimal sketch after this list)
  6. <DependingResource> status/condition is updated by its controller and continues execution.
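A minimal sketch of steps 4 and 5, again with a placeholder image and assuming the Job's service account can create ConfigMaps:

apiVersion: batch/v1
kind: Job
metadata:
  name: upload-and-publish-url
spec:
  template:
    spec:
      serviceAccountName: uploader   # needs RBAC to create ConfigMaps
      restartPolicy: Never
      containers:
      - name: publish
        image: google/cloud-sdk:slim   # assumed to provide gsutil and kubectl
        command: ["/bin/sh", "-c"]
        args:
        - |
          # (zip creation as in the earlier sketch, elided here)
          gsutil cp /tmp/function-source.zip gs://my-bucket/function-source.zip \
            && kubectl create configmap archive-url \
                 --from-literal=url=gs://my-bucket/function-source.zip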

I've personally used a very similar pattern to gracefully handle sync delays for secrets from a shared vault in a home-grown controller. For Pod resources, envFrom.*.secretRef keeps the pod from starting until the secret exists (this also works with pods created by a Deployment or Job); a minimal example follows.
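For reference, the envFrom pattern looks like this; the image and secret name are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: app
    image: busybox   # placeholder image
    envFrom:
    - secretRef:
        name: shared-vault-secret   # the container will not start until this Secret exists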

Homegrown controller

If this is a common pattern for you or your org, you could write a custom controller that handles both the zip upload and the creation of the CloudFunctionsFunction in a standard way.

DanInProgress avatar Mar 05 '25 03:03 DanInProgress

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 03 '25 03:06 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jul 03 '25 04:07 k8s-triage-robot

Honestly, I don’t think this is doable with just Kustomize. It’s great for templating and patching, but managing deployment order isn’t really its thing. If you really need to enforce order, you might look at using Helm, which has hooks for ordering, or Argo CD, where you can set up sync waves to control what gets deployed when. A native dependsOn solution in Kustomize seems out of scope.

This is a cop out. If this solution is built into kubectl it should meet basic needs like ordering. Currently it does SOME reordering (namespaces seem to float to the top of any render, for instance, which is sensible). That ordering seems currently limited to only 'vanilla Kubernetes object types'.

I think I could work with type-based ordering. Currently I'm running into a problem where MirrordPolicy objects depend on MirrordProfile objects but get rendered in the opposite order; this causes the validating webhook of the mirrord controller to reject the Policies, since the Profiles are not yet present.

I suggest that you provide a way for us to set some priority ordering for non-vanilla objects by type, within the tiers you already seem to have set up, falling back to whatever ordering you use currently (it's not alphabetical, I don't think, and I don't have the motivation to go spelunking at the moment) when not specified.

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
metadata:
  name: flux
  namespace: flux-system
objectRenderPriority:
- apiVersion: profiles.mirrord.metalbear.co/v1alpha
  kind: MirrordProfile
- apiVersion: policies.mirrord.metalbear.co/v1alpha
  kind: MirrordPolicy
resources:
- foo/
- bar/
- baz/

SleepyBrett avatar Jul 15 '25 23:07 SleepyBrett

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Aug 14 '25 23:08 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Aug 14 '25 23:08 k8s-ci-robot