Race Condition Issue: Assign Order of Execution to Certain Components.
Summary
There are occasions where the order in which components (of kinds such as Job, CloudFunctionsFunction, etc.) are deployed matters. Find a way to indicate in the kustomization.yaml file the order in which certain components should be deployed.
Description
Let's assume you're trying to deploy a Cloud Function using the CloudFunctionsFunction CRD (apiVersion cloudfunctions.cnrm.cloud.google.com/v1beta1).
This component requires that either:
- you provide a URL to a repository where the code resides,
- or you provide a URL to a storage bucket where the code resides inside a ZIP file.

Regrettably, the repository where the code resides isn't accessible by CloudFunctionsFunction, so you can only follow the "ZIP file" approach (sketched below). This increases the complexity, because we want to have everything in one place.
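For reference, a minimal sketch of the "ZIP file" approach, assuming the Config Connector field names (sourceArchiveUrl, httpsTrigger); all names and the bucket URL are placeholders:

```yaml
apiVersion: cloudfunctions.cnrm.cloud.google.com/v1beta1
kind: CloudFunctionsFunction
metadata:
  name: my-function                 # placeholder
spec:
  region: us-central1
  runtime: python310                # placeholder runtime
  entryPoint: main
  httpsTrigger: {}
  # The archive must already exist at this URL when the resource is
  # reconciled -- which is exactly what creates the race described below.
  sourceArchiveUrl: gs://my-bucket/my-function.zip   # placeholder
```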
We tried the following approach (a sketch of the Job follows the list):
- Store the code inside a ConfigMap (apiVersion v1).
- Use a Job (apiVersion batch/v1) to (1) copy the code into a ZIP file and (2) save the ZIP file into a storage bucket.
- The CloudFunctionsFunction uses the ZIP file from the storage bucket.
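A rough sketch of that Job, assuming an image that provides zip and gsutil; the image, bucket, and names are placeholders, and the Pod's identity must be allowed to write to the bucket:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: upload-function-source        # placeholder
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: zip-and-upload
        image: google/cloud-sdk:slim  # assumed to provide zip and gsutil
        command: ["/bin/sh", "-c"]
        args:
        - |
          cd /src && zip -r /tmp/function.zip . && \
          gsutil cp /tmp/function.zip gs://my-bucket/my-function.zip
        volumeMounts:
        - name: source
          mountPath: /src
      volumes:
      - name: source
        configMap:
          name: function-source       # the ConfigMap holding the code
```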
The Problem
The problem is the order of execution. Kustomize sometimes deploys the CloudFunctionsFunction before the Job has zipped and stored the code from the ConfigMap. The CloudFunctionsFunction deploys "successfully" but fails to run because the ZIP file is missing, and it does not try to fetch it again. This is a race condition.
Proposed Solutions
The following are some possible solutions to this issue:
- Following the example of systems such as Terraform and Blueprints, introduce a dependsOn argument.
- Allow annotations that indicate which components should run first, for example an annotation on job.yaml such as Order: "1" (a hypothetical sketch follows).
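To make this concrete, here is a purely hypothetical sketch of what either option could look like; neither the dependsOn field nor the deploy-order annotation exists in kustomize today:

```yaml
# Option 1 (hypothetical): a dependsOn field in kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- configmap.yaml
- job.yaml
- cloudfunctionsfunction.yaml
dependsOn:
- resource: cloudfunctionsfunction.yaml   # apply this one...
  requires: job.yaml                      # ...only after this one
---
# Option 2 (hypothetical): an ordering annotation on the resources themselves
apiVersion: batch/v1
kind: Job
metadata:
  name: upload-function-source            # placeholder
  annotations:
    deploy-order: "1"                     # lower numbers deploy first
```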
Definition of Done
Provide a mechanism that allows indicating the deployment order of all or certain components, ensuring that some components are deployed before others.
This issue is currently awaiting triage.
SIG CLI takes a lead on issue triage for this repo, but any Kubernetes member can accept issues by applying the triage/accepted label.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
Honestly, I don't think this is doable with just Kustomize. It's great for templating and patching, but managing deployment order isn't really its job. If you really need to enforce order, you might look at using Helm, which has hooks for ordering, or Argo CD, where you can set up sync waves to control what gets deployed when. A native dependsOn solution in Kustomize seems out of scope.
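For anyone landing here, this is roughly what those two mechanisms look like in practice (resource names are placeholders):

```yaml
# Argo CD: resources in lower sync waves are applied (and healthy) first.
apiVersion: batch/v1
kind: Job
metadata:
  name: upload-function-source            # placeholder
  annotations:
    argocd.argoproj.io/sync-wave: "0"     # applied before wave "1" resources
---
# Helm: hook resources run at defined points; lower hook-weight runs first.
apiVersion: batch/v1
kind: Job
metadata:
  name: upload-function-source            # placeholder
  annotations:
    "helm.sh/hook": pre-install
    "helm.sh/hook-weight": "0"
```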
^ I generally agree with @isarns. It seems more like a job for an operator or a scheduling plugin of some sort. Some slightly out-of-the-box suggestions that might suit use cases similar to yours:
Job creates depending resource
- kustomize creates a ConfigMap with a CloudFunctionsFunction template
- kustomize creates a Job specifying serviceAccountName and the ConfigMap
  - the Job uploads the zip to GCS
  - the Job creates the CloudFunctionsFunction with the appropriate URL
- the CloudFunctionsFunction is successfully created, referencing the zip from the previous step
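A rough sketch of such a Job, assuming an image with zip, gsutil, and kubectl available; names, image, and bucket are placeholders, and the ServiceAccount needs RBAC permission to create CloudFunctionsFunction objects plus write access to the bucket:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: deploy-function                 # placeholder
spec:
  template:
    spec:
      serviceAccountName: function-deployer   # needs RBAC + bucket access
      restartPolicy: Never
      containers:
      - name: deploy
        image: google/cloud-sdk:slim    # assumed to provide zip, gsutil, kubectl
        command: ["/bin/sh", "-c"]
        args:
        - |
          cd /src && zip -r /tmp/function.zip . && \
          gsutil cp /tmp/function.zip gs://my-bucket/my-function.zip && \
          kubectl apply -f /templates/cloudfunctionsfunction.yaml
        volumeMounts:
        - name: source
          mountPath: /src
        - name: templates
          mountPath: /templates
      volumes:
      - name: source
        configMap:
          name: function-source         # the function code
      - name: templates
        configMap:
          name: function-template       # the CloudFunctionsFunction manifest
```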
Job creates required resource
This is unlikely to work for CloudFunctionsFunction because the zip ref is immutable, but it may work for similar uses:
- kustomize creates a Job specifying serviceAccountName
- kustomize creates <DependingResource> with a field like ...configMapRef set to name: archive-url
- the <DependingResource> status/condition is updated to a pending state because ConfigMap archive-url is not found
- the Job uploads the zip to GCS
- the Job creates ConfigMap archive-url containing the GCS URL
- the <DependingResource> status/condition is updated by its controller and execution continues
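For illustration, the shape of that hand-off; DependingResource is a made-up kind (as in the list above), and the bucket URL is a placeholder:

```yaml
# The ConfigMap the Job publishes once the upload has finished...
apiVersion: v1
kind: ConfigMap
metadata:
  name: archive-url
data:
  url: gs://my-bucket/my-function.zip   # placeholder
---
# ...and a hypothetical resource that stays in a pending condition
# until its controller can resolve the ConfigMap reference.
apiVersion: example.com/v1alpha1        # made-up group/version
kind: DependingResource
metadata:
  name: my-function
spec:
  configMapRef:
    name: archive-url
```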
I've personally used a very similar pattern in a home-grown controller to gracefully handle sync delays for secrets coming from a shared vault. For Pod resources, envFrom.*.secretRef keeps the pod from starting until the secret exists. (This would also work with pods created by a Deployment or Job.)
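A minimal illustration of that shape (names are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: waits-for-secret                # placeholder
spec:
  restartPolicy: Never
  containers:
  - name: app
    image: busybox                      # placeholder
    command: ["sh", "-c", "env && sleep 3600"]
    envFrom:
    - secretRef:
        # optional defaults to false, so the container cannot start
        # until this Secret exists (created later by the sync controller).
        name: vault-synced-secret
```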
Homegrown controller
If this is a common pattern for you or your org, you could write a custom controller that handles both the zip upload and the creation of the CloudFunctionsFunction in a standard way.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
> Honestly, I don't think this is doable with just Kustomize. It's great for templating and patching, but managing deployment order isn't really its job. If you really need to enforce order, you might look at using Helm, which has hooks for ordering, or Argo CD, where you can set up sync waves to control what gets deployed when. A native dependsOn solution in Kustomize seems out of scope.
This is a cop out. If this solution is built into kubectl, it should meet basic needs like ordering. Currently it does SOME reordering (namespaces seem to float to the top of any render, for instance, which is sensible), but that ordering seems limited to 'vanilla Kubernetes object types'.
I think I could work with type-based ordering. Currently I'm running into a problem where MirrordPolicy objects depend on MirrordProfile objects but get rendered in the opposite order; this causes the validating webhook of the mirrord controller to reject the Policies, since the Profiles are not already present.
I suggest you provide a way for us to set a priority ordering for non-vanilla objects by type, within the tiers you already seem to have set up, falling back to whatever ordering you use currently (it's not alphabetical, I don't think, and I don't have the motivation to go spelunking at the moment) when not specified. For example:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
metadata:
  name: flux
  namespace: flux-system
objectRenderPriority:
- apiVersion: profiles.mirrord.metalbear.co/v1alpha
  kind: MirrordProfile
- apiVersion: policies.mirrord.metalbear.co/v1alpha
  kind: MirrordPolicy
resources:
- foo/
- bar/
- baz/
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.