
Once one pre-install,pre-upgrade hook passes, Helm does not wait for the other hooks' statuses

Open nofearOnline opened this issue 2 years ago • 6 comments

Output of helm version: version.BuildInfo{Version:"v3.8.1", GitCommit:"5cb9af4b1b271d11d7a97a71df3ac337dd94ad37", GitTreeState:"clean", GoVersion:"go1.17.5"}

Output of kubectl version:
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.5", GitCommit:"5c99e2ac2ff9a3c549d9ca665e7bc05a3e18f07e", GitTreeState:"clean", BuildDate:"2021-12-16T08:38:33Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.5-eks-bc4871b", GitCommit:"5236faf39f1b7a7dabea8df12726f25608131aa9", GitTreeState:"clean", BuildDate:"2021-10-29T23:32:16Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}

Cloud Provider/Platform (AKS, GKE, Minikube etc.): EKS

I configured the following objects with hooks (see below): one Secret and one Job. Helm creates the Secret and the Job, then creates the new release without waiting for the Job to succeed. If the Job fails, the release still gets deployed!

Someone already noticed this in the past and a PR was submitted for it (raised here: https://github.com/helm/helm/issues/6874 and solved here: https://github.com/helm/helm/pull/6907), but as far as I can see, it did not get resolved. Maybe I misunderstood something.

apiVersion: v1
kind: Secret
metadata:
  namespace: {{ .Values.environment }}
  name: pre-job-secret
  annotations:
    "helm.sh/hook": "pre-install,pre-upgrade"
    "helm.sh/hook-weight": "0"
data:
  POSTGRES_ADDR: {{ .Values.secret.postgresAddress | b64enc }}
  POSTGRES_DB: {{ .Values.secret.postgresDB | b64enc }}
  POSTGRES_PASSWORD: {{ .Values.secret.postgresPassword | b64enc }}
  POSTGRES_USER: {{ .Values.secret.postgresUser | b64enc }}

---

apiVersion: batch/v1
kind: Job
metadata:
  namespace: {{ .Values.environment }}
  name: {{ .Chart.Name }}-db-migration-job
  labels: 
    {{- include "backend.labels" . | nindent 4 }}
spec:
  backoffLimit: 2
  completions: 1
  parallelism: 1
  template:
    metadata:
      namespace: {{ .Values.environment }}
      labels: 
        role: hook
      annotations:
        # This is what defines this resource as a hook. Without this line, the
        # job is considered part of the release.
        "helm.sh/hook": pre-install,pre-upgrade
        "helm.sh/hook-weight": "1"
        "helm.sh/hook-delete-policy": before-hook-creation
        
    spec:
      restartPolicy: "Never"
      containers:
        - name: {{ .Chart.Name }}-container
          {{- with .Values.migrateJob }}
          image: {{ .image.repository }}:{{ .image.tag | default $.Chart.Version }}
          imagePullPolicy: {{ .image.pullPolicy }}
          {{- if eq $.Values.debug true }}
          command: ["sleep"]
          args: ["10"]
          {{- else }}
          command: 
{{ toYaml .command | indent 12 }}
          {{- end }}
          env:
          - name: PGSSLMODE
            value: disable
          - name: POSTGRES_ADDR
            valueFrom:
              secretKeyRef:
                name: pre-job-secret
                key: POSTGRES_ADDR
          - name: POSTGRES_DB
            valueFrom:
              secretKeyRef:
                name: pre-job-secret
                key: POSTGRES_DB
          - name: POSTGRES_PASSWORD
            valueFrom:
              secretKeyRef:
                name: pre-job-secret
                key: POSTGRES_PASSWORD
          - name: POSTGRES_USER
            valueFrom:
              secretKeyRef:
                name: pre-job-secret
                key: POSTGRES_USER
          resources: {}
      {{- with .nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      
# end of with .Values.migrateJob
          {{- end }}
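
A note on the manifest above: the helm.sh/hook annotations for the Job sit under spec.template.metadata.annotations, which is the pod template, not the Job itself. Helm reads hook annotations from a resource's own top-level metadata.annotations, so as written the Job is probably being treated as an ordinary release resource rather than a hook, which would match the behavior described. A minimal sketch of the same Job with the annotations moved up, assuming everything else stays as in the original (the trimmed container spec here is only illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  namespace: {{ .Values.environment }}
  name: {{ .Chart.Name }}-db-migration-job
  labels:
    {{- include "backend.labels" . | nindent 4 }}
  annotations:
    # Hook annotations belong on the Job's own metadata,
    # not on spec.template.metadata (the pod template).
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "1"
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  backoffLimit: 2
  template:
    metadata:
      labels:
        role: hook
    spec:
      restartPolicy: Never
      containers:
        - name: {{ .Chart.Name }}-container
          # Illustrative only: command, env, and scheduling fields
          # would be carried over unchanged from the original manifest.
          image: {{ .Values.migrateJob.image.repository }}:{{ .Values.migrateJob.image.tag | default .Chart.Version }}
```

With the weights kept as they are, the Secret (weight 0) would still be created before the Job (weight 1).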

nofearOnline • Apr 03 '22 11:04

https://helm.sh/docs/topics/charts_hooks/#hook-resources-are-not-managed-with-corresponding-releases

Hooks are not tracked or managed as part of the release. Once Helm verifies that the hook has reached its "ready" state, it will stop tracking the hook's state. That is to say, it does not care whether the job reaches the "success" or "failed" state; it only cares that the job was started.

bacongobbler • Apr 04 '22 17:04

@bacongobbler two things:

  1. If you run only a Job as a hook, the release does wait for the Job to succeed, and if the Job fails the release is canceled with: "Error: UPGRADE FAILED: pre-upgrade hooks failed: job failed: BackoffLimitExceeded"
  2. How do you define "ready"? A Job has the states Failed and Complete. The pods behind it have the states Pending, Running, Succeeded, Failed, and Unknown. Which of these do you count as "ready"?
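
For reference on the second question: a Job reports its outcome through status.conditions, with condition types such as Complete and Failed. The fragment below is an illustrative sketch (not output captured from this cluster) of what kubectl get job -o yaml would show for a Job that ran out of retries; the reason field is the same string that appears at the end of the error message in point 1.

```yaml
# Illustrative Job status after exhausting backoffLimit; values are assumed.
status:
  conditions:
    - type: Failed
      status: "True"
      reason: BackoffLimitExceeded
      message: Job has reached the specified backoff limit
```

A successful Job would instead carry a condition with type Complete and status "True".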

nofearOnline • Apr 07 '22 10:04

@bacongobbler any answers, please?

nofearOnline • Apr 28 '22 15:04

I would suggest reading the code to determine the answers yourself... I did not write Helm's wait logic.

https://github.com/helm/helm/blob/main/pkg/kube/wait.go

bacongobbler • Apr 28 '22 16:04

> @bacongobbler two things:
>
>   1. If you run only a Job as a hook, the release does wait for the Job to succeed, and if the Job fails the release is canceled with: "Error: UPGRADE FAILED: pre-upgrade hooks failed: job failed: BackoffLimitExceeded"
>   2. How do you define "ready"? A Job has the states Failed and Complete. The pods behind it have the states Pending, Running, Succeeded, Failed, and Unknown. Which of these do you count as "ready"?

I have not been able to make it blocking, even with a Job: even if the Job fails, the main pod/deployment still goes through. How did you make the Job blocking as a pre-upgrade hook?

geshan • May 11 '22 21:05

This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.

github-actions[bot] • Aug 10 '22 00:08