
imagePullSecrets not used in backup cronjob templates

Open jmbarbier opened this issue 3 years ago • 4 comments

If imagePullSecrets is set to pull the main database image, it is not propagated to the backup CronJob template. So if a backup job is scheduled on a node where the database image is not already present, it fails to pull the image.

Manually adding imagePullSecrets to the CronJob definition works, but this is a dirty workaround :)
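For anyone hitting this before a fix lands, the manual workaround is to add the same imagePullSecrets to the generated CronJob's pod template spec (field path per the batch/v1beta1 CronJob API; the secret name here is illustrative, and the operator may overwrite this edit on reconciliation):

```yaml
# Partial CronJob spec: add imagePullSecrets so backup pods can
# authenticate to the private registry on any node.
spec:
  jobTemplate:
    spec:
      template:
        spec:
          imagePullSecrets:
            - name: registry-secret   # same secret as in the Kubegres spec
```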

jmbarbier avatar Dec 26 '21 16:12 jmbarbier

Thank you for reporting this issue with imagePullSecrets when they are used to pull the main database image.

Could you please share a YAML example where your configuration does not work?

alex-arica avatar Dec 27 '21 12:12 alex-arica

Thank you for your quick reply. Here is some more info:

Launching this YAML file on a 1-node cluster (at scaleway.com => the scw-xxx nodes) works fine:

apiVersion: v1
data:
  .dockerconfigjson: REDACTED
kind: Secret
metadata:
  name: registry-secret
  namespace: kubegres-sandbox
type: kubernetes.io/dockerconfigjson
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kubegres-backup-issue-78-pvc
  namespace: kubegres-sandbox
spec:
  storageClassName: scw-bssd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: kubegres.reactive-tech.io/v1
kind: Kubegres
metadata:
  name: kubegres-issue-78
  namespace: kubegres-sandbox
spec:
  replicas: 1
  image: my-private-registry/image:tag
  imagePullSecrets:
    - name: registry-secret
  database:
    size: 1Gi
    storageClassName: scw-bssd
  failover:
    isDisabled: true
  backup:
    schedule: "*/3 * * * *"
    pvcName: kubegres-backup-issue-78-pvc
    volumeMount: /var/lib/backup
  env:
    - name: POSTGRES_PASSWORD
      value: supassword
    - name: POSTGRES_REPLICATION_PASSWORD
      value: reppassword

The backup CronJob is created:

kind: CronJob
apiVersion: batch/v1beta1
metadata:
  name: backup-kubegres-issue-78
  namespace: kubegres-sandbox
  uid: 499fa9f5-5a59-48d9-8fa2-7b08c74ca34b
  resourceVersion: '1469331142'
  generation: 1
  creationTimestamp: '2021-12-27T14:02:36Z'
  ownerReferences:
    - apiVersion: kubegres.reactive-tech.io/v1
      kind: Kubegres
      name: kubegres-issue-78
      uid: db3cb326-bc37-434e-b3ba-44175b083eb6
      controller: true
      blockOwnerDeletion: true
spec:
  schedule: '*/3 * * * *'
  concurrencyPolicy: Forbid
  suspend: false
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      template:
        metadata:
          creationTimestamp: null
        spec:
          volumes:
            - name: backup-volume
              persistentVolumeClaim:
                claimName: kubegres-backup-issue-78-pvc
            - name: postgres-config
              configMap:
                name: base-kubegres-config
                defaultMode: 511
          containers:
            - name: backup-postgres
              image: my-private-registry/image:tag
              args:
                - sh
                - '-c'
                - /tmp/backup_database.sh
              env:
                - name: PGPASSWORD
                - name: KUBEGRES_RESOURCE_NAME
                  value: kubegres-issue-78
                - name: BACKUP_DESTINATION_FOLDER
                  value: /var/lib/backup
                - name: BACKUP_SOURCE_DB_HOST_NAME
                  value: kubegres-issue-78
                - name: POSTGRES_PASSWORD
                  value: supassword
                - name: POSTGRES_REPLICATION_PASSWORD
                  value: reppassword
              resources: {}
              volumeMounts:
                - name: backup-volume
                  mountPath: /var/lib/backup
                - name: postgres-config
                  mountPath: /tmp/backup_database.sh
                  subPath: backup_database.sh
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              imagePullPolicy: IfNotPresent
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 30
          dnsPolicy: ClusterFirst
          securityContext: {}
          schedulerName: default-scheduler
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
status:
  lastScheduleTime: '2021-12-27T14:06:00Z'
  lastSuccessfulTime: '2021-12-27T14:06:05Z'

The imagePullSecrets field is missing there. On my single-node cluster, jobs are OK because my private image is already present (imagePullPolicy: IfNotPresent):

NAME                                COMPLETIONS   DURATION   AGE
backup-kubegres-issue-78-27343563   1/1           17s        7m46s
backup-kubegres-issue-78-27343566   1/1           5s         4m46s

But if I add some nodes to the cluster:

NAME                                             STATUS   ROLES    AGE     VERSION
scw-k8s-solidev-default-34b29ff7154b4452a4ace2   Ready    <none>   20d     v1.23.0
scw-k8s-solidev-default-817558573a0244e09202dc   Ready    <none>   6m49s   v1.23.0
scw-k8s-solidev-default-ee42e6727e114831aeaac8   Ready    <none>   6m24s   v1.23.0

the backup job fails depending on which node it is scheduled on:

✦ ➜ kubectl get jobs
NAME                                COMPLETIONS   DURATION   AGE
backup-kubegres-issue-78-27343563   1/1           17s        9m57s
backup-kubegres-issue-78-27343566   1/1           5s         6m57s
backup-kubegres-issue-78-27343569   0/1           3m57s      3m57s
✦ ➜ kubectl describe jobs/backup-kubegres-issue-78-27343569
Name:             backup-kubegres-issue-78-27343569
Namespace:        kubegres-sandbox
(...)
Events:
  Type    Reason            Age    From            Message
  ----    ------            ----   ----            -------
  Normal  SuccessfulCreate  5m33s  job-controller  Created pod: backup-kubegres-issue-78-27343569-hb2d6
✦ ➜ kubectl describe pods/backup-kubegres-issue-78-27343569-hb2d6
Name:         backup-kubegres-issue-78-27343569-hb2d6
Namespace:    kubegres-sandbox
Priority:     0
Node:         scw-k8s-solidev-default-817558573a0244e09202dc/10.197.230.31
Start Time:   Mon, 27 Dec 2021 15:09:00 +0100
(...)
Events:
  Type     Reason                  Age                   From                     Message
  ----     ------                  ----                  ----                     -------
  Normal   Scheduled               6m12s                 default-scheduler        Successfully assigned kubegres-sandbox/backup-kubegres-issue-78-27343569-hb2d6 to scw-k8s-solidev-default-817558573a0244e09202dc
  Normal   SuccessfulAttachVolume  6m11s                 attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-cef4f654-d552-4a9d-b7c5-1accd1757530"
  Normal   Pulling                 4m46s (x4 over 6m8s)  kubelet                  Pulling image "my-private-registry/image:tag"
  Warning  Failed                  4m46s (x4 over 6m8s)  kubelet                  Failed to pull image "my-private-registry/image:tag": rpc error: code = Unknown desc = failed to pull and unpack image "my-private-registry/image:tag": failed to resolve reference "my-private-registry/image:tag": failed to authorize: failed to fetch anonymous token: unexpected status: 403 Forbidden
  Warning  Failed                  4m46s (x4 over 6m8s)  kubelet                  Error: ErrImagePull
  Warning  Failed                  4m18s (x6 over 6m7s)  kubelet                  Error: ImagePullBackOff
  Normal   BackOff                 58s (x20 over 6m7s)   kubelet                  Back-off pulling image "my-private-registry/image:tag"

The private image has not been pulled on the new nodes, so when a backup is scheduled on one of them, the image cannot be pulled because the imagePullSecrets field is missing from the CronJob spec.
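In other words, the expected behavior is that the controller copies the Kubegres spec's pull secrets into the backup CronJob's pod template spec. A minimal Go sketch of that propagation, using hypothetical stand-in types (the real controller would use `k8s.io/api/core/v1` and the batch/v1beta1 CronJob types, not these):

```go
package main

import "fmt"

// Stand-in types for illustration only; not kubegres's actual code.
type LocalObjectReference struct{ Name string }

type PodSpec struct {
	ImagePullSecrets []LocalObjectReference
}

// propagatePullSecrets copies the Kubegres resource's imagePullSecrets
// into the backup CronJob's pod template spec, which is the behavior
// this issue asks for.
func propagatePullSecrets(src []LocalObjectReference, dst *PodSpec) {
	dst.ImagePullSecrets = append([]LocalObjectReference(nil), src...)
}

func main() {
	kubegresSecrets := []LocalObjectReference{{Name: "registry-secret"}}
	var backupPod PodSpec
	propagatePullSecrets(kubegresSecrets, &backupPod)
	fmt.Println(backupPod.ImagePullSecrets[0].Name) // registry-secret
}
```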

jmbarbier avatar Dec 27 '21 14:12 jmbarbier

Thank you for those details which will help me with the investigation.

alex-arica avatar Dec 27 '21 16:12 alex-arica

Hi @alex-arica, I sent PR #103 to fix this issue.

urbany avatar Mar 21 '22 15:03 urbany