
NodeSelector and tolerations for Schedule

Open · vring0 opened this issue 3 years ago

Summary

As "K8up user"
I want "configure NodeSelector and tolerations for Schedule."
So that "to run a backup on the selected nodes."

Context

When deploying via the Helm chart I can set nodeSelector and tolerations, but these only apply to the operator Pod.
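For reference, the Helm values look roughly like this (the key names follow the common chart convention; check the values.yaml of your chart version). They are applied to the operator Deployment only:

```yaml
# Helm values for the k8up chart (assumed standard keys) -- these only
# affect the operator Deployment, not the Pods spawned for Schedules.
nodeSelector:
  cloud.google.com/gke-preemptible: "true"
tolerations:
  - key: backups
    operator: Exists
    effect: NoSchedule
```

Operator Pod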

Name:         k8up-6cb4dd4fff-98zjr
Namespace:    k8up-operator
Priority:     0
Node:         gke-sandbox-backups-pool-978962aa-dsqt/10.17.0.32
Start Time:   Tue, 06 Sep 2022 15:44:09 +0500
Labels:       app.kubernetes.io/instance=k8up
              app.kubernetes.io/name=k8up
              pod-template-hash=6cb4dd4fff
Annotations:  cni.projectcalico.org/containerID: f17d5fcaeef8be03559863114eda666941035631281805a9454d342542df0d32
              cni.projectcalico.org/podIP: 10.36.1.9/32
              cni.projectcalico.org/podIPs: 10.36.1.9/32
Status:       Running
IP:           10.36.1.9
IPs:
  IP:           10.36.1.9
Controlled By:  ReplicaSet/k8up-6cb4dd4fff
Containers:
  k8up-operator:
    Container ID:  containerd://4eb07798a17d685a8558c52f49848a991a2a04b5579d168609c724540eaeee37
    Image:         ghcr.io/k8up-io/k8up:v2
    Image ID:      ghcr.io/k8up-io/k8up@sha256:59e02a83d4ab5b0f8d138eb4060dcf8238e3bc7612134c12dc31b4e8f382f75d
    Port:          8080/TCP
    Host Port:     0/TCP
    Args:
      operator
    State:          Running
      Started:      Tue, 06 Sep 2022 15:44:13 +0500
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  256Mi
    Requests:
      cpu:     20m
      memory:  128Mi
    Liveness:  http-get http://:http/metrics delay=30s timeout=1s period=10s #success=1 #failure=3
    Environment:
      BACKUP_IMAGE:                   ghcr.io/k8up-io/k8up:v2
      BACKUP_ENABLE_LEADER_ELECTION:  true
      BACKUP_OPERATOR_NAMESPACE:      k8up-operator (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d4xd9 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  kube-api-access-d4xd9:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              cloud.google.com/gke-preemptible=true
Tolerations:                 backups:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s

Schedule Pod

Name:         backup-schedule-backup-k8up-operator-backup-knzcb-sd258
Namespace:    k8up-operator
Priority:     0
Node:         gke-seo-sandbox-default-pool-afdf077e-9bxb/10.17.0.30
Start Time:   Tue, 06 Sep 2022 16:01:31 +0500
Labels:       controller-uid=50d112ae-c202-4d8a-9ead-7589598bda47
              job-name=backup-schedule-backup-k8up-operator-backup-knzcb
              k8upjob=true
Annotations:  cni.projectcalico.org/containerID: 71b0f73b530c90ff8dc13d2e7c99e7362a946c617fbfe526c1675c7a0e5797d6
              cni.projectcalico.org/podIP: 10.36.0.130/32
              cni.projectcalico.org/podIPs: 10.36.0.130/32
Status:       Running
IP:           10.36.0.130
IPs:
  IP:           10.36.0.130
Controlled By:  Job/backup-schedule-backup-k8up-operator-backup-knzcb
Containers:
  backup:
    Container ID:  containerd://0b4dd5e5277035e254cf7b6dd6998f8ff171836d47e190be7303cc443a8e1ff8
    Image:         ghcr.io/k8up-io/k8up:v2
    Image ID:      ghcr.io/k8up-io/k8up@sha256:59e02a83d4ab5b0f8d138eb4060dcf8238e3bc7612134c12dc31b4e8f382f75d
    Port:          <none>
    Host Port:     <none>
    Command:
      /usr/local/bin/k8up
      restic
    State:         
      Reason:      
    Last State:     
      Reason:       
      Exit Code:   
      Started:      
      Finished:     
    Ready:          
    Restart Count:  
    Environment:
      PROM_URL:                  http://127.0.0.1/
      BACKUPCOMMAND_ANNOTATION:  k8up.io/backupcommand
      FILEEXTENSION_ANNOTATION:  k8up.io/file-extension
      HOSTNAME:                  k8up-operator
      RESTIC_PASSWORD:           <set to the key 'K8UP_PASSWORD' in secret 'backup-repo'>            Optional: false
      AWS_ACCESS_KEY_ID:         <set to the key 'AWS_ACCESS_KEY_ID' in secret 'minio-credentials'>  Optional: false
      STATS_URL:                 
      AWS_SECRET_ACCESS_KEY:     <set to the key 'AWS_SECRET_ACCESS_KEY' in secret 'minio-credentials'>  Optional: false
      RESTIC_REPOSITORY:         s3:http://127.0.0.1:9000/k8up-operator
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8wsjm (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-8wsjm:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s

Out of Scope

No response

Further links

https://k8up.io/k8up/2.3/references/api-reference.html#k8s-api-github-com-k8up-io-k8up-v2-api-v1-schedule

Acceptance Criteria

No response

Implementation Ideas

No response

vring0 · Sep 06 '22

Hi @vring0

I'd like to understand your request a bit better. What's the issue you'd like to solve by using node selectors and tolerations? Are your nodes starving for resources?

As it currently stands, we are going to rewrite the scheduling of the backup jobs quite drastically to support backups of RWO volumes. The idea is to spawn a backup Pod for each PVC in a given namespace, with the number of concurrent executions limited to a defined amount. For this to work we need complete control over the scheduling of these Pods, because an RWO PVC can only be accessed from the node it is mounted on. User-definable tolerations and node selectors would therefore break this process.
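For context, a rough sketch (an assumed shape, not k8up's actual implementation) of why the operator needs to own scheduling: each per-PVC backup Job would have to be pinned to the node that currently mounts the RWO volume, for example via a required node affinity. The Job and PVC names are hypothetical; the node name and image are taken from the output above.

```yaml
# Assumed sketch, not the actual k8up implementation: a per-PVC backup Job
# pinned to the node that mounts the RWO volume. A user-supplied
# nodeSelector/toleration could contradict this required affinity.
apiVersion: batch/v1
kind: Job
metadata:
  name: backup-pvc-data            # hypothetical name
  namespace: k8up-operator
spec:
  template:
    spec:
      restartPolicy: OnFailure
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/hostname
                    operator: In
                    values:
                      - gke-sandbox-backups-pool-978962aa-dsqt  # node holding the RWO PVC
      containers:
        - name: backup
          image: ghcr.io/k8up-io/k8up:v2
          command: ["/usr/local/bin/k8up", "restic"]
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: my-rwo-pvc  # hypothetical PVC; an RWO volume mounts on one node at a time
```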

Kidswiss · Sep 07 '22

What about the check, prune and cleanup Pods? Those don't need to run on the same node as the volume, and they end up running on any node.

codestation · Mar 24 '24