NodeSelector and tolerations for Schedule
Summary
As "K8up user"
I want "to configure nodeSelector and tolerations for a Schedule."
So that "the backup runs on the selected nodes."
Context
When deploying via the Helm chart I can set nodeSelector and tolerations, but these settings only apply to the operator Pod.
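For reference, this is roughly what I set in the chart values (a minimal sketch, assuming the chart's standard top-level nodeSelector and tolerations values):

```yaml
# values.yaml (sketch)
nodeSelector:
  cloud.google.com/gke-preemptible: "true"
tolerations:
  - key: backups
    operator: Exists
    effect: NoSchedule
```

The operator Pod is scheduled accordingly:

Operator Pod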
Name:         k8up-6cb4dd4fff-98zjr
Namespace:    k8up-operator
Priority:     0
Node:         gke-sandbox-backups-pool-978962aa-dsqt/10.17.0.32
Start Time:   Tue, 06 Sep 2022 15:44:09 +0500
Labels:       app.kubernetes.io/instance=k8up
              app.kubernetes.io/name=k8up
              pod-template-hash=6cb4dd4fff
Annotations:  cni.projectcalico.org/containerID: f17d5fcaeef8be03559863114eda666941035631281805a9454d342542df0d32
              cni.projectcalico.org/podIP: 10.36.1.9/32
              cni.projectcalico.org/podIPs: 10.36.1.9/32
Status:       Running
IP:           10.36.1.9
IPs:
  IP:  10.36.1.9
Controlled By:  ReplicaSet/k8up-6cb4dd4fff
Containers:
  k8up-operator:
    Container ID:   containerd://4eb07798a17d685a8558c52f49848a991a2a04b5579d168609c724540eaeee37
    Image:          ghcr.io/k8up-io/k8up:v2
    Image ID:       ghcr.io/k8up-io/k8up@sha256:59e02a83d4ab5b0f8d138eb4060dcf8238e3bc7612134c12dc31b4e8f382f75d
    Port:           8080/TCP
    Host Port:      0/TCP
    Args:
      operator
    State:          Running
      Started:      Tue, 06 Sep 2022 15:44:13 +0500
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  256Mi
    Requests:
      cpu:     20m
      memory:  128Mi
    Liveness:  http-get http://:http/metrics delay=30s timeout=1s period=10s #success=1 #failure=3
    Environment:
      BACKUP_IMAGE:                   ghcr.io/k8up-io/k8up:v2
      BACKUP_ENABLE_LEADER_ELECTION:  true
      BACKUP_OPERATOR_NAMESPACE:      k8up-operator (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d4xd9 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-d4xd9:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:       Burstable
Node-Selectors:  cloud.google.com/gke-preemptible=true
Tolerations:     backups:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
The Pod spawned by the Schedule, however, gets no node selector and only the default tolerations:

Schedule Pod
Name:         backup-schedule-backup-k8up-operator-backup-knzcb-sd258
Namespace:    k8up-operator
Priority:     0
Node:         gke-seo-sandbox-default-pool-afdf077e-9bxb/10.17.0.30
Start Time:   Tue, 06 Sep 2022 16:01:31 +0500
Labels:       controller-uid=50d112ae-c202-4d8a-9ead-7589598bda47
              job-name=backup-schedule-backup-k8up-operator-backup-knzcb
              k8upjob=true
Annotations:  cni.projectcalico.org/containerID: 71b0f73b530c90ff8dc13d2e7c99e7362a946c617fbfe526c1675c7a0e5797d6
              cni.projectcalico.org/podIP: 10.36.0.130/32
              cni.projectcalico.org/podIPs: 10.36.0.130/32
Status:       Running
IP:           10.36.0.130
IPs:
  IP:  10.36.0.130
Controlled By:  Job/backup-schedule-backup-k8up-operator-backup-knzcb
Containers:
  backup:
    Container ID:  containerd://0b4dd5e5277035e254cf7b6dd6998f8ff171836d47e190be7303cc443a8e1ff8
    Image:         ghcr.io/k8up-io/k8up:v2
    Image ID:      ghcr.io/k8up-io/k8up@sha256:59e02a83d4ab5b0f8d138eb4060dcf8238e3bc7612134c12dc31b4e8f382f75d
    Port:          <none>
    Host Port:     <none>
    Command:
      /usr/local/bin/k8up
      restic
    State:
      Reason:
    Last State:
      Reason:
      Exit Code:
      Started:
      Finished:
    Ready:
    Restart Count:
    Environment:
      PROM_URL:                  http://127.0.0.1/
      BACKUPCOMMAND_ANNOTATION:  k8up.io/backupcommand
      FILEEXTENSION_ANNOTATION:  k8up.io/file-extension
      HOSTNAME:                  k8up-operator
      RESTIC_PASSWORD:           <set to the key 'K8UP_PASSWORD' in secret 'backup-repo'>  Optional: false
      AWS_ACCESS_KEY_ID:         <set to the key 'AWS_ACCESS_KEY_ID' in secret 'minio-credentials'>  Optional: false
      STATS_URL:
      AWS_SECRET_ACCESS_KEY:     <set to the key 'AWS_SECRET_ACCESS_KEY' in secret 'minio-credentials'>  Optional: false
      RESTIC_REPOSITORY:         s3:http://127.0.0.1:9000/k8up-operator
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8wsjm (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-8wsjm:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
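What I would like to be able to write is something along these lines (the nodeSelector and tolerations fields here are hypothetical, they do not exist in the current Schedule API and are the subject of this request; the backup schedule field follows the documented Schedule spec):

```yaml
apiVersion: k8up.io/v1
kind: Schedule
metadata:
  name: backup-schedule
  namespace: k8up-operator
spec:
  # hypothetical fields, not in the current API:
  nodeSelector:
    cloud.google.com/gke-preemptible: "true"
  tolerations:
    - key: backups
      operator: Exists
      effect: NoSchedule
  backup:
    schedule: '0 1 * * *'
```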
Out of Scope
No response
Further links
https://k8up.io/k8up/2.3/references/api-reference.html#k8s-api-github-com-k8up-io-k8up-v2-api-v1-schedule
Acceptance Criteria
No response
Implementation Ideas
No response
Hi @vring0
I'd like to understand your request a bit better. What's the issue you'd like to solve by using node selectors and tolerations? Are your nodes starving for resources?
As it currently stands, we will rewrite the scheduling of the backup jobs quite drastically to support backing up RWO volumes. The idea is to spawn a backup pod for each PVC in a given namespace, with the number of concurrently running pods limited to a defined amount. For this to work we need complete control over the scheduling of these pods, because an RWO PVC can only be accessed from the node it is mounted on. User-definable tolerations and node selectors would break this process.
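For illustration, such a per-PVC backup pod has to be pinned to the node where the RWO volume is currently mounted, e.g. via spec.nodeName (a sketch of the scheduling constraint only, not k8up's actual implementation; the pod and claim names are made up):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: backup-pvc-data          # hypothetical per-PVC backup pod
spec:
  # Pin the pod to the node that has the RWO volume mounted;
  # a user-supplied nodeSelector or toleration could contradict this.
  nodeName: gke-sandbox-backups-pool-978962aa-dsqt
  containers:
    - name: backup
      image: ghcr.io/k8up-io/k8up:v2
      command: ["/usr/local/bin/k8up", "restic"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data          # the RWO PVC being backed up
```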
What about the check, prune and cleanup pods? Those don't need to run on the same node as the volume and could end up running on any node.