postgres-operator
Affinity for Backups Jobs
Postgres Operator: 5.0.3
Title
Backup Jobs do not respect affinity settings in the repoHost section (should they?)
Description
In my config I have two nodes labeled with ckrole=db.
Master and replica pods are running on db nodes, each on a different node (thanks to pod anti-affinity).
The repo-host StatefulSet has also scheduled its pod to a db node, but the backup Job's pod was not:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/postgres-cluster-lh-vl-backup-qgkj-fthws 0/1 ContainerCreating 0 62m <none> win-worker-01 <none> <none>
(we have one windows worker in the cluster)
Deployment
postgres/postgres.yaml file from the postgres-operator-examples repo:
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: postgres-cluster-lh-vl
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-13.4-1
  postgresVersion: 13
  users:
    - name: postgres
    - name: testuser-cl1
      databases:
        - testuser_cl1_db
    - name: testuser-cl2
      databases:
        - testuser_cl2_db
  instances:
    - name: instance1
      replicas: 2
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: ckrole
                    operator: In
                    values:
                      - db
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/cluster: postgres-cluster-lh-vl
                  postgres-operator.crunchydata.com/instance-set: instance1
      dataVolumeClaimSpec:
        accessModes:
          - "ReadWriteOnce"
        volumeMode: Filesystem
        storageClassName: longhorn-cluster
        resources:
          requests:
            storage: 1Gi
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.35-0
      repoHost:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: ckrole
                      operator: In
                      values:
                        - db
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - "ReadWriteOnce"
              volumeMode: Filesystem
              storageClassName: longhorn-cluster
              resources:
                requests:
                  storage: 1Gi
I didn't find affinity settings for the backup jobs, so I think they should respect repoHost settings. Should they?
Backups are taken from the primary, so it is generally better for the backup Job to be a bit closer to the primary host.
The repoHost affinity rules are specifically for the repository, not any of the Jobs.
That all said, we do have it in our roadmap to add affinity rules for Jobs around the backup system.
Thanks for the response. Do you have any estimations for this enhancement?
Seems that for now the only way out is to taint windows nodes...
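For readers who want to try the taint workaround mentioned above, here is a minimal sketch of how the taint would appear on the Node object. The node name comes from the pod listing earlier in this issue; the taint key and value (`os` / `windows`) are illustrative assumptions, not something the cluster is known to use. The same taint is usually applied with `kubectl taint nodes win-worker-01 os=windows:NoSchedule`.

```yaml
# Fragment of the Node object for the Windows worker.
# The key and value are hypothetical; pick whatever convention your cluster uses.
apiVersion: v1
kind: Node
metadata:
  name: win-worker-01
spec:
  taints:
    - key: os            # hypothetical taint key
      value: windows     # hypothetical taint value
      effect: NoSchedule # pods without a matching toleration are not scheduled here
```

With a `NoSchedule` taint in place, pods that carry no matching toleration (including the backup Job pods) are no longer placed on that node. Workloads that are meant to run on Windows would then need a matching toleration.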
restore already supports affinity and tolerations; it would be good if scheduled and manual backup jobs supported them too, so we can schedule them on infra nodes while the db resides on worker nodes, which are more tightly calculated on resources.
Especially if the volume holding the data for the database is only available on one node (where the database is actually running). In that case the backup job MUST run on the same node as the db, as otherwise the backup will fail!
When will the backup affinity setting be available?
Is there any indication of when the 5.2.0 version of the operator will be pushed to OperatorHub?
https://operatorhub.io/operator/postgresql
At the moment my backups get scheduled to arm64 nodes (hybrid cluster arm64 + amd64 nodes) and fail so this feature would solve my problems and be appreciated.
@darktempla In case you missed it, Crunchy PGO 5.2.0 is now available on OperatorHub.
@tjmoore4 - Literally must have dropped just after I checked before commenting on this issue ;) otherwise a webpage issue got the better of me. Happy chappy thanks for letting me know I will take it for a spin.
@maxsivkov Affinity for backup Jobs has been added with #3260, so closing this issue.
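For anyone landing here later, a minimal sketch of what the backup Job affinity might look like for the cluster in this issue. It assumes the setting is exposed as `spec.backups.pgbackrest.jobs.affinity` in PGO 5.2.0 and later; that field path is not stated in this thread, so please confirm it against the CRD installed in your cluster (for example with `kubectl explain postgrescluster.spec.backups.pgbackrest.jobs`). Only the relevant part of the spec is shown; the other fields from the manifest above are omitted.

```yaml
# Partial PostgresCluster spec: pin backup Job pods to the ckrole=db nodes.
# The jobs.affinity path is an assumption; verify it against your CRD version.
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: postgres-cluster-lh-vl
spec:
  backups:
    pgbackrest:
      jobs:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: ckrole
                      operator: In
                      values:
                        - db
```

This reuses the same node selector as the instance and repoHost sections, so backup Jobs would land only on nodes labeled ckrole=db and stay off the Windows worker.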