cockroach-operator
cockroach-operator copied to clipboard
Version check job does not inherit tolerations and nodeSelector from CrdbCluster
I am running v2.4.0 of the operator installed like this:
- crds v2.4.0 without any modifications
-
operator v2.4.0 with the following modifications:
- added
linkerd.io/inject: enabled
to a namespace annotation - changed namespace: cockroach-operator-system -> cockroach-system
- added tolerations and nodeSelector to deployment
- added
My cluster is defined like this:
apiVersion: crdb.cockroachlabs.com/v1alpha1
kind: CrdbCluster
metadata:
name: cockroachdb
namespace: cockroachdb
spec:
dataStore:
pvc:
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: "10Gi"
volumeMode: Filesystem
tlsEnabled: true
image:
name: cockroachdb/cockroach:v21.1.11
nodes: 3
additionalLabels:
crdb: is-cool
tolerations:
- key: tier
operator: Equal
value: platform
effect: NoSchedule
nodeSelector:
tier: platform
My kubernetes setup is a Kind cluster with the following configuration:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
disableDefaultCNI: true
podSubnet: 192.168.0.0/16
nodes:
- role: control-plane
kubeadmConfigPatches:
- |
kind: InitConfiguration
nodeRegistration:
kubeletExtraArgs:
node-labels: "ingress-ready=true"
- role: worker
kubeadmConfigPatches:
- |
kind: JoinConfiguration
nodeRegistration:
taints:
- key: tier
value: platform
effect: NoSchedule
kubeletExtraArgs:
node-labels: "tier=platform"
extraPortMappings:
# private ingress controller
- containerPort: 30080
hostPort: 7080
protocol: TCP
- containerPort: 30443
hostPort: 7443
protocol: TCP
# public ingress controller
- containerPort: 31080
hostPort: 8080
protocol: TCP
- containerPort: 31443
hostPort: 8443
protocol: TCP
- role: worker
kubeadmConfigPatches:
- |
kind: JoinConfiguration
nodeRegistration:
taints:
- key: tier
value: application
effect: NoSchedule
kubeletExtraArgs:
node-labels: "tier=application"
This cluster runs tigera operator to install Calico CNI that supports network policies (though this information is irrelevant to the issue).
Problem
Default scheduler cannot find a node to schedule a pod for a version check job (cockroachdb-vcheck-...
) with the following error:
Warning FailedScheduling 15s (x2 over 93s) default-scheduler 0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 1 node(s) had taint {tier: application}, that the pod didn't tolerate, 1 node(s) had taint {tier: platform}, that the pod didn't tolerate.
@kuznero try adding the feature flag as documented. You should be good to go then
- args:
- -zap-log-level
- info
- -feature-gates=TolerationRules=true,AffinityRules=true,TopologySpreadRules=true