cluster-operator
Problem with overriding statefulset readiness probe
Describe the bug
Overriding the StatefulSet readiness probe from tcpSocket to exec leaves the default tcpSocket handler in the generated probe alongside the new exec handler.
To Reproduce
kubectl apply -f cluster-test.yml
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: rabbitmq-test
spec:
  replicas: 5
  image: 172.17.12.132:9110/rabbitmq/rabbitmq:3.13.4-management
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 1
      memory: 2Gi
  persistence:
    storageClassName: nfs-rabbitmq-test-storage
    storage: "10Gi"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - servisi-0023
            - servisi-0024
  override:
    statefulSet:
      spec:
        template:
          spec:
            containers:
            - name: rabbitmq
              livenessProbe:
                exec:
                  command:
                  - rabbitmq-diagnostics
                  - status
                initialDelaySeconds: 60
                periodSeconds: 60
                timeoutSeconds: 15
              readinessProbe:
                exec:
                  command:
                  - rabbitmq-diagnostics
                  - ping
                initialDelaySeconds: 20
                periodSeconds: 60
                timeoutSeconds: 10
              securityContext:
                allowPrivilegeEscalation: false
                capabilities:
                  add:
                  - CHOWN
                privileged: false
                procMount: Default
                readOnlyRootFilesystem: false
                runAsNonRoot: false
                runAsUser: 999
                runAsGroup: 100
              volumeMounts:
              - name: definitions-json
                mountPath: /etc/rabbitmq/definitions.json
                subPath: definitions.json
              - name: rabbitmq-conf
                mountPath: /etc/rabbitmq/rabbitmq.conf
                subPath: rabbitmq.conf
              #- name: rabbitmq-data
              #  mountPath: /var/lib/rabbitmq
            securityContext:
              fsGroup: 100
              runAsNonRoot: true
              runAsUser: 999
              runAsGroup: 100
            volumes:
            - name: definitions-json
              configMap:
                name: rabbitmq-configmap
                items:
                - key: definitions.json
                  path: definitions.json
            - name: rabbitmq-conf
              configMap:
                name: rabbitmq-configmap
                items:
                - key: rabbitmq.conf
                  path: rabbitmq.conf
        #volumeClaimTemplates:
        #- metadata:
        #    name: rabbitmq-data
        #    annotations:
        #      volume.alpha.kubernetes.io/storage-class: nfs-rabbitmq-test-storage
        #  spec:
        #    accessModes:
        #    - ReadWriteOnce
        #    storageClassName: nfs-rabbitmq-test-storage
        #    resources:
        #      requests:
        #        storage: 10Gi
kubectl get statefulset rabbitmq-test-server -o yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    rabbitmq.com/createdAt: "2024-08-13T09:42:31Z"
  creationTimestamp: "2024-08-13T09:42:31Z"
  generation: 1
  labels:
    app.kubernetes.io/component: rabbitmq
    app.kubernetes.io/name: rabbitmq-test
    app.kubernetes.io/part-of: rabbitmq
  name: rabbitmq-test-server
  namespace: rabbitmq-test
  ownerReferences:
  - apiVersion: rabbitmq.com/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: RabbitmqCluster
    name: rabbitmq-test
    uid: 073ca32b-3fb0-4c92-a0b5-b840c679e36a
  resourceVersion: "23728935"
  uid: 704acd08-39cd-4507-b731-9d4f66c1813c
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain
    whenScaled: Retain
  podManagementPolicy: Parallel
  replicas: 5
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/name: rabbitmq-test
  serviceName: rabbitmq-test-nodes
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: rabbitmq
        app.kubernetes.io/name: rabbitmq-test
        app.kubernetes.io/part-of: rabbitmq
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - servisi-0023
                - servisi-0024
      automountServiceAccountToken: true
      containers:
      - env:
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: K8S_SERVICE_NAME
          value: rabbitmq-test-nodes
        - name: RABBITMQ_ENABLED_PLUGINS_FILE
          value: /operator/enabled_plugins
        - name: RABBITMQ_USE_LONGNAME
          value: "true"
        - name: RABBITMQ_NODENAME
          value: rabbit@$(MY_POD_NAME).$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE)
        - name: K8S_HOSTNAME_SUFFIX
          value: .$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE)
        image: 172.17.12.132:9110/rabbitmq/rabbitmq:3.13.4-management
        imagePullPolicy: IfNotPresent
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/bash
              - -c
              - if [ ! -z "$(cat /etc/pod-info/skipPreStopChecks)" ]; then exit 0;
                fi; rabbitmq-upgrade await_online_quorum_plus_one -t 604800 && rabbitmq-upgrade
                await_online_synchronized_mirror -t 604800 && rabbitmq-upgrade drain
                -t 604800
        livenessProbe:
          exec:
            command:
            - rabbitmq-diagnostics
            - status
          failureThreshold: 3
          initialDelaySeconds: 60
          periodSeconds: 60
          successThreshold: 1
          timeoutSeconds: 15
        name: rabbitmq
        ports:
        - containerPort: 4369
          name: epmd
          protocol: TCP
        - containerPort: 5672
          name: amqp
          protocol: TCP
        - containerPort: 15672
          name: management
          protocol: TCP
        - containerPort: 15692
          name: prometheus
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - rabbitmq-diagnostics
            - ping
          failureThreshold: 3
          initialDelaySeconds: 20
          periodSeconds: 60
          successThreshold: 1
          tcpSocket:
            port: amqp
          timeoutSeconds: 10
        resources:
          limits:
            cpu: "1"
            memory: 2Gi
          requests:
            cpu: 500m
            memory: 1Gi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - CHOWN
          privileged: false
          procMount: Default
          readOnlyRootFilesystem: false
          runAsGroup: 100
          runAsNonRoot: false
          runAsUser: 999
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/lib/rabbitmq/
          name: rabbitmq-erlang-cookie
        - mountPath: /var/lib/rabbitmq/mnesia/
          name: persistence
        - mountPath: /etc/rabbitmq/definitions.json
          name: definitions-json
          subPath: definitions.json
        - mountPath: /etc/rabbitmq/rabbitmq.conf
          name: rabbitmq-conf
          subPath: rabbitmq.conf
        - mountPath: /operator
          name: rabbitmq-plugins
        - mountPath: /etc/rabbitmq/conf.d/10-operatorDefaults.conf
          name: rabbitmq-confd
          subPath: operatorDefaults.conf
        - mountPath: /etc/rabbitmq/conf.d/90-userDefinedConfiguration.conf
          name: rabbitmq-confd
          subPath: userDefinedConfiguration.conf
        - mountPath: /etc/pod-info/
          name: pod-info
        - mountPath: /etc/rabbitmq/conf.d/11-default_user.conf
          name: rabbitmq-confd
          subPath: default_user.conf
      dnsPolicy: ClusterFirst
      initContainers:
      - command:
        - sh
        - -c
        - cp /tmp/erlang-cookie-secret/.erlang.cookie /var/lib/rabbitmq/.erlang.cookie
          && chmod 600 /var/lib/rabbitmq/.erlang.cookie ; cp /tmp/rabbitmq-plugins/enabled_plugins
          /operator/enabled_plugins ; echo '[default]' > /var/lib/rabbitmq/.rabbitmqadmin.conf
          && sed -e 's/default_user/username/' -e 's/default_pass/password/' /tmp/default_user.conf
          >> /var/lib/rabbitmq/.rabbitmqadmin.conf && chmod 600 /var/lib/rabbitmq/.rabbitmqadmin.conf
          ; sleep 30
        image: 172.17.12.132:9110/rabbitmq/rabbitmq:3.13.4-management
        imagePullPolicy: IfNotPresent
        name: setup-container
        resources:
          limits:
            cpu: 100m
            memory: 500Mi
          requests:
            cpu: 100m
            memory: 500Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /tmp/rabbitmq-plugins/
          name: plugins-conf
        - mountPath: /var/lib/rabbitmq/
          name: rabbitmq-erlang-cookie
        - mountPath: /tmp/erlang-cookie-secret/
          name: erlang-cookie-secret
        - mountPath: /operator
          name: rabbitmq-plugins
        - mountPath: /var/lib/rabbitmq/mnesia/
          name: persistence
        - mountPath: /tmp/default_user.conf
          name: rabbitmq-confd
          subPath: default_user.conf
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 100
        runAsGroup: 100
        runAsNonRoot: true
        runAsUser: 999
      serviceAccount: rabbitmq-test-server
      serviceAccountName: rabbitmq-test-server
      terminationGracePeriodSeconds: 604800
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: rabbitmq-test
        maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: definitions.json
            path: definitions.json
          name: rabbitmq-configmap
        name: definitions-json
      - configMap:
          defaultMode: 420
          items:
          - key: rabbitmq.conf
            path: rabbitmq.conf
          name: rabbitmq-configmap
        name: rabbitmq-conf
      - configMap:
          defaultMode: 420
          name: rabbitmq-test-plugins-conf
        name: plugins-conf
      - name: rabbitmq-confd
        projected:
          defaultMode: 420
          sources:
          - configMap:
              items:
              - key: operatorDefaults.conf
                path: operatorDefaults.conf
              - key: userDefinedConfiguration.conf
                path: userDefinedConfiguration.conf
              name: rabbitmq-test-server-conf
          - secret:
              items:
              - key: default_user.conf
                path: default_user.conf
              name: rabbitmq-test-default-user
      - emptyDir: {}
        name: rabbitmq-erlang-cookie
      - name: erlang-cookie-secret
        secret:
          defaultMode: 420
          secretName: rabbitmq-test-erlang-cookie
      - emptyDir: {}
        name: rabbitmq-plugins
      - downwardAPI:
          defaultMode: 420
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.labels['skipPreStopChecks']
            path: skipPreStopChecks
        name: pod-info
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: rabbitmq
        app.kubernetes.io/name: rabbitmq-test
        app.kubernetes.io/part-of: rabbitmq
      name: persistence
      namespace: rabbitmq-test
      ownerReferences:
      - apiVersion: rabbitmq.com/v1beta1
        blockOwnerDeletion: false
        controller: true
        kind: RabbitmqCluster
        name: rabbitmq-test
        uid: 073ca32b-3fb0-4c92-a0b5-b840c679e36a
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: nfs-rabbitmq-test-storage
      volumeMode: Filesystem
    status:
      phase: Pending
status:
  availableReplicas: 0
  collisionCount: 0
  currentRevision: rabbitmq-test-server-5b4fd5484d
  observedGeneration: 1
  replicas: 0
  updateRevision: rabbitmq-test-server-5b4fd5484d
The StatefulSet does not replace the readiness probe. Instead it keeps both the exec and tcpSocket handlers, as follows:
        readinessProbe:
          exec:
            command:
            - rabbitmq-diagnostics
            - ping
          failureThreshold: 3
          initialDelaySeconds: 20
          periodSeconds: 60
          successThreshold: 1
          tcpSocket:
            port: amqp
          timeoutSeconds: 10
which results in the following error:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 9m31s statefulset-controller create Claim persistence-rabbitmq-test-server-0 Pod rabbitmq-test-server-0 in StatefulSet rabbitmq-test-server success
Warning FailedCreate 4m4s (x17 over 9m31s) statefulset-controller create Pod rabbitmq-test-server-0 in StatefulSet rabbitmq-test-server failed error: Pod "rabbitmq-test-server-0" is invalid: spec.containers[0].readinessProbe.tcpSocket: Forbidden: may not specify more than 1 handler type
Patching the StatefulSet directly is an option, but it is not ideal. Please help.
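The duplicated handler is what one would expect if the override is merged into the operator-generated pod template key by key, rather than replacing whole objects. The sketch below is only a simplified illustration of that merge behaviour, not the operator's actual code: `deep_merge` and `handler_types` are hypothetical helpers, and the default probe is an assumption reduced to the tcpSocket handler seen in the output above plus the 10s default delay mentioned later in this thread.

```python
import copy

# Handler fields of a Kubernetes probe; the API accepts at most one of them.
HANDLER_FIELDS = ("exec", "httpGet", "tcpSocket", "grpc")


def deep_merge(base, override):
    """Recursively merge `override` into a copy of `base`.

    Nested dicts are merged key by key; any other value in `override`
    replaces the value in `base`. Keys absent from `override` are kept.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def handler_types(probe):
    """Return the handler fields that are set on a probe, sorted by name."""
    return sorted(field for field in HANDLER_FIELDS if field in probe)


# Assumed operator default: a tcpSocket readiness probe on the amqp port.
default_probe = {"tcpSocket": {"port": "amqp"}, "initialDelaySeconds": 10}

# The readiness probe from the override in this report.
override_probe = {
    "exec": {"command": ["rabbitmq-diagnostics", "ping"]},
    "initialDelaySeconds": 20,
    "periodSeconds": 60,
    "timeoutSeconds": 10,
}

merged_probe = deep_merge(default_probe, override_probe)
print(handler_types(merged_probe))  # → ['exec', 'tcpSocket']
```

Because the override never mentions `tcpSocket`, a merge of this kind has nothing to replace it with, so both handlers survive and the API server rejects the pod with the "may not specify more than 1 handler type" error shown above.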
While allowing the probe to be overridden is something we can consider, can you explain what you are trying to accomplish here? Why do you expect rabbitmq-diagnostics ping to be a better readiness probe? What are the situations where it would be better?
@mkuratczyk, we are facing the same issue with overriding readinessProbe.initialDelaySeconds. We are deploying RabbitMQ on an EKS + Fargate cluster, where the intrinsic scheduling takes about 100 seconds. With the default readinessProbe.initialDelaySeconds of 10s, we hit this error every time the RabbitMQ pod is scheduled:
Readiness probe failed: dial tcp 10.35.177.155:5672: connect: connection refused
@sudhirjena
I've temporarily worked around the error by commenting out the readinessProbe override, as follows:
...
  override:
    statefulSet:
      spec:
        template:
          spec:
            containers:
            - name: rabbitmq
              livenessProbe:
                exec:
                  command:
                  - rabbitmq-diagnostics
                  - status
                initialDelaySeconds: 60
                periodSeconds: 60
                timeoutSeconds: 15
              # readinessProbe:
              #   tcpSocket:
              #     port: 22
              #   # exec:
              #   #   command:
              #   #   - rabbitmq-diagnostics
              #   #   - ping
              #   initialDelaySeconds: 20
              #   periodSeconds: 60
              #   timeoutSeconds: 10
              securityContext:
                allowPrivilegeEscalation: false
                capabilities:
                  add:
                  - CHOWN
                privileged: false
                procMount: Default
                readOnlyRootFilesystem: false
                runAsNonRoot: false
                runAsUser: 999
                runAsGroup: 100
...
The cluster started without errors:
NAME                         READY   STATUS    RESTARTS   AGE     IP            NODE           NOMINATED NODE   READINESS GATES
pod/rabbitmq-test-server-0   1/1     Running   0          7d13h   10.33.128.2   servisi-0023   <none>           <none>
pod/rabbitmq-test-server-1   1/1     Running   0          7d13h   10.33.128.3   servisi-0023   <none>           <none>
pod/rabbitmq-test-server-2   1/1     Running   0          7d13h   10.35.128.3   servisi-0024   <none>           <none>
pod/rabbitmq-test-server-3   1/1     Running   0          7d13h   10.33.128.4   servisi-0023   <none>           <none>
pod/rabbitmq-test-server-4   1/1     Running   0          7d13h   10.35.128.2   servisi-0024   <none>           <none>

NAME                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                        AGE   SELECTOR
service/rabbitmq-test         ClusterIP   10.245.98.250   <none>        5672/TCP,15672/TCP,15692/TCP   27d   app.kubernetes.io/name=rabbitmq-test
service/rabbitmq-test-nodes   ClusterIP   None            <none>        4369/TCP,25672/TCP             27d   app.kubernetes.io/name=rabbitmq-test
But as always, a temporary solution might become a permanent one 🥇
We are not against the idea, so PRs are welcome. This is an open source project; you don't have to wait for us to get around to implementing this.
This issue has been marked as stale due to 60 days of inactivity. Stale issues will be closed after a further 30 days of inactivity; please remove the stale label in order to prevent this occurring.
/assign