nebula-operator
[webhook] When a storaged scale-out is Pending because resources are insufficient, a scale-in cannot be executed afterwards; it seems stuck
With the admission webhook enabled, I scaled out storaged, but the new pod could not be scheduled because there was not enough CPU.
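For context, the scale-out itself was done by raising the storaged replica count on the NebulaCluster resource; a minimal sketch, assuming the cluster object is named nebulazone (matching the pod labels below) and that kubectl resolves the nebulacluster resource from the installed CRD:

$ kubectl -n nebula patch nebulacluster nebulazone --type merge \
    -p '{"spec": {"storaged": {"replicas": 10}}}'

The new pod then stays Pending: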
$ kubectl -n nebula describe pod nebulazone-storaged-9
Name: nebulazone-storaged-9
Namespace: nebula
Priority: 0
Service Account: nebula-sa
Node: <none>
Labels: app.kubernetes.io/cluster=nebulazone
app.kubernetes.io/component=storaged
app.kubernetes.io/managed-by=nebula-operator
app.kubernetes.io/name=nebula-graph
controller-revision-hash=nebulazone-storaged-5b568d554c
statefulset.kubernetes.io/pod-name=nebulazone-storaged-9
Annotations: cloud.google.com/cluster_autoscaler_unhelpable_since: 2023-10-09T09:58:34+0000
cloud.google.com/cluster_autoscaler_unhelpable_until: Inf
nebula-graph.io/cm-hash: 760645648930d20e
Status: Pending
IP:
IPs: <none>
Controlled By: StatefulSet/nebulazone-storaged
Containers:
storaged:
Image: asia-east2-docker.pkg.dev/nebula-cloud-test/poc/rc/nebula-storaged-ent:v3.5.0-sc
Ports: 9779/TCP, 19789/TCP, 9778/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Command:
/bin/sh
-ecx
exec /usr/local/nebula/bin/nebula-storaged --flagfile=/usr/local/nebula/etc/nebula-storaged.conf --meta_server_addrs=nebulazone-metad-0.nebulazone-metad-headless.nebula.svc.cluster.local:9559,nebulazone-metad-1.nebulazone-metad-headless.nebula.svc.cluster.local:9559,nebulazone-metad-2.nebulazone-metad-headless.nebula.svc.cluster.local:9559 --local_ip=$(hostname).nebulazone-storaged-headless.nebula.svc.cluster.local --ws_ip=$(hostname).nebulazone-storaged-headless.nebula.svc.cluster.local --daemonize=false --ws_http_port=19789
Limits:
cpu: 3
memory: 16Gi
Requests:
cpu: 2
memory: 8Gi
Readiness: http-get http://:19789/status delay=10s timeout=5s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/usr/local/nebula/data from storaged-data (rw,path="data")
/usr/local/nebula/etc/nebula-storaged.conf from nebulazone-storaged (rw,path="nebula-storaged.conf")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j86r9 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
storaged-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: storaged-data-nebulazone-storaged-9
ReadOnly: false
nebulazone-storaged:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: nebulazone-storaged
Optional: false
kube-api-access-j86r9:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints: topology.kubernetes.io/zone:DoNotSchedule when max skew 1 is exceeded for selector app.kubernetes.io/cluster=nebulazone,app.kubernetes.io/component=storaged,app.kubernetes.io/managed-by=nebula-operator,app.kubernetes.io/name=nebula-graph
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 48s nebula-scheduler 0/3 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod..
Warning FailedScheduling 45s nebula-scheduler 0/3 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod..
Normal NotTriggerScaleUp 46s cluster-autoscaler pod didn't trigger scale-up:
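(The scheduler events above show every node is short on CPU or memory; node headroom can be double-checked with something like the following, shown only as a hedged sketch:)

$ kubectl describe nodes | grep -A 8 "Allocated resources"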
Your Environments (required)
nebula-operator: snap1.19
Expected behavior
When the pod is Pending because of insufficient resources, stop the scale-out and return to the previous state (i.e., allow scaling back in).
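Concretely, while the new pod is Pending, an attempt to scale back in (a hedged sketch, with the same assumed resource names as above):

$ kubectl -n nebula patch nebulacluster nebulazone --type merge \
    -p '{"spec": {"storaged": {"replicas": 9}}}'

is rejected by the admission webhook, so the cluster appears stuck until the webhook is disabled or resources free up.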
I worked around the problem by editing the nebula-operator deployment and setting --enable-admission-webhook=false to disable the webhook.
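In case it helps, a hedged sketch of that workaround; the operator namespace and deployment name below are assumptions and depend on how the chart was installed:

$ kubectl -n nebula-operator-system edit deployment nebula-operator-controller-manager-deployment
  # in the controller-manager container's args, change
  #   --enable-admission-webhook=true
  # to
  #   --enable-admission-webhook=false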
I think the insufficient-resources situation itself is not a bug; the admission webhook is there to prevent operations while the cluster is in an intermediate state.