nebula-operator
[webhook] When a storaged scale-out is Pending because resources are insufficient, a scale-in cannot be executed afterwards; it seems stuck
With the admission webhook enabled, I scaled out storaged, but the new pod could not be scheduled because there was not enough CPU.
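For context, the scale-out itself was done by raising the storaged replica count on the NebulaCluster resource; a minimal sketch, assuming the cluster object is named nebulazone (matching the pod labels below) and that kubectl resolves the nebulacluster resource from the installed CRD:

$ kubectl -n nebula patch nebulacluster nebulazone --type merge \
    -p '{"spec": {"storaged": {"replicas": 10}}}'

The new pod then stays Pending: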
$ kubectl -n nebula describe pod nebulazone-storaged-9
Name: nebulazone-storaged-9
Namespace: nebula
Priority: 0
Service Account: nebula-sa
Node: <none>
Labels: app.kubernetes.io/cluster=nebulazone
app.kubernetes.io/component=storaged
app.kubernetes.io/managed-by=nebula-operator
app.kubernetes.io/name=nebula-graph
controller-revision-hash=nebulazone-storaged-5b568d554c
statefulset.kubernetes.io/pod-name=nebulazone-storaged-9
Annotations: cloud.google.com/cluster_autoscaler_unhelpable_since: 2023-10-09T09:58:34+0000
cloud.google.com/cluster_autoscaler_unhelpable_until: Inf
nebula-graph.io/cm-hash: 760645648930d20e
Status: Pending
IP:
IPs: <none>
Controlled By: StatefulSet/nebulazone-storaged
Containers:
storaged:
Image: asia-east2-docker.pkg.dev/nebula-cloud-test/poc/rc/nebula-storaged-ent:v3.5.0-sc
Ports: 9779/TCP, 19789/TCP, 9778/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Command:
/bin/sh
-ecx
exec /usr/local/nebula/bin/nebula-storaged --flagfile=/usr/local/nebula/etc/nebula-storaged.conf --meta_server_addrs=nebulazone-metad-0.nebulazone-metad-headless.nebula.svc.cluster.local:9559,nebulazone-metad-1.nebulazone-metad-headless.nebula.svc.cluster.local:9559,nebulazone-metad-2.nebulazone-metad-headless.nebula.svc.cluster.local:9559 --local_ip=$(hostname).nebulazone-storaged-headless.nebula.svc.cluster.local --ws_ip=$(hostname).nebulazone-storaged-headless.nebula.svc.cluster.local --daemonize=false --ws_http_port=19789
Limits:
cpu: 3
memory: 16Gi
Requests:
cpu: 2
memory: 8Gi
Readiness: http-get http://:19789/status delay=10s timeout=5s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/usr/local/nebula/data from storaged-data (rw,path="data")
/usr/local/nebula/etc/nebula-storaged.conf from nebulazone-storaged (rw,path="nebula-storaged.conf")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j86r9 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
storaged-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: storaged-data-nebulazone-storaged-9
ReadOnly: false
nebulazone-storaged:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: nebulazone-storaged
Optional: false
kube-api-access-j86r9:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints: topology.kubernetes.io/zone:DoNotSchedule when max skew 1 is exceeded for selector app.kubernetes.io/cluster=nebulazone,app.kubernetes.io/component=storaged,app.kubernetes.io/managed-by=nebula-operator,app.kubernetes.io/name=nebula-graph
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 48s nebula-scheduler 0/3 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod..
Warning FailedScheduling 45s nebula-scheduler 0/3 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod..
Normal NotTriggerScaleUp 46s cluster-autoscaler pod didn't trigger scale-up:
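(The scheduler events above show every node is short on CPU or memory; node headroom can be double-checked with something like the following, shown only as a hedged sketch:)

$ kubectl describe nodes | grep -A 8 "Allocated resources"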
Your Environments (required)
nebula-operator: snap1.19
Expected behavior
When the pod is Pending because of insufficient resources, stop the scale-out and return to the previous state (i.e., allow scaling back in).
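Concretely, while the new pod is Pending, an attempt to scale back in (a hedged sketch, with the same assumed resource names as above):

$ kubectl -n nebula patch nebulacluster nebulazone --type merge \
    -p '{"spec": {"storaged": {"replicas": 9}}}'

is rejected by the admission webhook, so the cluster appears stuck until the webhook is disabled or resources free up.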
I worked around the problem by editing the nebula-operator deployment and setting --enable-admission-webhook=false to disable the webhook.
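In case it helps, a hedged sketch of that workaround; the operator namespace and deployment name below are assumptions and depend on how the chart was installed:

$ kubectl -n nebula-operator-system edit deployment nebula-operator-controller-manager-deployment
  # in the controller-manager container's args, change
  #   --enable-admission-webhook=true
  # to
  #   --enable-admission-webhook=false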
I think the insufficient-resources situation itself is not a bug; the admission webhook is there to prevent operations while the cluster is in an intermediate state.