
fix(kwok): prevent quitting when scaling down node group

qianlei90 opened this pull request 1 year ago • 14 comments

What type of PR is this?

/kind bug

What this PR does / why we need it:

When using the Kwok provider, CA quits when scaling down a node group because the Kwok provider cannot retrieve the node group name from a fake node. This PR primarily aims to fix this issue.

Additionally, I have fixed the target-size accounting when scaling the node group up and down.
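The target-size accounting can be pictured with a minimal sketch (hypothetical, simplified types and names, not the actual kwok provider code): the node group keeps a recorded target size and adjusts it on both scale-up and scale-down, rather than letting it drift from the actual node count.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeGroup is a hypothetical, simplified stand-in for the kwok
// provider's node group type.
type NodeGroup struct {
	name       string
	minSize    int
	maxSize    int
	targetSize int
}

// IncreaseSize bumps the recorded target size, clamped by maxSize.
func (ng *NodeGroup) IncreaseSize(delta int) error {
	if delta <= 0 {
		return errors.New("delta must be positive")
	}
	if ng.targetSize+delta > ng.maxSize {
		return fmt.Errorf("size increase too large: %d > max %d", ng.targetSize+delta, ng.maxSize)
	}
	ng.targetSize += delta
	return nil
}

// DeleteNodes decrements the target size once per deleted node,
// keeping the recorded target in sync with the actual scale-down.
func (ng *NodeGroup) DeleteNodes(nodeNames []string) error {
	if ng.targetSize-len(nodeNames) < ng.minSize {
		return fmt.Errorf("size decrease below min size %d", ng.minSize)
	}
	ng.targetSize -= len(nodeNames)
	return nil
}

func main() {
	ng := &NodeGroup{name: "cluster-autoscaler", minSize: 0, maxSize: 200}
	_ = ng.IncreaseSize(1)
	fmt.Println(ng.targetSize) // 1
	_ = ng.DeleteNodes([]string{"kwok-node-0"})
	fmt.Println(ng.targetSize) // 0
}
```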

Which issue(s) this PR fixes:

kwok-provider-config
apiVersion: v1
data:
  config: |-
    apiVersion: v1alpha1
    readNodesFrom: configmap
    nodegroups:
      fromNodeLabelKey: "kwok-nodegroup"
    nodes:
    configmap:
      name: kwok-provider-templates
    kwok:
      install: false
kind: ConfigMap
metadata:
  name: kwok-provider-config
  namespace: default
kwok-provider-templates
apiVersion: v1
data:
  templates: |-
    apiVersion: v1
    items:
    - apiVersion: v1
      kind: Node
      metadata:
        annotations:
          node.alpha.kubernetes.io/ttl: "0"
          kwok.x-k8s.io/node: fake
        labels:
          beta.kubernetes.io/arch: amd64
          beta.kubernetes.io/os: linux
          kubernetes.io/arch: amd64
          kubernetes.io/hostname: kwok-node-0
          kubernetes.io/os: linux
          kubernetes.io/role: agent
          node-role.kubernetes.io/agent: ""
          type: kwok
          kwok-nodegroup: cluster-autoscaler
        name: kwok-node-0
      spec: {}
      status:
        allocatable:
          cpu: 32
          memory: 256Gi
          pods: 110
        capacity:
          cpu: 32
          memory: 256Gi
          pods: 110
        nodeInfo:
          architecture: amd64
          bootID: ""
          containerRuntimeVersion: ""
          kernelVersion: ""
          kubeProxyVersion: fake
          kubeletVersion: fake
          machineID: ""
          operatingSystem: linux
          osImage: ""
          systemUUID: ""
        phase: Running
    kind: List
    metadata:
      resourceVersion: ""
kind: ConfigMap
metadata:
  name: kwok-provider-templates
  namespace: default
starting CA
POD_NAMESPACE=default KWOK_PROVIDER_MODE=local ./cluster-autoscaler-amd64 \
    --cloud-provider=kwok \
    --namespace=default \
    --kubeconfig=<kubeconfig> \
    --expander=random \
    --scale-down-enabled=true \
    --scale-down-utilization-threshold=0.5 \
    --scale-down-gpu-utilization-threshold=0.5 \
    --scale-down-delay-after-add=10s \
    --scale-down-delay-after-failure=10s \
    --scale-down-unneeded-time=0s \
    --skip-nodes-with-system-pods=true \
    --skip-nodes-with-local-storage=true \
    --logtostderr=true \
    --stderrthreshold=info \
    --leader-elect=false \
    --v=4 \
    --scan-interval=3s
scale this deployment to test scale-up and scale-down
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployments-simple-deployment-deployment
  namespace: default
spec:
  replicas: 0
  selector:
    matchLabels:
      app: deployments-simple-deployment-app
  template:
    metadata:
      labels:
        app: deployments-simple-deployment-app
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kwok-nodegroup
                operator: In
                values:
                - cluster-autoscaler
      containers:
      - command:
        - sleep
        - "3600"
        image: busybox
        imagePullPolicy: Always
        name: busybox
        resources:
          requests:
            cpu: "31"
      terminationGracePeriodSeconds: 0
      tolerations:
      - effect: NoSchedule
        key: kwok-provider
        operator: Equal
        value: "true"
CA log
I1202 22:49:14.738401 1229586 static_autoscaler.go:290] Starting main loop
I1202 22:49:14.738533 1229586 kwok_provider.go:58] ignoring node 'minikube' because it is not managed by kwok
I1202 22:49:14.738623 1229586 kwok_provider.go:58] ignoring node 'minikube' because it is not managed by kwok
I1202 22:49:14.738654 1229586 filter_out_schedulable.go:63] Filtering out schedulables
I1202 22:49:14.738720 1229586 klogx.go:87] failed to find place for default/deployments-simple-deployment-deployment-59994f79f6-r85f5: cannot put pod deployments-simple-deployment-deployment-59994f79f6-r85f5 on any node
I1202 22:49:14.738732 1229586 filter_out_schedulable.go:120] 0 pods marked as unschedulable can be scheduled.
I1202 22:49:14.738740 1229586 filter_out_schedulable.go:83] No schedulable pods
I1202 22:49:14.738746 1229586 filter_out_daemon_sets.go:40] Filtering out daemon set pods
I1202 22:49:14.738751 1229586 filter_out_daemon_sets.go:49] Filtered out 0 daemon set pods, 1 unschedulable pods left
I1202 22:49:14.738768 1229586 klogx.go:87] Pod default/deployments-simple-deployment-deployment-59994f79f6-r85f5 is unschedulable
I1202 22:49:14.738839 1229586 orchestrator.go:108] Upcoming 0 nodes
I1202 22:49:14.738847 1229586 kwok_provider.go:58] ignoring node 'minikube' because it is not managed by kwok
I1202 22:49:14.739038 1229586 orchestrator.go:181] Best option to resize: cluster-autoscaler-1701528461
I1202 22:49:14.739047 1229586 orchestrator.go:185] Estimated 1 nodes needed in cluster-autoscaler-1701528461
I1202 22:49:14.739061 1229586 orchestrator.go:291] Final scale-up plan: [{cluster-autoscaler-1701528461 0->1 (max: 200)}]
I1202 22:49:14.739077 1229586 executor.go:147] Scale-up: setting group cluster-autoscaler-1701528461 size to 1
I1202 22:49:14.739181 1229586 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"default", Name:"cluster-autoscaler-status", UID:"095e2c8c-de6b-44b1-bda9-7174134f7a6e", APIVersion:"v1", ResourceVersion:"24451", FieldPath:""}): type: 'Normal' reason: 'ScaledUpGroup' Scale-up: setting group cluster-autoscaler-1701528461 size to 1 instead of 0 (max: 200)
I1202 22:49:14.743388 1229586 eventing_scale_up_processor.go:47] Skipping event processing for unschedulable pods since there is a ScaleUp attempt this loop
I1202 22:49:14.743494 1229586 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"deployments-simple-deployment-deployment-59994f79f6-r85f5", UID:"fbfb034f-f5a8-42b8-9e81-411caaa49042", APIVersion:"v1", ResourceVersion:"24446", FieldPath:""}): type: 'Normal' reason: 'TriggeredScaleUp' pod triggered scale-up: [{cluster-autoscaler-1701528461 0->1 (max: 200)}]
I1202 22:49:17.749563 1229586 static_autoscaler.go:290] Starting main loop
I1202 22:49:17.749713 1229586 kwok_provider.go:58] ignoring node 'minikube' because it is not managed by kwok
I1202 22:49:17.749828 1229586 kwok_provider.go:58] ignoring node 'minikube' because it is not managed by kwok
I1202 22:49:17.749844 1229586 clusterstate.go:260] Scale up in group cluster-autoscaler-1701528461 finished successfully in 3.006191994s
I1202 22:49:17.749875 1229586 filter_out_schedulable.go:63] Filtering out schedulables
I1202 22:49:17.749884 1229586 filter_out_schedulable.go:120] 0 pods marked as unschedulable can be scheduled.
I1202 22:49:17.749892 1229586 filter_out_schedulable.go:83] No schedulable pods
I1202 22:49:17.749901 1229586 filter_out_daemon_sets.go:40] Filtering out daemon set pods
I1202 22:49:17.749905 1229586 filter_out_daemon_sets.go:49] Filtered out 0 daemon set pods, 0 unschedulable pods left
I1202 22:49:17.749912 1229586 static_autoscaler.go:547] No unschedulable pods
I1202 22:49:17.749918 1229586 static_autoscaler.go:570] Calculating unneeded nodes
I1202 22:49:17.749924 1229586 kwok_provider.go:58] ignoring node 'minikube' because it is not managed by kwok
I1202 22:49:17.749930 1229586 pre_filtering_processor.go:57] Node minikube should not be processed by cluster autoscaler (no node group config)
I1202 22:49:17.749961 1229586 eligibility.go:162] Node cluster-autoscaler-1701528461-xghmv unremovable: cpu requested (96.875% of allocatable) is above the scale-down utilization threshold
I1202 22:49:17.749989 1229586 static_autoscaler.go:617] Scale down status: lastScaleUpTime=2023-12-02 22:49:14.738369867 +0800 CST m=+97.234587115 lastScaleDownDeleteTime=2023-12-02 21:47:41.492276216 +0800 CST m=-3596.011506536 lastScaleDownFailTime=2023-12-02 21:47:41.492276216 +0800 CST m=-3596.011506536 scaleDownForbidden=false scaleDownInCooldown=true
I1202 22:49:20.754194 1229586 static_autoscaler.go:290] Starting main loop
I1202 22:49:20.754386 1229586 kwok_provider.go:58] ignoring node 'minikube' because it is not managed by kwok
I1202 22:49:20.754575 1229586 kwok_provider.go:58] ignoring node 'minikube' because it is not managed by kwok
I1202 22:49:20.754637 1229586 filter_out_schedulable.go:63] Filtering out schedulables
I1202 22:49:20.754651 1229586 filter_out_schedulable.go:120] 0 pods marked as unschedulable can be scheduled.
I1202 22:49:20.754666 1229586 filter_out_schedulable.go:83] No schedulable pods
I1202 22:49:20.754674 1229586 filter_out_daemon_sets.go:40] Filtering out daemon set pods
I1202 22:49:20.754683 1229586 filter_out_daemon_sets.go:49] Filtered out 0 daemon set pods, 0 unschedulable pods left
I1202 22:49:20.754698 1229586 static_autoscaler.go:547] No unschedulable pods
I1202 22:49:20.754709 1229586 static_autoscaler.go:570] Calculating unneeded nodes
I1202 22:49:20.754719 1229586 kwok_provider.go:58] ignoring node 'minikube' because it is not managed by kwok
I1202 22:49:20.754727 1229586 pre_filtering_processor.go:57] Node minikube should not be processed by cluster autoscaler (no node group config)
I1202 22:49:20.754761 1229586 klogx.go:87] Node cluster-autoscaler-1701528461-xghmv - memory requested is 0% of allocatable
I1202 22:49:20.754785 1229586 cluster.go:156] Simulating node cluster-autoscaler-1701528461-xghmv removal
I1202 22:49:20.754805 1229586 cluster.go:179] node cluster-autoscaler-1701528461-xghmv may be removed
I1202 22:49:20.754817 1229586 nodes.go:84] cluster-autoscaler-1701528461-xghmv is unneeded since 2023-12-02 22:49:20.754082099 +0800 CST m=+103.250299367 duration 0s
I1202 22:49:20.754857 1229586 static_autoscaler.go:617] Scale down status: lastScaleUpTime=2023-12-02 22:49:14.738369867 +0800 CST m=+97.234587115 lastScaleDownDeleteTime=2023-12-02 21:47:41.492276216 +0800 CST m=-3596.011506536 lastScaleDownFailTime=2023-12-02 21:47:41.492276216 +0800 CST m=-3596.011506536 scaleDownForbidden=false scaleDownInCooldown=true
I1202 22:49:23.760153 1229586 static_autoscaler.go:290] Starting main loop
I1202 22:49:23.760340 1229586 kwok_provider.go:58] ignoring node 'minikube' because it is not managed by kwok
I1202 22:49:23.760477 1229586 kwok_provider.go:58] ignoring node 'minikube' because it is not managed by kwok
I1202 22:49:23.760546 1229586 filter_out_schedulable.go:63] Filtering out schedulables
I1202 22:49:23.760560 1229586 filter_out_schedulable.go:120] 0 pods marked as unschedulable can be scheduled.
I1202 22:49:23.760572 1229586 filter_out_schedulable.go:83] No schedulable pods
I1202 22:49:23.760580 1229586 filter_out_daemon_sets.go:40] Filtering out daemon set pods
I1202 22:49:23.760587 1229586 filter_out_daemon_sets.go:49] Filtered out 0 daemon set pods, 0 unschedulable pods left
I1202 22:49:23.760600 1229586 static_autoscaler.go:547] No unschedulable pods
I1202 22:49:23.760611 1229586 static_autoscaler.go:570] Calculating unneeded nodes
I1202 22:49:23.760620 1229586 kwok_provider.go:58] ignoring node 'minikube' because it is not managed by kwok
I1202 22:49:23.760633 1229586 pre_filtering_processor.go:57] Node minikube should not be processed by cluster autoscaler (no node group config)
I1202 22:49:23.760665 1229586 klogx.go:87] Node cluster-autoscaler-1701528461-xghmv - memory requested is 0% of allocatable
I1202 22:49:23.760690 1229586 cluster.go:156] Simulating node cluster-autoscaler-1701528461-xghmv removal
I1202 22:49:23.760709 1229586 cluster.go:179] node cluster-autoscaler-1701528461-xghmv may be removed
I1202 22:49:23.760722 1229586 nodes.go:84] cluster-autoscaler-1701528461-xghmv is unneeded since 2023-12-02 22:49:20.754082099 +0800 CST m=+103.250299367 duration 3.006038664s
I1202 22:49:23.760758 1229586 static_autoscaler.go:617] Scale down status: lastScaleUpTime=2023-12-02 22:49:14.738369867 +0800 CST m=+97.234587115 lastScaleDownDeleteTime=2023-12-02 21:47:41.492276216 +0800 CST m=-3596.011506536 lastScaleDownFailTime=2023-12-02 21:47:41.492276216 +0800 CST m=-3596.011506536 scaleDownForbidden=false scaleDownInCooldown=true
I1202 22:49:26.766561 1229586 static_autoscaler.go:290] Starting main loop
I1202 22:49:26.766740 1229586 kwok_provider.go:58] ignoring node 'minikube' because it is not managed by kwok
I1202 22:49:26.766880 1229586 kwok_provider.go:58] ignoring node 'minikube' because it is not managed by kwok
I1202 22:49:26.766936 1229586 filter_out_schedulable.go:63] Filtering out schedulables
I1202 22:49:26.766951 1229586 filter_out_schedulable.go:120] 0 pods marked as unschedulable can be scheduled.
I1202 22:49:26.766964 1229586 filter_out_schedulable.go:83] No schedulable pods
I1202 22:49:26.766972 1229586 filter_out_daemon_sets.go:40] Filtering out daemon set pods
I1202 22:49:26.766980 1229586 filter_out_daemon_sets.go:49] Filtered out 0 daemon set pods, 0 unschedulable pods left
I1202 22:49:26.766992 1229586 static_autoscaler.go:547] No unschedulable pods
I1202 22:49:26.767003 1229586 static_autoscaler.go:570] Calculating unneeded nodes
I1202 22:49:26.767014 1229586 kwok_provider.go:58] ignoring node 'minikube' because it is not managed by kwok
I1202 22:49:26.767022 1229586 pre_filtering_processor.go:57] Node minikube should not be processed by cluster autoscaler (no node group config)
I1202 22:49:26.767052 1229586 klogx.go:87] Node cluster-autoscaler-1701528461-xghmv - memory requested is 0% of allocatable
I1202 22:49:26.767073 1229586 cluster.go:156] Simulating node cluster-autoscaler-1701528461-xghmv removal
I1202 22:49:26.767093 1229586 cluster.go:179] node cluster-autoscaler-1701528461-xghmv may be removed
I1202 22:49:26.767106 1229586 nodes.go:84] cluster-autoscaler-1701528461-xghmv is unneeded since 2023-12-02 22:49:20.754082099 +0800 CST m=+103.250299367 duration 6.012451453s
I1202 22:49:26.767143 1229586 static_autoscaler.go:617] Scale down status: lastScaleUpTime=2023-12-02 22:49:14.738369867 +0800 CST m=+97.234587115 lastScaleDownDeleteTime=2023-12-02 21:47:41.492276216 +0800 CST m=-3596.011506536 lastScaleDownFailTime=2023-12-02 21:47:41.492276216 +0800 CST m=-3596.011506536 scaleDownForbidden=false scaleDownInCooldown=false
I1202 22:49:26.767173 1229586 static_autoscaler.go:642] Starting scale down
I1202 22:49:26.767203 1229586 nodes.go:126] cluster-autoscaler-1701528461-xghmv was unneeded for 6.012451453s
I1202 22:49:26.767222 1229586 scale_down_set_processor.go:103] Considering node cluster-autoscaler-1701528461-xghmv for standard scale down
I1202 22:49:26.776287 1229586 taints.go:221] Successfully added ToBeDeletedTaint on node cluster-autoscaler-1701528461-xghmv
I1202 22:49:26.776372 1229586 actuator.go:143] Scale-down: removing empty node "cluster-autoscaler-1701528461-xghmv"
I1202 22:49:26.776470 1229586 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"cluster-autoscaler-1701528461-xghmv", UID:"88187c9f-7dc0-419b-9edb-47223f904a76", APIVersion:"v1", ResourceVersion:"24471", FieldPath:""}): type: 'Normal' reason: 'ScaleDown' marked the node as toBeDeleted/unschedulable
I1202 22:49:26.776627 1229586 actuator.go:238] Scale-down: waiting 5s before trying to delete nodes
I1202 22:49:26.779502 1229586 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"default", Name:"cluster-autoscaler-status", UID:"095e2c8c-de6b-44b1-bda9-7174134f7a6e", APIVersion:"v1", ResourceVersion:"24498", FieldPath:""}): type: 'Normal' reason: 'ScaleDownEmpty' Scale-down: removing empty node "cluster-autoscaler-1701528461-xghmv"
I1202 22:49:29.781983 1229586 static_autoscaler.go:290] Starting main loop
I1202 22:49:29.782130 1229586 kwok_provider.go:58] ignoring node 'minikube' because it is not managed by kwok
I1202 22:49:29.782214 1229586 kwok_provider.go:58] ignoring node 'minikube' because it is not managed by kwok
I1202 22:49:29.782247 1229586 filter_out_schedulable.go:63] Filtering out schedulables
I1202 22:49:29.782257 1229586 filter_out_schedulable.go:120] 0 pods marked as unschedulable can be scheduled.
I1202 22:49:29.782264 1229586 filter_out_schedulable.go:83] No schedulable pods
I1202 22:49:29.782269 1229586 filter_out_daemon_sets.go:40] Filtering out daemon set pods
I1202 22:49:29.782274 1229586 filter_out_daemon_sets.go:49] Filtered out 0 daemon set pods, 0 unschedulable pods left
I1202 22:49:29.782281 1229586 static_autoscaler.go:547] No unschedulable pods
I1202 22:49:29.782287 1229586 static_autoscaler.go:570] Calculating unneeded nodes
I1202 22:49:29.782294 1229586 kwok_provider.go:58] ignoring node 'minikube' because it is not managed by kwok
I1202 22:49:29.782299 1229586 pre_filtering_processor.go:57] Node minikube should not be processed by cluster autoscaler (no node group config)
I1202 22:49:29.782323 1229586 static_autoscaler.go:617] Scale down status: lastScaleUpTime=2023-12-02 22:49:14.738369867 +0800 CST m=+97.234587115 lastScaleDownDeleteTime=2023-12-02 22:49:26.766533552 +0800 CST m=+109.262750820 lastScaleDownFailTime=2023-12-02 21:47:41.492276216 +0800 CST m=-3596.011506536 scaleDownForbidden=false scaleDownInCooldown=false
I1202 22:49:29.782346 1229586 static_autoscaler.go:642] Starting scale down
I1202 22:49:32.788976 1229586 static_autoscaler.go:290] Starting main loop
I1202 22:49:32.789123 1229586 kwok_provider.go:58] ignoring node 'minikube' because it is not managed by kwok
I1202 22:49:32.789226 1229586 kwok_provider.go:58] ignoring node 'minikube' because it is not managed by kwok
F1202 22:49:32.789237 1229586 kwok_helpers.go:270] label 'kwok-nodegroup' for node 'kwok:cluster-autoscaler-1701528461-xghmv' not present in the manifest

Debugger finished with the exit code 0

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


qianlei90 avatar Dec 02 '23 14:12 qianlei90

/assign @vadasambar

qianlei90 avatar Dec 02 '23 14:12 qianlei90

Thank you for the PR!

vadasambar avatar Dec 04 '23 18:12 vadasambar

I can't reproduce the issue. I used the same commands and configmap you used.

Here's what I did:

kubectl scale deploy deployments-simple-deployment-deployment --replicas=2

kwok provider created 2 fake nodes.

And then

kubectl scale deploy deployments-simple-deployment-deployment --replicas=0

kwok provider scaled down the 2 fake nodes.

Logs for reference: https://gist.github.com/vadasambar/56ac07f2eedbd97e5d8aaa1424df3481

vadasambar avatar Dec 05 '23 17:12 vadasambar

Maybe you can share the error you saw?

vadasambar avatar Dec 05 '23 17:12 vadasambar

@vadasambar Sorry for the confusion caused by the title. CA does not panic; it simply quits without any stack trace. The last line in your log shows the reason:

F1205 23:17:08.902773  116597 kwok_helpers.go:270] label 'kwok-nodegroup' for node 'kwok:cluster-autoscaler-1701798315-296wg' not present in the manifest
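The F-prefixed line comes from a fatal-level log call, which logs and then terminates the process (hence an exit without a Go panic or stack trace). The direction of the fix can be sketched as follows (hypothetical helper name and signature, not the actual kwok_helpers.go code): resolve the node group from the label named by fromNodeLabelKey and return an error for nodes missing it, so the caller can skip them instead of killing CA.

```go
package main

import (
	"fmt"
)

// nodeGroupFromLabels is a hypothetical sketch of the lookup the
// provider performs: resolve a node's group from the label named by
// fromNodeLabelKey. Returning an error (instead of a fatal log call,
// which exits the whole process) lets the caller handle the node
// gracefully.
func nodeGroupFromLabels(labels map[string]string, fromNodeLabelKey string) (string, error) {
	ng, ok := labels[fromNodeLabelKey]
	if !ok || ng == "" {
		return "", fmt.Errorf("label %q not present on node", fromNodeLabelKey)
	}
	return ng, nil
}

func main() {
	// A fake node carrying the label resolves normally...
	ng, err := nodeGroupFromLabels(map[string]string{"kwok-nodegroup": "cluster-autoscaler"}, "kwok-nodegroup")
	fmt.Println(ng, err) // cluster-autoscaler <nil>

	// ...while a node without it now yields an error the caller can handle.
	_, err = nodeGroupFromLabels(map[string]string{}, "kwok-nodegroup")
	fmt.Println(err != nil) // true
}
```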

qianlei90 avatar Dec 06 '23 01:12 qianlei90

@vadasambar Sorry for the confusion caused by the title. CA does not panic; it simply quits without any stack trace. The last line in your log shows the reason:

F1205 23:17:08.902773  116597 kwok_helpers.go:270] label 'kwok-nodegroup' for node 'kwok:cluster-autoscaler-1701798315-296wg' not present in the manifest

This is clearly a bug. Thank you for the explanation!

vadasambar avatar Dec 11 '23 18:12 vadasambar

Also, I think we should add a test case which fails with the current code and passes with the fix.

Thanks for your advice; it will be done in a few days.

/hold

qianlei90 avatar Dec 19 '23 11:12 qianlei90

Also, I think we should add a test case which fails with the current code and passes with the fix.

Done.

/unhold

qianlei90 avatar Dec 29 '23 01:12 qianlei90

@qianlei90 apologies for the delay (was out on vacation). I plan to review this PR this week.

vadasambar avatar Jan 03 '24 04:01 vadasambar

/lgtm

vadasambar avatar Jan 12 '24 16:01 vadasambar

/unhold

vadasambar avatar Jan 12 '24 16:01 vadasambar

Thank you @qianlei90 . LGTM.

@BigDarkClown can you please merge the PR :pray: I am the approver and reviewer for kwok cloud provider (this PR contains only kwok provider changes) but I can't seem to merge the PR.

vadasambar avatar Jan 12 '24 16:01 vadasambar

/assign @towca

vadasambar avatar Jan 12 '24 17:01 vadasambar

/approve

towca avatar Jan 18 '24 13:01 towca

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qianlei90, towca, vadasambar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot avatar Jan 18 '24 13:01 k8s-ci-robot