Node's dynamic overcommit ratio rises instead of falling after CPU consumption increases

Open flpanbin opened this issue 8 months ago • 11 comments

What happened?

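I followed the dynamic overcommitment documentation to try out the dynamic overcommit feature. However, after creating testpod1 to increase CPU consumption, the node's CPU overcommit ratio, cpu_overcommit_ratio, went up instead of down.

With no pods running, the KCNR of g-master2 looks like this: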

[root@g-master1 katalyst]# kubectl describe kcnr g-master2
Name:         g-master2
Namespace:
Labels:       <none>
Annotations:  katalyst.kubewharf.io/cpu_overcommit_ratio: 1.74
              katalyst.kubewharf.io/guaranteed_cpus: 0
              katalyst.kubewharf.io/memory_overcommit_ratio: 1.15
              katalyst.kubewharf.io/overcommit_cpu_manager: none
              katalyst.kubewharf.io/overcommit_memory_manager: None
API Version:  node.katalyst.kubewharf.io/v1alpha1
Kind:         CustomNodeResource
Metadata:
  Creation Timestamp:  2024-05-27T14:02:23Z
  Generation:          1
  Resource Version:    135351666
  UID:                 78bc346b-d009-4ea8-bac1-51e2e6612d07
Spec:
  Node Resource Properties:
    Property Name:      numa
    Property Quantity:  2
    Property Name:      nbw
    Property Quantity:  10k
    Property Name:      cpu
    Property Quantity:  16
    Property Name:      memory
    Property Quantity:  32778468Ki
    Property Name:      cis
    Property Values:
      avx2
    Property Name:  topology
    Property Values:
      {"Iface":"ens192","Speed":10000,"NumaNode":0,"Enable":true,"Addr":{"IPV4":["10.6.202.112"],"IPV6":null},"NSName":"","NSAbsolutePath":""}
Events:  <none>

After creating testpod1, the KCNR of g-master2 looks like this:

[root@g-master1 katalyst]# kubectl describe kcnr g-master2
Name:         g-master2
Namespace:
Labels:       <none>
Annotations:  katalyst.kubewharf.io/cpu_overcommit_ratio: 1.99
              katalyst.kubewharf.io/guaranteed_cpus: 0
              katalyst.kubewharf.io/memory_overcommit_ratio: 1.41
              katalyst.kubewharf.io/overcommit_cpu_manager: none
              katalyst.kubewharf.io/overcommit_memory_manager: None
API Version:  node.katalyst.kubewharf.io/v1alpha1
Kind:         CustomNodeResource
Metadata:
  Creation Timestamp:  2024-05-27T14:02:23Z
  Generation:          1
  Resource Version:    135554723
  UID:                 78bc346b-d009-4ea8-bac1-51e2e6612d07
Spec:
  Node Resource Properties:
    Property Name:      numa
    Property Quantity:  2
    Property Name:      nbw
    Property Quantity:  10k
    Property Name:      cpu
    Property Quantity:  16
    Property Name:      memory
    Property Quantity:  32778468Ki
    Property Name:      cis
    Property Values:
      avx2
    Property Name:  topology
    Property Values:
      {"Iface":"ens192","Speed":10000,"NumaNode":0,"Enable":true,"Addr":{"IPV4":["10.6.202.112"],"IPV6":null},"NSName":"","NSAbsolutePath":""}
Events:  <none>
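
To make the change between the two outputs above easier to see, here is a minimal sketch that pulls the `*_overcommit_ratio` annotations out of `kubectl describe kcnr` text and prints the difference. The helper name `overcommit_ratios` is hypothetical and not part of katalyst; the input strings are the annotation lines copied from the two outputs above.

```python
import re

# Annotation prefix as it appears in the describe output above.
PREFIX = "katalyst.kubewharf.io/"


def overcommit_ratios(describe_output: str) -> dict[str, float]:
    """Extract the *_overcommit_ratio annotations from `kubectl describe kcnr` text.

    Hypothetical helper for illustration only; it is not a katalyst API.
    """
    pattern = re.compile(
        re.escape(PREFIX) + r"(\w+_overcommit_ratio):\s*([0-9.]+)"
    )
    return {name: float(value) for name, value in pattern.findall(describe_output)}


# Annotation lines copied from the output taken before testpod1 was created.
before = overcommit_ratios("""
Annotations:  katalyst.kubewharf.io/cpu_overcommit_ratio: 1.74
              katalyst.kubewharf.io/memory_overcommit_ratio: 1.15
""")

# Annotation lines copied from the output taken after testpod1 was created.
after = overcommit_ratios("""
Annotations:  katalyst.kubewharf.io/cpu_overcommit_ratio: 1.99
              katalyst.kubewharf.io/memory_overcommit_ratio: 1.41
""")

for name in before:
    delta = after[name] - before[name]
    print(f"{name}: {before[name]:.2f} -> {after[name]:.2f} ({delta:+.2f})")
```

Both ratios increased after the pod was created: the CPU ratio by 0.25 and the memory ratio by 0.26.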


[root@g-master1 katalyst]# kubectl get pod -n katalyst-system
NAME                                            READY   STATUS    RESTARTS       AGE
katalyst-controller-747545d674-54d2j            1/1     Running   9 (14h ago)    6d19h
katalyst-webhook-69bdb7d5d6-jnrh5               1/1     Running   0              6d19h
overcommit-katalyst-agent-l2rdx                 1/1     Running   0              6d19h
overcommit-katalyst-agent-sb2bd                 1/1     Running   0              6d19h
overcommit-katalyst-agent-vb5wc                 1/1     Running   0              6d19h
overcommit-katalyst-scheduler-58f64f644-442lb   1/1     Running   16 (14h ago)   6d19h
testpod1                                        1/1     Running   0              12s

katalyst version:

panbin@panbindeMacBook-Pro ~ % helm list -n katalyst-system
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /Users/panbin/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /Users/panbin/.kube/config
NAME      	NAMESPACE      	REVISION	UPDATED                             	STATUS  	CHART                    	APP VERSION
overcommit	katalyst-system	1       	2024-05-27 22:01:28.110633 +0800 CST	deployed	katalyst-overcommit-0.5.0	v0.5.0

What did you expect to happen?

After testpod1 is created, the CPU overcommit ratio of the corresponding node, katalyst.kubewharf.io/cpu_overcommit_ratio, should decrease.

How can we reproduce it (as minimally and precisely as possible)?

Follow the steps in this document: https://gokatalyst.io/docs/user-guide/resource-overcommitment/dynamic-overcommitment/

Software version

$ <software> version
# paste output here

flpanbin, Jun 03 '24 10:06