
Installed Crane-scheduler via Helm as a second scheduler; using the official example, the test pod is never scheduled and stays stuck in "Pending"

Open xucq07 opened this issue 1 year ago • 13 comments

Installed Crane-scheduler via Helm as a second scheduler; testing with the example from the official docs, the pod never gets scheduled and stays stuck in "Pending".

1. Deployment YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-stress
spec:
  selector:
    matchLabels:
      app: cpu-stress
  replicas: 1
  template:
    metadata:
      labels:
        app: cpu-stress
    spec:
      schedulerName: crane-scheduler
      hostNetwork: true
      tolerations:
      - key: node.kubernetes.io/network-unavailable
        operator: Exists
        effect: NoSchedule
      containers:
      - name: stress
        image: docker.io/gocrane/stress:latest
        command: ["stress", "-c", "1"]
        resources:
          requests:
            memory: "1Gi"
            cpu: "1"
          limits:
            memory: "1Gi"
            cpu: "1"

2. Pod details:

Name:           cpu-stress-cc8656b6c-b5hhz
Namespace:      default
Priority:       0
Node:
Labels:         app=cpu-stress
                pod-template-hash=cc8656b6c
Annotations:
Status:         Pending
IP:
IPs:
Controlled By:  ReplicaSet/cpu-stress-cc8656b6c
Containers:
  stress:
    Image:      docker.io/gocrane/stress:latest
    Port:
    Host Port:
    Command:    stress -c 1
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     1
      memory:  1Gi
    Environment:
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9nwd5 (ro)
Volumes:
  kube-api-access-9nwd5:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:       Guaranteed
Node-Selectors:
Tolerations:     node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:

3. crane-scheduler logs:

I0824 00:50:47.247851 1 serving.go:331] Generated self-signed cert in-memory
W0824 00:50:48.025758 1 options.go:330] Neither --kubeconfig nor --master was specified. Using default API client. This might not work.
W0824 00:50:48.073470 1 authorization.go:47] Authorization is disabled
W0824 00:50:48.073495 1 authentication.go:40] Authentication is disabled
I0824 00:50:48.073517 1 deprecated_insecure_serving.go:51] Serving healthz insecurely on [::]:10251
I0824 00:50:48.080823 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0824 00:50:48.080862 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0824 00:50:48.080915 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0824 00:50:48.080927 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0824 00:50:48.080957 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0824 00:50:48.080968 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0824 00:50:48.081199 1 secure_serving.go:197] Serving securely on [::]:10259
I0824 00:50:48.081270 1 tlsconfig.go:240] Starting DynamicServingCertificateController
W0824 00:50:48.091287 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0824 00:50:48.146624 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
I0824 00:50:48.182865 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
I0824 00:50:48.183903 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0824 00:50:48.184059 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0824 00:50:48.284088 1 leaderelection.go:243] attempting to acquire leader lease kube-system/kube-scheduler...
W0824 00:57:30.128689 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0824 01:02:45.130884 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0824 01:08:48.133483 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0824 01:14:31.135801 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0824 01:20:24.138959 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0824 01:30:10.141873 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget

4. crane-scheduler-controller logs:

I0824 08:46:16.647776 1 server.go:61] Starting Controller version v0.0.0-master+$Format:%H$
I0824 08:46:16.648237 1 leaderelection.go:248] attempting to acquire leader lease crane-system/crane-scheduler-controller...
I0824 08:46:16.706891 1 leaderelection.go:258] successfully acquired lease crane-system/crane-scheduler-controller
I0824 08:46:16.807546 1 controller.go:72] Caches are synced for controller
I0824 08:46:16.807631 1 node.go:46] Start to reconcile node events
I0824 08:46:16.807653 1 event.go:30] Start to reconcile EVENT events
I0824 08:46:16.885698 1 node.go:75] Finished syncing node event "node6/cpu_usage_avg_5m" (77.952416ms)
I0824 08:46:16.973162 1 node.go:75] Finished syncing node event "node4/cpu_usage_avg_5m" (87.371252ms)
I0824 08:46:17.045250 1 node.go:75] Finished syncing node event "master2/cpu_usage_avg_5m" (72.023298ms)
I0824 08:46:17.109260 1 node.go:75] Finished syncing node event "master3/cpu_usage_avg_5m" (63.673389ms)
I0824 08:46:17.192332 1 node.go:75] Finished syncing node event "node1/cpu_usage_avg_5m" (83.005155ms)
I0824 08:46:17.529495 1 node.go:75] Finished syncing node event "node2/cpu_usage_avg_5m" (337.099052ms)
I0824 08:46:17.927163 1 node.go:75] Finished syncing node event "node3/cpu_usage_avg_5m" (397.603044ms)
I0824 08:46:18.327978 1 node.go:75] Finished syncing node event "node5/cpu_usage_avg_5m" (400.749476ms)
I0824 08:46:18.746391 1 node.go:75] Finished syncing node event "master1/cpu_usage_avg_5m" (418.360885ms)
I0824 08:46:19.129081 1 node.go:75] Finished syncing node event "node6/cpu_usage_max_avg_1h" (382.635495ms)
I0824 08:46:19.524508 1 node.go:75] Finished syncing node event "node4/cpu_usage_max_avg_1h" (395.361539ms)
I0824 08:46:19.948035 1 node.go:75] Finished syncing node event "master2/cpu_usage_max_avg_1h" (423.453672ms)
I0824 08:46:20.332014 1 node.go:75] Finished syncing node event "master3/cpu_usage_max_avg_1h" (383.909395ms)
I0824 08:46:20.737296 1 node.go:75] Finished syncing node event "node1/cpu_usage_max_avg_1h" (405.102002ms)
I0824 08:46:21.245055 1 node.go:75] Finished syncing node event "node2/cpu_usage_max_avg_1h" (507.697871ms)
I0824 08:46:21.573490 1 node.go:75] Finished syncing node event "node3/cpu_usage_max_avg_1h" (328.368489ms)
I0824 08:46:21.937814 1 node.go:75] Finished syncing node event "node5/cpu_usage_max_avg_1h" (364.254837ms)
I0824 08:46:22.335988 1 node.go:75] Finished syncing node event "master1/cpu_usage_max_avg_1h" (397.952357ms)
I0824 08:46:22.724851 1 node.go:75] Finished syncing node event "master2/cpu_usage_max_avg_1d" (388.771915ms)
I0824 08:46:23.126059 1 node.go:75] Finished syncing node event "master3/cpu_usage_max_avg_1d" (401.156708ms)
I0824 08:46:23.528329 1 node.go:75] Finished syncing node event "node6/cpu_usage_max_avg_1d" (402.208827ms)
I0824 08:46:23.937560 1 node.go:75] Finished syncing node event "node4/cpu_usage_max_avg_1d" (409.165081ms)
I0824 08:46:24.331730 1 node.go:75] Finished syncing node event "node5/cpu_usage_max_avg_1d" (394.024206ms)
I0824 08:46:24.730137 1 node.go:75] Finished syncing node event "master1/cpu_usage_max_avg_1d" (398.33551ms)
I0824 08:46:25.127074 1 node.go:75] Finished syncing node event "node1/cpu_usage_max_avg_1d" (396.798913ms)
I0824 08:46:25.528844 1 node.go:75] Finished syncing node event "node2/cpu_usage_max_avg_1d" (401.701104ms)
I0824 08:46:25.932684 1 node.go:75] Finished syncing node event "node3/cpu_usage_max_avg_1d" (403.762529ms)
I0824 08:46:26.330458 1 node.go:75] Finished syncing node event "node4/mem_usage_avg_5m" (397.710372ms)
I0824 08:46:26.736576 1 node.go:75] Finished syncing node event "master2/mem_usage_avg_5m" (406.060927ms)

xucq07 avatar Aug 24 '23 01:08 xucq07

Please check the status of the crane-scheduler pods: are they Running?

qmhu avatar Aug 28 '23 02:08 qmhu

kubectl get pods -n crane-system
NAME                                          READY   STATUS    RESTARTS   AGE
crane-scheduler-b84489958-6jdj6               1/1     Running   0          4d1h
crane-scheduler-controller-6987688d8d-6wr7c   1/1     Running   0          4d1h

Confirmed again that the pods are Running.

xucq07 avatar Aug 28 '23 02:08 xucq07

> Confirmed again that both crane-scheduler pods are Running (kubectl get pods output above).

Nothing abnormal in the logs. You could clear the pod's schedulerName (so it falls back to the default scheduler) and check whether the default scheduler can schedule it.
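For example, a minimal variant of the test Deployment with schedulerName omitted, so the default kube-scheduler picks it up (a sketch; the name cpu-stress-default is made up for this test, the other values are copied from the report above):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-stress-default   # hypothetical name, to avoid clashing with the original
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cpu-stress-default
  template:
    metadata:
      labels:
        app: cpu-stress-default
    spec:
      # schedulerName intentionally omitted -> defaults to "default-scheduler"
      containers:
      - name: stress
        image: docker.io/gocrane/stress:latest
        command: ["stress", "-c", "1"]
        resources:
          requests:
            cpu: "1"
            memory: "1Gi"
          limits:
            cpu: "1"
            memory: "1Gi"
```

If this pod schedules while the crane-scheduler one stays Pending, the problem is isolated to the second scheduler.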

qmhu avatar Aug 28 '23 07:08 qmhu

Tested it; the default scheduler works fine and schedules normally.

xucq07 avatar Aug 28 '23 07:08 xucq07

> Tested it; the default scheduler works fine and schedules normally.

Could you post the complete logs, from both crane-scheduler-controller-6987688d8d-6wr7c and crane-scheduler-b84489958-6jdj6?

qmhu avatar Aug 28 '23 07:08 qmhu

Logs attached: crane-scheduler.log, crane-scheduler-controller.log

xucq07 avatar Aug 28 '23 07:08 xucq07

Hitting the same problem; Kubernetes version is 1.27. The scheduler reports:

E0905 05:42:20.346742 1 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
W0905 05:43:01.852683 1 reflector.go:324] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
E0905 05:43:01.852729 1 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
W0905 05:43:34.262887 1 reflector.go:324] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
E0905 05:43:34.262932 1 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
W0905 05:44:33.675140 1 reflector.go:324] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
E0905 05:44:33.675182 1 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
W0905 05:45:20.214073 1 reflector.go:324] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
E0905 05:45:20.214163 1 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
W0905 05:45:56.034526 1 reflector.go:324] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
E0905 05:45:56.034592 1 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
W0905 05:46:48.730711 1 reflector.go:324] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
E0905 05:46:48.730757 1 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
W0905 05:47:24.823783 1 reflector.go:324] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
E0905 05:47:24.823828 1 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource

Any pointers would be much appreciated, thanks.

mobeixiaoxin avatar Sep 05 '23 05:09 mobeixiaoxin

> Hitting the same problem on Kubernetes 1.27; the scheduler keeps failing to list/watch *v1beta1.CSIStorageCapacity (full logs above).

This looks like a compatibility issue with newer Kubernetes versions. Clusters below 1.25 work fine at the moment; newer clusters will likely need additional support.

qmhu avatar Sep 06 '23 07:09 qmhu

> Hitting the same problem on Kubernetes 1.27; the scheduler keeps failing to list/watch *v1beta1.CSIStorageCapacity.

> This looks like a compatibility issue with newer Kubernetes versions. Clusters below 1.25 work fine at the moment; newer clusters will likely need additional support.

OK, thanks.

mobeixiaoxin avatar Sep 06 '23 07:09 mobeixiaoxin

My Kubernetes version is 1.20.7, using crane-scheduler image 0.0.20 as a second scheduler. The node annotations already contain the aggregated metrics. When I create a new pod to test scheduling, it stays stuck in Pending.

crane-scheduler logs:

I1018 14:19:17.775925 1 serving.go:331] Generated self-signed cert in-memory
W1018 14:19:18.105223 1 options.go:330] Neither --kubeconfig nor --master was specified. Using default API client. This might not work.
W1018 14:19:18.116946 1 authorization.go:47] Authorization is disabled
W1018 14:19:18.116959 1 authentication.go:40] Authentication is disabled
I1018 14:19:18.116979 1 deprecated_insecure_serving.go:51] Serving healthz insecurely on [::]:10251
I1018 14:19:18.119411 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I1018 14:19:18.119430 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I1018 14:19:18.119461 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I1018 14:19:18.119469 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I1018 14:19:18.119489 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1018 14:19:18.119498 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1018 14:19:18.119562 1 secure_serving.go:197] Serving securely on [::]:10259
I1018 14:19:18.119635 1 tlsconfig.go:240] Starting DynamicServingCertificateController
I1018 14:19:18.219523 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1018 14:19:18.219544 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I1018 14:19:18.219982 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
I1018 14:19:18.320414 1 leaderelection.go:243] attempting to acquire leader lease kube-system/kube-scheduler...

crane-scheduler-controller logs:

I1018 22:16:26.114291 1 node.go:75] Finished syncing node event "kube-node-02/mem_usage_avg_5m" (277.013756ms)
I1018 22:19:25.740401 1 node.go:75] Finished syncing node event "kube-node-02/mem_usage_avg_5m" (34.500361ms)
I1018 22:19:25.764618 1 node.go:75] Finished syncing node event "kube-master-01/mem_usage_avg_5m" (24.178999ms)
I1018 22:19:25.798566 1 node.go:75] Finished syncing node event "kube-node-01/mem_usage_avg_5m" (33.90647ms)
I1018 22:19:25.826773 1 node.go:75] Finished syncing node event "kube-node-02/cpu_usage_avg_5m" (28.169613ms)
I1018 22:19:25.848814 1 node.go:75] Finished syncing node event "kube-master-01/cpu_usage_avg_5m" (22.005738ms)
I1018 22:19:26.117118 1 node.go:75] Finished syncing node event "kube-node-01/cpu_usage_avg_5m" (268.264709ms)
I1018 22:22:25.737763 1 node.go:75] Finished syncing node event "kube-node-01/mem_usage_avg_5m" (32.338992ms)
I1018 22:22:25.765262 1 node.go:75] Finished syncing node event "kube-node-02/mem_usage_avg_5m" (27.45828ms)
I1018 22:22:25.794327 1 node.go:75] Finished syncing node event "kube-master-01/mem_usage_avg_5m" (29.029129ms)
I1018 22:22:25.818029 1 node.go:75] Finished syncing node event "kube-node-02/cpu_usage_avg_5m" (23.666818ms)
I1018 22:22:25.841672 1 node.go:75] Finished syncing node event "kube-master-01/cpu_usage_avg_5m" (23.603915ms)
I1018 22:22:26.125154 1 node.go:75] Finished syncing node event "kube-node-01/cpu_usage_avg_5m" (283.438566ms)

redtee123 avatar Oct 18 '23 14:10 redtee123

> Kubernetes 1.20.7, crane-scheduler image 0.0.20 as a second scheduler; the node annotations already carry the aggregated metrics, but a new test pod stays Pending (full logs above).

Possibly leader election wasn't disabled for the second scheduler. The scheduler installed from the helm chart disables leader election; see: https://github.com/gocrane/helm-charts/blob/main/charts/scheduler/templates/scheduler-deployment.yaml#L23
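For context, kube-scheduler-based binaries expose this through the --leader-elect flag; a rough sketch of the relevant part of such a scheduler Deployment (the binary path, image tag, and config path here are placeholders, not the exact chart contents -- the real template is at the link above):

```yaml
containers:
- name: crane-scheduler
  image: docker.io/gocrane/crane-scheduler:0.0.20   # placeholder tag
  command:
  - /scheduler                                      # placeholder binary path
  - --leader-elect=false                            # disable leader election for the second scheduler
  - --config=/etc/kubernetes/scheduler-config.yaml  # placeholder config path
```

Without this (or the equivalent leaderElection setting in the scheduler config), the second scheduler can block waiting to acquire the kube-system/kube-scheduler lease and never schedule anything.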

qmhu avatar Oct 19 '23 03:10 qmhu

> Kubernetes 1.20.7, crane-scheduler image 0.0.20 as a second scheduler; the node annotations already carry the aggregated metrics, but a new test pod stays Pending (full logs above).

> Possibly leader election wasn't disabled for the second scheduler. The scheduler installed from the helm chart disables leader election; see: https://github.com/gocrane/helm-charts/blob/main/charts/scheduler/templates/scheduler-deployment.yaml#L23

It was indeed caused by leader election not being disabled for the second scheduler. But it wasn't the leaderelection in scheduler-deployment.yaml; it was the leaderElection setting in scheduler-configmap.yaml that hadn't been turned off.

redtee123 avatar Oct 19 '23 04:10 redtee123

> Kubernetes 1.20.7, crane-scheduler image 0.0.20 as a second scheduler; the node annotations already carry the aggregated metrics, but a new test pod stays Pending.

> Possibly leader election wasn't disabled for the second scheduler; the helm chart's scheduler disables it (see scheduler-deployment.yaml#L23).

> It was indeed leader election, but the leaderElection setting in scheduler-configmap.yaml rather than the one in scheduler-deployment.yaml.

My Kubernetes version is 1.22.12, using crane-scheduler image scheduler-0.2.2 as a second scheduler. The node annotations already contain the aggregated metrics, and I've set leaderElect to false, but when I create a new pod to test scheduling it still stays Pending. Pod info:

Events:
  Type     Reason            Age   From             Message
  ----     ------            ----  ----             -------
  Warning  FailedScheduling  15s   crane-scheduler  0/1 nodes are available: 1 Insufficient cpu.
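Note that this event means the scheduler itself is now working: it evaluated the only node and filtered it out because less than 1 full CPU is unallocated. One way to confirm is to shrink the test pod's resource request (the values below are illustrative, not from the original example):

```yaml
resources:
  requests:
    cpu: "100m"      # lowered from "1" so the pod fits on a mostly-allocated node
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "256Mi"
```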

leaderElection config (from the scheduler ConfigMap):

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1beta2
    kind: KubeSchedulerConfiguration
    leaderElection:
      leaderElect: false
    profiles:
    - schedulerName: crane-scheduler
      plugins:
        filter:
          enabled:
          - name: Dynamic
        score:
          enabled:
          - name: Dynamic
            weight: 3

crane-scheduler logs:

I1226 09:47:56.595597       1 serving.go:348] Generated self-signed cert in-memory
W1226 09:47:57.035592       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I1226 09:47:57.041561       1 server.go:139] "Starting Kubernetes Scheduler" version="v0.0.0-master+$Format:%H$"
I1226 09:47:57.044642       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I1226 09:47:57.044658       1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I1226 09:47:57.044666       1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I1226 09:47:57.044679       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1226 09:47:57.044699       1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I1226 09:47:57.044715       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I1226 09:47:57.045160       1 secure_serving.go:200] Serving securely on [::]:10259
I1226 09:47:57.045218       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I1226 09:47:57.145093       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file 
I1226 09:47:57.145152       1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController 
I1226 09:47:57.145100       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file

crane-scheduler-controller logs:

root@master:/home/ubuntu/kube-prometheus/manifests# kubectl logs -n crane-system crane-scheduler-controller-6f6b94c8f7-79vff 
I1226 17:47:56.187263       1 server.go:61] Starting Controller version v0.0.0-master+$Format:%H$
I1226 17:47:56.188316       1 leaderelection.go:248] attempting to acquire leader lease crane-system/crane-scheduler-controller...
I1226 17:48:12.646241       1 leaderelection.go:258] successfully acquired lease crane-system/crane-scheduler-controller
I1226 17:48:12.747072       1 controller.go:72] Caches are synced for controller
I1226 17:48:12.747174       1 node.go:46] Start to reconcile node events
I1226 17:48:12.747208       1 event.go:30] Start to reconcile EVENT events
I1226 17:48:12.773420       1 node.go:75] Finished syncing node event "master/cpu_usage_avg_5m" (26.154965ms)
I1226 17:48:12.794854       1 node.go:75] Finished syncing node event "master/cpu_usage_max_avg_1h" (21.278461ms)
I1226 17:48:12.818035       1 node.go:75] Finished syncing node event "master/cpu_usage_max_avg_1d" (23.146517ms)
I1226 17:48:12.837222       1 node.go:75] Finished syncing node event "master/mem_usage_avg_5m" (19.151134ms)
I1226 17:48:13.055018       1 node.go:75] Finished syncing node event "master/mem_usage_max_avg_1h" (217.762678ms)
I1226 17:48:13.455442       1 node.go:75] Finished syncing node event "master/mem_usage_max_avg_1d" (400.366453ms)
I1226 17:51:12.788539       1 node.go:75] Finished syncing node event "master/mem_usage_avg_5m" (41.092765ms)
I1226 17:51:12.810824       1 node.go:75] Finished syncing node event "master/cpu_usage_avg_5m" (22.248821ms)
I1226 17:54:12.771140       1 node.go:75] Finished syncing node event "master/mem_usage_avg_5m" (22.840662ms)
I1226 17:54:12.789918       1 node.go:75] Finished syncing node event "master/cpu_usage_avg_5m" (18.740179ms)
I1226 17:57:12.773735       1 node.go:75] Finished syncing node event "master/mem_usage_avg_5m" (26.395777ms)
I1226 17:57:12.792897       1 node.go:75] Finished syncing node event "master/cpu_usage_avg_5m" (19.124323ms)
I1226 18:00:12.772243       1 node.go:75] Finished syncing node event "master/mem_usage_avg_5m" (24.369461ms)
I1226 18:00:12.804297       1 node.go:75] Finished syncing node event "master/cpu_usage_avg_5m" (32.008004ms)
I1226 18:03:12.774690       1 node.go:75] Finished syncing node event "master/mem_usage_max_avg_1h" (27.291591ms)
I1226 18:03:12.795145       1 node.go:75] Finished syncing node event "master/mem_usage_avg_5m" (20.350165ms)
I1226 18:03:12.813508       1 node.go:75] Finished syncing node event "master/cpu_usage_avg_5m" (18.32638ms)
I1226 18:03:12.833109       1 node.go:75] Finished syncing node event "master/cpu_usage_max_avg_1h" (19.549029ms)
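The controller log above shows the six metric keys it syncs onto node annotations. As a side note, here is a small Python sketch for picking those metrics out of a node's annotations when debugging; the `gocrane.io/` annotation prefix and value format are assumptions based on the project docs, so verify the exact keys on your cluster:

```python
import json
import subprocess

# Metric keys the crane-scheduler-controller syncs (as seen in the logs above).
METRIC_KEYS = {
    "cpu_usage_avg_5m", "cpu_usage_max_avg_1h", "cpu_usage_max_avg_1d",
    "mem_usage_avg_5m", "mem_usage_max_avg_1h", "mem_usage_max_avg_1d",
}

def crane_metrics(annotations):
    """Return only the crane metric annotations from metadata.annotations."""
    return {k: v for k, v in annotations.items()
            if k.split("/")[-1] in METRIC_KEYS}

if __name__ == "__main__":
    # On a live cluster you would fetch the node object instead, e.g.:
    #   node = json.loads(subprocess.check_output(
    #       ["kubectl", "get", "node", "master", "-o", "json"]))
    #   print(crane_metrics(node["metadata"]["annotations"]))
    sample = {
        "gocrane.io/cpu_usage_avg_5m": "0.14,2023-12-26T17:48:12Z",  # assumed format
        "kubernetes.io/hostname": "master",
    }
    print(crane_metrics(sample))
```

If the filter returns an empty dict for your nodes, the controller has not annotated them yet, which would point at the Prometheus/metrics side rather than the scheduler.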

lesserror avatar Dec 26 '23 10:12 lesserror