crane-scheduler: self-hosted Prometheus cannot retrieve aggregated metrics

1. The crane-scheduler-controller logs show that none of the aggregated metrics can be retrieved:

W0626 20:55:02.198329 1 node.go:61] failed to sync this node ["k8s-node4/mem_usage_avg_5m"]: can not annotate node[k8s-node4]: failed to get data mem_usage_avg_5m{k8s-node4=}:

2. fe3d166c668c1cc8739fbaf5d2ce873

Jun 26 '22 12:06 Quintonwong

@Quintonwong

First, check if aggregated metrics data can be pulled inside the container:

curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=cpu_usage_avg_5m'

curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=mem_usage_avg_5m'

Then, check non-aggregated metrics data:

curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=up'

If the non-aggregated metrics data is OK but the aggregated metrics data cannot be pulled, the Prometheus rules have not taken effect; please refer to https://prometheus.io/docs/prometheus/latest/configuration/configuration
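The same check can be scripted. Below is a minimal sketch in Python, assuming the Prometheus address is reachable from wherever the script runs; `query_url`, `series_count`, and `check` are hypothetical helper names, not part of crane-scheduler. A count of 0 for `cpu_usage_avg_5m` or `mem_usage_avg_5m` alongside a non-zero count for `up` points at the rules.

```python
import json
import urllib.parse
import urllib.request


def query_url(prometheus_address: str, expr: str) -> str:
    """Build the instant-query URL used by the curl commands above."""
    base = prometheus_address.rstrip("/")
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def series_count(response: dict) -> int:
    """Return how many series a parsed /api/v1/query response contains."""
    if response.get("status") != "success":
        # Prometheus reports failures as {"status": "error", "errorType": ..., "error": ...}
        raise ValueError(f"{response.get('errorType')}: {response.get('error')}")
    return len(response["data"]["result"])


def check(prometheus_address: str,
          exprs=("cpu_usage_avg_5m", "mem_usage_avg_5m", "up")) -> None:
    """Print the series count for each expression (needs network access)."""
    for expr in exprs:
        with urllib.request.urlopen(query_url(prometheus_address, expr), timeout=10) as resp:
            print(expr, series_count(json.load(resp)))
```

Usage would be something like `check("http://x.x.x.x:9090")`, with the address replaced by your own.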

Jun 26 '22 14:06 autumn0207

@Quintonwong

First, check if aggregated metrics data can be pulled inside the container:
curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=cpu_usage_avg_5m'
curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=mem_usage_avg_5m'
Then, check non-aggregated metrics data:
curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query'
If the non-aggregated metrics data is OK but the aggregated metrics data cannot be pulled, the Prometheus rules have not taken effect; please refer to https://prometheus.io/docs/prometheus/latest/configuration/configuration

The last command returns an error:

curl -g 'http://x.x.x.x:9090/api/v1/query'

{"status":"error","errorType":"bad_data","error":"invalid parameter 'query': parse error at char 1: no expression found in input"}

Jun 27 '22 00:06 ArvinChen1991

curl -g 'http://x.x.x.x:9090/api/v1/query'

I made a mistake; the command should be:

curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=up'

Jun 29 '22 04:06 autumn0207

curl -g 'http://x.x.x.x:9090/api/v1/query'

I made a mistake; the command should be:
curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=up'

This one returns success.

Jul 01 '22 01:07 ArvinChen1991

I think you can increase the interval (in seconds) of cpu_usage_active.

Dec 09 '22 03:12 xieydd

I have the same problem. Kubernetes version: 1.23.10, crane version: v0.5.1, crane-scheduler-controller: v0.1.23.

I have checked both the aggregated and the non-aggregated metrics data, and both can be obtained. I also changed the interval of cpu_usage_active to 5s, but the controller still cannot get the data and annotate the nodes.

W0319 15:26:24.293385 1 node.go:61] failed to sync this node ["kse2/cpu_usage_avg_5m"]: can not annotate node[kse2]: failed to get data cpu_usage_avg_5m{kse2=}:
I0319 15:26:24.295764 1 node.go:75] Finished syncing node event "kse3/cpu_usage_avg_5m" (2.357063ms)
W0319 15:26:24.295781 1 node.go:61] failed to sync this node ["kse3/cpu_usage_avg_5m"]: can not annotate node[kse3]: failed to get data cpu_usage_avg_5m{kse3=}:
I0319 15:26:24.298258 1 node.go:75] Finished syncing node event "kse4/cpu_usage_avg_5m" (2.454873ms)
W0319 15:26:24.298279 1 node.go:61] failed to sync this node ["kse4/cpu_usage_avg_5m"]: can not annotate node[kse4]: failed to get data cpu_usage_avg_5m{kse4=}:

Could you help me, @xieydd? Thanks very much.

Mar 19 '23 07:03 sdnmw

@Quintonwong

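First, check if aggregated metrics data can be pulled inside the container:
curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=cpu_usage_avg_5m'
curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=mem_usage_avg_5m'
Then, check non-aggregated metrics data:
curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=up'
If the non-aggregated metrics data is OK but the aggregated metrics data cannot be pulled, the Prometheus rules have not taken effect; please refer to https://prometheus.io/docs/prometheus/latest/configuration/configuration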

Hi, I ran into this problem too. From inside the crane-scheduler-controller container I can retrieve the aggregated data, but the crane-scheduler-controller container log keeps reporting errors:

I0330 13:18:01.658598 1 node.go:75] Finished syncing node event "cn-hangzhou.i-bp19r762s7xryoo6fjmx/mem_usage_avg_5m" (35.978µs)
W0330 13:18:01.658604 1 node.go:61] failed to sync this node ["cn-hangzhou.i-bp19r762s7xryoo6fjmx/mem_usage_avg_5m"]: can not annotate node[cn-hangzhou.i-bp19r762s7xryoo6fjmx]: failed to get data mem_usage_avg_5m{cn-hangzhou.i-bp19r762s7xryoo6fjmx=}: Post "10.7.1.60/api/v1/query": unsupported protocol scheme ""
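A note on that last message: `unsupported protocol scheme ""` is what Go's HTTP client reports when the request URL has no `http://` or `https://` prefix, whereas curl falls back to HTTP when given a bare host. That would explain why curl works inside the container while the controller fails, so it is worth checking that the Prometheus address configured for the controller includes the scheme. How the address is set is not shown in this thread; the sketch below only illustrates the normalization, and `with_scheme` is a hypothetical helper name.

```python
def with_scheme(address: str, default_scheme: str = "http") -> str:
    """Return the address unchanged if it already names a scheme, else prefix one."""
    if "://" in address:
        return address
    return f"{default_scheme}://{address}"


# The bare address from the log gains an explicit scheme.
print(with_scheme("10.7.1.60"))
```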

Mar 30 '23 05:03 nailianglu

Try upgrading Prometheus and node-exporter to the latest versions.

Dec 19 '23 08:12 wyaopeng

@sdnmw The reason no value is returned is that crane converts the node name to the node IP and queries Prometheus using the node IP as the value of the instance label. When this happens, node_exporter is most likely deployed inside K8S. You can add label relabeling to the Prometheus scrape job for node-exporter:

- source_labels: [__meta_kubernetes_node_address_InternalIP]
  target_label: instance
  action: replace
- source_labels: [__meta_kubernetes_node_address_Hostname]
  target_label: instance_name
  action: replace
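To see whether this applies to your cluster, compare the `instance` values in the query result with the node's InternalIP. A minimal sketch in Python, assuming the match is on an exact `instance` value as described above (the controller's actual selector is not shown in this thread); `instance_query` and `has_series_for_ip` are hypothetical helper names.

```python
def instance_query(metric: str, node_ip: str) -> str:
    """PromQL selector for one metric restricted to a node's instance label."""
    return f'{metric}{{instance="{node_ip}"}}'


def has_series_for_ip(result: list, node_ip: str) -> bool:
    """True if any series in data.result carries instance == node_ip.

    Assumption: exact match on the instance label, per the comment above.
    """
    return any(series["metric"].get("instance") == node_ip for series in result)
```

If `has_series_for_ip` is false for the `data.result` list of `cpu_usage_avg_5m` while the metric itself has data, the series are probably labeled with a hostname or pod address instead of the node IP.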

Apr 25 '24 07:04 niyang110
