crane-scheduler
Self-hosted Prometheus cannot fetch aggregated metrics
1. The crane-scheduler-controller logs show that none of the aggregated metrics can be fetched:
W0626 20:55:02.198329 1 node.go:61] failed to sync this node ["k8s-node4/mem_usage_avg_5m"]: can not annotate node[k8s-node4]: failed to get data mem_usage_avg_5m{k8s-node4=}:
@Quintonwong
First, check if aggregated metrics data can be pulled inside the container:
curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=cpu_usage_avg_5m'
curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=mem_usage_avg_5m'
Then, check non-aggregated metrics data:
curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=up'
If the non-aggregated metrics data is OK but the aggregated metrics data cannot be pulled, the Prometheus rules have not taken effect. Please refer to https://prometheus.io/docs/prometheus/latest/configuration/configuration
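The two checks above can be combined into a small script. The following is a minimal sketch, not part of crane-scheduler: the names `query`, `has_samples`, and `diagnose` are hypothetical, and the only thing assumed about Prometheus is the standard HTTP API response shape (`status` plus `data.result`).

```python
import json
import urllib.parse
import urllib.request


def query(prometheus_address: str, expr: str) -> dict:
    """Run an instant query against /api/v1/query and return the decoded JSON."""
    params = urllib.parse.urlencode({"query": expr})
    url = f"{prometheus_address}/api/v1/query?{params}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def has_samples(response: dict) -> bool:
    """True when the query succeeded and returned at least one series."""
    result = response.get("data", {}).get("result")
    return response.get("status") == "success" and bool(result)


def diagnose(aggregated: dict, raw: dict) -> str:
    """Compare an aggregated-metric response with a plain `up` response."""
    if has_samples(aggregated):
        return "aggregated metrics are available"
    if has_samples(raw):
        # Raw data exists but the aggregated series is empty.
        return "recording rules are not taking effect"
    return "no data at all: check the Prometheus address and scrape targets"
```

With a reachable server, something like `diagnose(query(addr, "cpu_usage_avg_5m"), query(addr, "up"))` would give the same verdict as running the curl commands by hand, where `addr` is your own Prometheus address including the `http://` prefix.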
Output error:
curl -g 'http://x.x.x.x:9090/api/v1/query'
{"status":"error","errorType":"bad_data","error":"invalid parameter 'query': parse error at char 1: no expression found in input"}
curl -g 'http://x.x.x.x:9090/api/v1/query'
I made a mistake, the command should be
curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=up'
It returns success.
I think you can increase the interval (in seconds) of cpu_usage_active.
I have the same problem. Kubernetes version: 1.23.10, crane version: v0.5.1, crane-scheduler-controller: v0.1.23.
I have checked both the aggregated and the non-aggregated metrics data, and both can be fetched. I also changed the interval of cpu_usage_active to 5s, but the controller still cannot get the data and annotate the node.
W0319 15:26:24.293385 1 node.go:61] failed to sync this node ["kse2/cpu_usage_avg_5m"]: can not annotate node[kse2]: failed to get data cpu_usage_avg_5m{kse2=}:
Could you help me, @xieydd? Thanks very much.
Hello, I ran into this problem too. From inside the crane-scheduler-controller container I can fetch the aggregated data, but the crane-scheduler-controller container logs keep reporting errors:
I0330 13:18:01.658598 1 node.go:75] Finished syncing node event "cn-hangzhou.i-bp19r762s7xryoo6fjmx/mem_usage_avg_5m" (35.978µs)
W0330 13:18:01.658604 1 node.go:61] failed to sync this node ["cn-hangzhou.i-bp19r762s7xryoo6fjmx/mem_usage_avg_5m"]: can not annotate node[cn-hangzhou.i-bp19r762s7xryoo6fjmx]: failed to get data mem_usage_avg_5m{cn-hangzhou.i-bp19r762s7xryoo6fjmx=}: Post "10.7.1.60/api/v1/query": unsupported protocol scheme ""
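The `unsupported protocol scheme ""` at the end of that log line is the message Go's HTTP client produces when a URL has no `http://` or `https://` prefix, which suggests the Prometheus address was configured as a bare `10.7.1.60`. Below is a minimal sketch of the fix. The helper name `normalize_address` is hypothetical, it is not crane-scheduler code, and it assumes plain HTTP when no scheme is given.

```python
def normalize_address(address: str, default_scheme: str = "http") -> str:
    """Return the address with an explicit scheme and no trailing slash."""
    address = address.strip().rstrip("/")
    if address.startswith(("http://", "https://")):
        return address
    # Assumption: a bare host or host:port is meant to be reached over HTTP.
    return f"{default_scheme}://{address}"


print(normalize_address("10.7.1.60"))
```

In practice the fix is simply to put the full URL, scheme included, wherever the controller's Prometheus address is configured.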
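Try upgrading Prometheus and node-exporter to the latest versions.
@sdnmw The reason no value is returned is that crane converts the node name to the node IP and queries Prometheus using that IP as the value of the instance label.
This usually happens when node_exporter is deployed inside K8s. You can add the following relabeling rules to the Prometheus scrape job for node-exporter: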
- source_labels: [__meta_kubernetes_node_address_InternalIP]
  target_label: instance
  action: replace
- source_labels: [__meta_kubernetes_node_address_Hostname]
  target_label: instance_name
  action: replace
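To illustrate what these two rules do, here is a small simulation of the `replace` action with its default settings, where the source label value is copied into the target label. It is a sketch rather than Prometheus code: the function `apply_replace`, the sample IP, and the hostname are made up for illustration, and regex and replacement options are not modeled.

```python
def apply_replace(labels, source_labels, target_label, separator=";"):
    """Copy the joined source label values into target_label (default replace)."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    result = dict(labels)
    if value:
        result[target_label] = value
    else:
        # An empty replacement removes the target label instead of setting it.
        result.pop(target_label, None)
    return result


# Hypothetical labels discovered for one node before relabeling.
discovered = {
    "__meta_kubernetes_node_address_InternalIP": "10.7.1.60",
    "__meta_kubernetes_node_address_Hostname": "k8s-node4",
    "instance": "k8s-node4",
}

rules = [
    (["__meta_kubernetes_node_address_InternalIP"], "instance"),
    (["__meta_kubernetes_node_address_Hostname"], "instance_name"),
]

relabeled = discovered
for sources, target in rules:
    relabeled = apply_replace(relabeled, sources, target)

print(relabeled["instance"], relabeled["instance_name"])
```

After relabeling, `instance` carries the node IP, which is the value crane uses in its query, while the hostname remains available under `instance_name`.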