n9e-helm
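[Config suggestion] Improve the config options Categraf uses to collect k8s information
After deploying, I noticed that k8s metrics collection was failing. The categraf logs show that input.kubernetes and the other k8s collectors are configured with a kubelet address of 127.0.0.1, which cannot be reached: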
2024/12/17 02:52:55 instances.go:227: E! failed to query url: https://127.0.0.1:10250/metrics/cadvisor error: Get "https://127.0.0.1:10250/metrics/cadvisor": dial tcp 127.0.0.1:10250: connect: connection refused
2024/12/17 02:52:55 prometheus.go:227: E! failed to query url: https://127.0.0.1:10250/metrics error: Get "https://127.0.0.1:10250/metrics": dial tcp 127.0.0.1:10250: connect: connection refused
2024/12/17 02:53:10 kubernetes.go:113: E! failed to load https://127.0.0.1:10250/stats/summary error: error making HTTP request to https://127.0.0.1:10250/stats/summary: dial tcp 127.0.0.1:10250: connect: connection refused
The kubelet in this cluster cannot be reached via 127.0.0.1; it listens only on the node IP:
[root@xcmgt01 n9e-helm]# netstat -tunlp |grep 10250
tcp 0 0 172.30.31.15:10250 0.0.0.0:* LISTEN 1809/kubelet
[root@xcmgt01 n9e-helm]# curl https://127.0.0.1:10250
curl: (7) Failed to connect to 127.0.0.1 port 10250: Connection refused
[root@xcmgt01 n9e-helm]# curl https://172.30.31.15:10250
404 page not found
[root@xcmgt01 n9e-helm]# curl https://127.0.0.1:10250/metrics/cadvisor
curl: (7) Failed to connect to 127.0.0.1 port 10250: Connection refused
[root@xcmgt01 n9e-helm]# curl https://172.30.31.15:10250/metrics/cadvisor
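For anyone repeating this check on another node: the address the kubelet is bound to can be read straight from the netstat output. A minimal sketch, assuming the usual `netstat -tunlp` column layout (local address in column 4, state in column 6). `kubelet_listen_addr` is a hypothetical helper name, not something shipped with categraf or n9e-helm.

```shell
#!/bin/sh
# Hypothetical helper: reads netstat-style lines on stdin and prints the
# local address of every LISTEN socket bound to the given port.
# Assumes column 4 is "addr:port" and column 6 is the socket state.
kubelet_listen_addr() {
    port="${1:-10250}"
    awk -v port="$port" '
        $6 == "LISTEN" {
            # The port is the last ":"-separated piece, so this also
            # handles IPv6-style entries such as ":::10250".
            n = split($4, parts, ":")
            if (parts[n] == port) {
                print substr($4, 1, length($4) - length(parts[n]) - 1)
            }
        }'
}

# Example: netstat -tunlp | kubelet_listen_addr 10250
```

Fed the netstat line from this report, it prints 172.30.31.15. If the result is not 127.0.0.1, 0.0.0.0 or ::, a collector pointed at 127.0.0.1 will get connection refused, as in the logs above.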
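I suggest adding a comment, or changing the default to the configuration below, so that others who hit the same problem have a fix. ${HOSTIP} is an environment variable already present in the Categraf pod, and it solved my problem completely.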
# URL for the kubelet
url = "https://${HOSTIP}:10250"
[root@xcmgt01 n9e-helm]# kubectl exec -it nightingale-categraf-v6-wkgj9 -n n9e -- printenv | grep HOSTIP
HOSTIP=172.30.31.46
[root@xcmgt01 n9e-helm]# kubectl exec -it nightingale-categraf-v6-wkgj9 -n n9e -- printenv | grep 172.30
HOSTIP=172.30.31.46
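Thanks for the feedback. That is indeed how HOSTIP is meant to be used; some public clouds do not allow direct access via 127.0.0.1. https://github.com/flashcatcloud/categraf/blob/main/k8s/daemonset.yaml#L527
@sunFlying I ran into the same problem. Where under categraf.internal should url = "https://${HOSTIP}:10250" be added?
I added it but it did not take effect, so I probably put it in the wrong place; it still reports connection refused on the 127 IP.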
categraf:
  type: internal
  internal:
    serviceAccountName: ""
    automountServiceAccountToken: true
    image:
      repository: flashcatcloud/categraf
      tag: latest
    resources:
      requests:
        memory: 64Mi
        cpu: 50m
      limits:
        memory: 64Mi
        cpu: 50m
    extraConfig:
      kubernetes.toml: |
        [inputs.kubernetes]
        interval = "15s"
        kubelet_endpoint = "https://{HOSTIP}:10250"
        bearer_token_file = "/var/run/secrets/kubernetes.io/serviceaccount/token"
        insecure_skip_verify = true
        [instances]
        url = "https://${HOSTIP}:10250/metrics"
        bearer_token = "/var/run/secrets/kubernetes.io/serviceaccount/token"
        tls_ca = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
        insecure_skip_verify = true
        interval = "15s"
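One detail in the snippet above: the kubelet_endpoint line uses {HOSTIP} without the leading $, while the url line uses ${HOSTIP}. The error still mentions 127.0.0.1, so this is probably not the main cause (the thread does not confirm that the chart reads an extraConfig key at all), but it is worth ruling out. A rough sketch of a check for such placeholders; `find_unprefixed_placeholders` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: prints "line_number:line" for every line containing
# a {NAME} placeholder that is NOT preceded by "$".
# Reads the files given as arguments, or stdin when none are given.
find_unprefixed_placeholders() {
    grep -nE '(^|[^$])\{[A-Za-z_][A-Za-z0-9_]*\}' "$@"
}

# Example: find_unprefixed_placeholders values.yaml
```

For the two lines quoted above it reports only the kubelet_endpoint line. grep exits non-zero when nothing matches, so a clean file produces no output.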
@sunFlying @kongfei605
Add it to categraf's env.
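One way to read this reply: HOSTIP has to exist as an environment variable on the categraf container before ${HOSTIP} in the config can resolve. In Kubernetes the node IP is exposed to a pod through the downward API field status.hostIP. A sketch of a strategic-merge patch that adds it follows; the container name categraf and the DaemonSet name nightingale-categraf-v6 (guessed from the pod name earlier in this thread) are assumptions, so check them with kubectl get daemonset -n n9e first.

```shell
#!/bin/sh
# Hypothetical helper: prints a strategic-merge patch that adds a HOSTIP
# env var, sourced from the downward API field status.hostIP, to one
# container of a pod template.
hostip_env_patch() {
    container="${1:-categraf}"   # container name is an assumption
    printf '{"spec":{"template":{"spec":{"containers":[{"name":"%s","env":[{"name":"HOSTIP","valueFrom":{"fieldRef":{"fieldPath":"status.hostIP"}}}]}]}}}}\n' "$container"
}

# Example (DaemonSet name is an assumption, verify it before running):
# kubectl -n n9e patch daemonset nightingale-categraf-v6 -p "$(hostip_env_patch categraf)"
```

If printenv inside the pod already shows HOSTIP, as in the original report, the variable is in place and the remaining problem is more likely where the url setting was put.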