n9e-helm

[Configuration suggestion] Improve the configuration Categraf uses to collect k8s information

Open sunFlying opened this issue 1 year ago • 4 comments

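After deployment, I noticed that collection of k8s information was failing. The categraf logs showed that the k8s inputs, such as input.kubernetes, have the kubelet address configured as 127.0.0.1, which is unreachable: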

2024/12/17 02:52:55 instances.go:227: E! failed to query url: https://127.0.0.1:10250/metrics/cadvisor error: Get "https://127.0.0.1:10250/metrics/cadvisor": dial tcp 127.0.0.1:10250: connect: connection refused
2024/12/17 02:52:55 prometheus.go:227: E! failed to query url: https://127.0.0.1:10250/metrics error: Get "https://127.0.0.1:10250/metrics": dial tcp 127.0.0.1:10250: connect: connection refused
2024/12/17 02:53:10 kubernetes.go:113: E! failed to load https://127.0.0.1:10250/stats/summary error: error making HTTP request to https://127.0.0.1:10250/stats/summary: dial tcp 127.0.0.1:10250: connect: connection refused

The cluster's kubelet does not accept connections on 127.0.0.1; it listens only on the node IP:

[root@xcmgt01 n9e-helm]# netstat -tunlp |grep 10250
tcp        0      0 172.30.31.15:10250      0.0.0.0:*               LISTEN      1809/kubelet
[root@xcmgt01 n9e-helm]# curl https://127.0.0.1:10250
curl: (7) Failed to connect to 127.0.0.1 port 10250: Connection refused
[root@xcmgt01 n9e-helm]# curl https://172.30.31.15:10250
404 page not found

[root@xcmgt01 n9e-helm]# curl https://127.0.0.1:10250/metrics/cadvisor
curl: (7) Failed to connect to 127.0.0.1 port 10250: Connection refused
[root@xcmgt01 n9e-helm]# curl https://172.30.31.15:10250/metrics/cadvisor

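I suggest adding a comment, or changing the default to the configuration below, so that anyone who hits the same problem has a solution. ${HOSTIP} is an environment variable that the Categraf pod already provides, and using it solved my problem completely.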

# URL for the kubelet
url = "https://${HOSTIP}:10250"

[root@xcmgt01 n9e-helm]# kubectl exec -it  nightingale-categraf-v6-wkgj9 -n n9e  -- printenv | grep HOSTIP
HOSTIP=172.30.31.46

[root@xcmgt01 n9e-helm]# kubectl exec -it  nightingale-categraf-v6-wkgj9 -n n9e  -- printenv | grep 172.30
HOSTIP=172.30.31.46

sunFlying avatar Dec 18 '24 01:12 sunFlying

Thanks for the feedback. That is indeed how HOSTIP is meant to be used; some public clouds do not allow direct access via 127.0.0.1. https://github.com/flashcatcloud/categraf/blob/main/k8s/daemonset.yaml#L527

kongfei605 avatar Dec 18 '24 01:12 kongfei605

@sunFlying I ran into the same problem. Where in categraf.internal should url = "https://${HOSTIP}:10250" be added?

jesse-zhangh avatar Apr 28 '25 02:04 jesse-zhangh

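I added it, but it did not take effect. I probably put it in the wrong place; it still reports connection refused for the 127 IP.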

categraf:
  type: internal
  internal:
    serviceAccountName: ""
    automountServiceAccountToken: true
    image:
      repository: flashcatcloud/categraf
      tag: latest
    resources:
      requests:
        memory: 64Mi
        cpu: 50m
      limits:
        memory: 64Mi
        cpu: 50m
    extraConfig:
      kubernetes.toml: |
        [inputs.kubernetes]
        interval = "15s"
        kubelet_endpoint = "https://{HOSTIP}:10250"
        bearer_token_file = "/var/run/secrets/kubernetes.io/serviceaccount/token"
        insecure_skip_verify = true
        [instances]
        url = "https://${HOSTIP}:10250/metrics"
        bearer_token = "/var/run/secrets/kubernetes.io/serviceaccount/token"
        tls_ca = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
        insecure_skip_verify = true
        interval = "15s"

@sunFlying @kongfei605

jesse-zhangh avatar Apr 28 '25 02:04 jesse-zhangh

Add it to categraf's env.

kongfei605 avatar May 29 '25 06:05 kongfei605
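
The last reply is terse, so here is one possible reading of it, offered as a sketch rather than as the chart's actual template. For `${HOSTIP}` in the kubelet URL to resolve, the categraf container needs an environment variable named HOSTIP. A common way to provide the node's IP to a pod is the Kubernetes downward API field `status.hostIP`. The fragment below assumes that approach; the container name and the surrounding pod-spec layout are illustrative, and the thread does not say which values key (if any) in n9e-helm controls the container env.

```yaml
# Hypothetical excerpt of a DaemonSet pod spec for categraf.
# Only the env entry matters here; names and nesting are illustrative.
spec:
  template:
    spec:
      containers:
        - name: categraf
          image: flashcatcloud/categraf:latest
          env:
            # Expose the IP of the node the pod runs on as HOSTIP,
            # using the Kubernetes downward API.
            - name: HOSTIP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
```

With an entry like this in place, the `printenv | grep HOSTIP` check shown earlier in the thread should print the node IP, and a kubelet URL written as `https://${HOSTIP}:10250` would then point at the node address instead of 127.0.0.1.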