
[Feature] kube-dns service not exposing metric port?

Open rgarcia89 opened this issue 2 years ago • 22 comments

Describe scenario
I have noticed that on my AKS clusters running v1.24.9, the kube-dns service in the kube-system namespace does not expose the CoreDNS pod metrics port. As a result, the ServiceMonitor deployed by the prometheus-operator chart cannot collect CoreDNS metrics.

apiVersion: v1
kind: Service
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: CoreDNS
  name: kube-dns
  namespace: kube-system
spec:
  clusterIP: 10.0.0.10
  clusterIPs:
  - 10.0.0.10
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
  selector:
    k8s-app: kube-dns
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: CoreDNS
    version: v20
  name: coredns
  namespace: kube-system
spec:
...
      containers:
      - args:
        - -conf
        - /etc/coredns/Corefile
        image: mcr.microsoft.com/oss/kubernetes/coredns:v1.9.3
        imagePullPolicy: IfNotPresent
        ...
        name: coredns
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        - containerPort: 9153
          name: metrics
          protocol: TCP
...

From what I can see, the official Kubernetes manifest does expose the metrics port on the kube-dns service: https://github.com/kubernetes/kubernetes/blob/v1.24.9/cluster/addons/dns/coredns/coredns.yaml.base
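For comparison, the ports section of the Service in that upstream manifest looks roughly like the fragment below. This is a sketch from memory of the linked file, not a verbatim copy, so check it against the URL above; the relevant difference from the AKS service is the third entry, which matches the `metrics` container port (9153) on the coredns Deployment.

```yaml
# Approximate ports section of the upstream kube-dns Service (verify against the linked file).
ports:
- name: dns
  port: 53
  protocol: UDP
- name: dns-tcp
  port: 53
  protocol: TCP
- name: metrics        # missing from the AKS-managed service shown above
  port: 9153
  protocol: TCP
```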

Describe the solution you'd like
I would like AKS to expose the metrics port directly on the kube-dns service, or to make it possible to enable this via a parameter in the AKS CLI. Otherwise I have to make this change manually on more than 10 clusters every time I redeploy them.

rgarcia89 avatar Apr 05 '23 13:04 rgarcia89

We are running into the same issue. I'm pretty sure this was exposed in the past (since we had alerts based on the metric).

flo-02-mu avatar May 14 '23 10:05 flo-02-mu

@flo-02-mu in case you are using the kube-prometheus-stack: I have just added a service to the AKS platform jsonnet definition. It will be created and used to scrape the CoreDNS metrics.

https://github.com/prometheus-operator/kube-prometheus/pull/2107#event-9304184829

rgarcia89 avatar May 23 '23 08:05 rgarcia89

AKS runs multiple CoreDNS pods behind the kube-dns service. If you scrape metrics through the service, the results might be inconsistent, because each scrape may be answered by a different pod. Scrape the pods directly instead to get consistent metrics, with the pod name as a dimension.

robbiezhang avatar Jan 08 '24 23:01 robbiezhang
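One way to scrape the pods directly, assuming the Prometheus Operator CRDs are installed, is a PodMonitor. The following is a minimal sketch rather than an AKS-provided resource: the resource name and scrape interval are assumptions, the `k8s-app: kube-dns` label is inferred from the service selector shown in the issue, and `metrics` is the container port name from the coredns Deployment. Your Prometheus instance must also be configured to select PodMonitors in this namespace.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: coredns              # hypothetical name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: kube-dns      # same label the kube-dns service selects on
  podMetricsEndpoints:
  - port: metrics            # named container port 9153 on the coredns pods
    interval: 30s            # assumed interval; adjust as needed
```

Each CoreDNS pod becomes its own scrape target, so every series carries a `pod` label.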

@robbiezhang that's why we are using a headless service 😉

rgarcia89 avatar Jan 09 '24 06:01 rgarcia89
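A headless service of the kind described here could look like the sketch below. It is not the exact definition from the kube-prometheus pull request: the names, the `app.kubernetes.io/name` label, and the `monitoring` namespace are assumptions for illustration. Because `clusterIP` is `None`, the service has no virtual IP to load-balance across, and a ServiceMonitor resolves its endpoints into one scrape target per CoreDNS pod.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: coredns-metrics                        # hypothetical name
  namespace: kube-system
  labels:
    app.kubernetes.io/name: coredns-metrics    # hypothetical label used by the ServiceMonitor below
spec:
  clusterIP: None                              # headless: no virtual IP
  selector:
    k8s-app: kube-dns                          # matches the CoreDNS pods
  ports:
  - name: metrics
    port: 9153
    targetPort: 9153
    protocol: TCP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: coredns-metrics                        # hypothetical name
  namespace: monitoring                        # assumed namespace of the Prometheus stack
spec:
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app.kubernetes.io/name: coredns-metrics
  endpoints:
  - port: metrics                              # refers to the service port name above
```

Since this is a separate service without the `addonmanager.kubernetes.io/mode: Reconcile` label, it leaves the AKS-managed kube-dns service untouched.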

Action required from @aritraghosh, @julia-yin, @AllenWen-at-Azure

Issue needing attention of @Azure/aks-leads

This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs within 7 days of this comment. @aritraghosh

This issue will now be closed because it hasn't had any activity for 7 days after being marked stale. @rgarcia89 feel free to comment again within the next 7 days to reopen it, or open a new issue after that time if you still have a question, issue, or suggestion.