
[BUG] Exposure of port 9965 on cilium pods and service label selectors missing

Open lukibahr opened this issue 11 months ago • 6 comments

Describe the bug

After enabling ACNS according to the docs at https://learn.microsoft.com/en-us/azure/aks/advanced-network-observability-cli?tabs=cilium#visualization-using-byo-grafana, the goal is to visualize Hubble metrics in Grafana. Enabling ACNS successfully installs Cilium and its pods, and the metrics can be fetched from inside a pod: open a shell with kubectl exec -it <cilium-pod> -- /bin/bash, install curl or wget in the container, and run curl -X GET localhost:9965/metrics.

However, the Hubble metrics server port 9965 is not exposed by the cilium pod. The only port the pod declares is 9962, which serves Cilium agent metrics only:

    ports:
    - containerPort: 9962
      hostPort: 9962
      name: prometheus
      protocol: TCP

Additionally, the network-observability service in the kube-system namespace has no endpoints because it is defined without a label selector. This makes it impossible to build a ServiceMonitor that adds a scrape config to Prometheus (as described in the docs above). The network-observability service should select pods with k8s-app: cilium, as in the following YAML snippet:

# this is a customer-generated service that selects the pods via the selector field

apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: aks-managed-kappie
  labels:
    k8s-app: hubble-workaround
  name: network-observability-workaround
  namespace: kube-system
spec:
  ports:
  - name: hubble
    port: 9965
    protocol: TCP
    targetPort: 9965
  - name: cilium
    port: 9962
    protocol: TCP
    targetPort: 9962
  type: ClusterIP
  selector: # the selector that network-observability is missing
    k8s-app: cilium
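
With a selector in place, a ServiceMonitor can target the workaround service. The sketch below is a hedged example, not taken from the AKS docs: the name hubble-workaround and the monitoring namespace are hypothetical, and it assumes the Prometheus Operator CRDs are installed and that your Prometheus instance's serviceMonitorSelector picks it up (kube-prometheus-stack, for example, by default only selects ServiceMonitors carrying its release label). Because the service uses a numeric targetPort, Kubernetes populates the endpoints with port 9965 even though the pod does not declare it as a containerPort.

```yaml
# Hypothetical ServiceMonitor for the network-observability-workaround Service.
# Assumes Prometheus Operator CRDs and a Prometheus that selects this object.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: hubble-workaround          # hypothetical name
  namespace: monitoring            # assumed namespace of your Prometheus stack
spec:
  selector:
    matchLabels:
      k8s-app: hubble-workaround   # matches metadata.labels of the workaround Service
  namespaceSelector:
    matchNames:
      - kube-system                # the Service lives in kube-system
  endpoints:
    - port: hubble                 # named Service port, forwards to targetPort 9965
      path: /metrics
      interval: 30s
    - port: cilium                 # named Service port, forwards to targetPort 9962
      path: /metrics
      interval: 30s
```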

To Reproduce

For steps to reproduce the behavior, see above.

❯ kubectl port-forward -n kube-system svc/network-observability 9965:9965
error: cannot attach to *v1.Service: invalid service 'network-observability': Service is defined without a selector.


Environment:

  • Kubernetes version: Client Version: v1.31.2, Kustomize Version: v5.4.2, Server Version: v1.31.2

lukibahr • Dec 13 '24 13:12

Action required from @aritraghosh, @julia-yin, @AllenWen-at-Azure

Issue needing attention of @Azure/aks-leads


Running into the same issue.

Trying to work around it: I got the Cilium metrics scraped with a PodMonitor instead (via the prometheus named port), but because 9965 is not exposed, the Hubble metrics are not getting scraped :(

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: cilium-network-observability
  namespace: monitoring
  labels:
    app.kubernetes.io/part-of: cilium
spec:
  podMetricsEndpoints:
    - port: prometheus
      path: /metrics
      interval: 30s
    - targetPort: 9965
      path: /metrics
      interval: 30s
  selector:
    matchLabels:
      k8s-app: cilium
  namespaceSelector:
    matchNames:
      - kube-system
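
A likely reason the targetPort: 9965 entry scrapes nothing: Prometheus pod discovery generates one target per declared container port, and the operator filters those targets by port, so an undeclared port never shows up as a target. One possible workaround, sketched below under stated assumptions (IPv4 pod IPs, Prometheus Operator relabeling support, same names as the spec above), reuses the discovered prometheus target and rewrites its scrape address to port 9965 with relabelings. This is not an officially documented AKS configuration.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: cilium-network-observability
  namespace: monitoring
  labels:
    app.kubernetes.io/part-of: cilium
spec:
  podMetricsEndpoints:
    # Cilium agent metrics on the declared containerPort named "prometheus" (9962)
    - port: prometheus
      path: /metrics
      interval: 30s
    # Hubble metrics: start from the same discovered "prometheus" target,
    # then rewrite the scrape address to the undeclared port 9965.
    - port: prometheus
      path: /metrics
      interval: 30s
      relabelings:
        - sourceLabels: [__meta_kubernetes_pod_ip]
          targetLabel: __address__
          replacement: "${1}:9965"   # assumes an IPv4 pod IP (IPv6 needs brackets)
        - targetLabel: endpoint
          replacement: hubble        # optional: distinguish from the 9962 target
  selector:
    matchLabels:
      k8s-app: cilium
  namespaceSelector:
    matchNames:
      - kube-system
```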

niekcandaele • May 07 '25 16:05

@chasewilson, @paulgmiller, @wedaly, @quantumn-a5, @tamilmani1989 would you be able to assist?

Hi @lukibahr, thanks for raising the issue.

port 9965

Even though 9965 is not specified as a containerPort in the cilium daemonset, Cilium still exposes metrics on this port.

$ kns kube-system
$ kubectl get cm cilium-config -oyaml | grep 9965
hubble-metrics-server: :9965
$ kubectl port-forward cilium-ngx7p 9965:9965 &
$ curl -s localhost:9965/metrics | grep hubble
Handling connection for 9965
# HELP hubble_drop_total Number of drops
# TYPE hubble_drop_total counter
hubble_drop_total{destination="",protocol="ICMPv6",reason="UNSUPPORTED_L3_PROTOCOL",source=""} 578
...

network-observability service

This service is created by AKS for the managed Prometheus offering and is not recommended for querying metrics from the agents. I like the PodMonitor approach suggested by @niekcandaele. @niekcandaele, I am curious why the PodMonitor spec uses both port and targetPort. Won't the following spec work?

podMetricsEndpoints:
  - port: 9965
    path: /metrics
    interval: 30s

anubhabMajumdar • May 23 '25 00:05

This issue will now be closed because it hasn't had any activity for 7 days after being marked stale. @lukibahr, feel free to comment again within the next 7 days to reopen it, or open a new issue after that time if you still have a question, issue, or suggestion.