Missing source, destination context with hubble_dns* metrics
**Describe the bug**
Both metrics `hubble_dns_queries_total` and `hubble_dns_responses_total` have empty `destination` and `source` labels.
**To Reproduce**
Install retina-hubble with:

```bash
VERSION=$(curl -sL https://api.github.com/repos/microsoft/retina/releases/latest | jq -r .name)
helm upgrade --install retina oci://ghcr.io/microsoft/retina/charts/retina-hubble \
  --version $VERSION \
  --namespace kube-system \
  --set cluster.name="aksclustername" \
  --set os.windows=false \
  --set operator.enabled=true \
  --set operator.enableRetinaEndpoint=true \
  --set operator.repository=ghcr.io/microsoft/retina/retina-operator \
  --set operator.tag=$VERSION \
  --set agent.enabled=true \
  --set agent.repository=ghcr.io/microsoft/retina/retina-agent \
  --set agent.tag=$VERSION \
  --set agent.init.enabled=true \
  --set agent.init.repository=ghcr.io/microsoft/retina/retina-init \
  --set agent.init.tag=$VERSION \
  --set logLevel=info \
  --set hubble.relay.tls.server.enabled=false \
  --set hubble.relay.prometheus.enabled=true \
  --set hubble.relay.prometheus.serviceMonitor.enabled=true \
  --set hubble.tls.enabled=false \
  --set hubble.tls.auto.enabled=false \
  --set hubble.tls.auto.method=cronJob \
  --set hubble.tls.auto.certValidityDuration=1 \
  --set hubble.tls.auto.schedule="*/10 * * * *" \
  --set hubble.metrics.serviceMonitor.enabled=true \
  --set prometheus.serviceMonitor.namespace=kube-system \
  --set enabledPlugin_linux="\[dropreason\,packetforward\,linuxutil\,dns\,packetparser\]" \
  --set enablePodLevel=true \
  --set remoteContext=true \
  --set enableAnnotations=true
```
To test, run `curl http://retina_pod_ip:9965/metrics` and check the labels on both DNS metrics.
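For example, to inspect the labels without hunting for the pod IP, a port-forward works too (the label selector below is an assumption; adjust it to whatever your retina-agent pods actually carry):

```bash
# Pick one retina-agent pod (k8s-app=retina is an assumed selector)
POD=$(kubectl -n kube-system get pods -l k8s-app=retina -o jsonpath='{.items[0].metadata.name}')

# Forward the metrics port used in the curl test above
kubectl -n kube-system port-forward "$POD" 9965:9965 &
sleep 2

# Dump the DNS metric samples together with their label sets
curl -s http://localhost:9965/metrics | grep -E '^hubble_dns_(queries|responses)_total'
```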
From `retina-config`:

```
hubble-metrics:
  flow:sourceEgressContext=pod;destinationIngressContext=pod
  tcp:sourceEgressContext=pod;destinationIngressContext=pod
  dns:query;sourceEgressContext=pod;destinationIngressContext=pod
  drop:sourceEgressContext=pod;destinationIngressContext=pod
```
With the above configuration, the `destination` and `source` labels are visible in `hubble_tcp_flags_total` and `hubble_drop_total`. In an AKS cluster with Azure CNI Powered by Cilium, both DNS metric labels are filled with pod names.
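For comparison (with the port-forward from above still running), this makes the asymmetry visible side by side:

```bash
# These samples carry namespace/pod in source and destination ...
curl -s http://localhost:9965/metrics | grep -E '^hubble_(tcp_flags|drop)_total' | head -n 3

# ... while these come back with source and destination empty
curl -s http://localhost:9965/metrics | grep -E '^hubble_dns_(queries|responses)_total' | head -n 3
```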
Or maybe it is not a bug, but I'm missing something?
**Expected behavior**
`hubble_dns_responses_total` should have the `destination` label filled with `namespace/pod-name`, and `hubble_dns_queries_total` should have the `source` label filled with `namespace/pod-name`.
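For illustration, I would expect samples roughly like these (pod names and label values are made up):

```
hubble_dns_queries_total{cluster="aksclustername",source="default/my-app-6f7d8",destination="kube-system/coredns-5d78c9869d-abcde",qtypes="A",ips_returned="0"} 12
hubble_dns_responses_total{cluster="aksclustername",source="kube-system/coredns-5d78c9869d-abcde",destination="default/my-app-6f7d8",qtypes="A",ips_returned="1"} 12
```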
**Platform (please complete the following information):**
- OS: Ubuntu 22.04.5 LTS
- Kubernetes Version: v1.31.6, v1.32.6
- Host: AKS, Azure CNI with Calico policies (NOT Cilium), direct routing (not overlay)
- Retina Version: v0.0.36
The same occurs in Azure CNI Overlay Powered by Cilium (with ACNS enabled): https://learn.microsoft.com/en-us/azure/aks/container-network-observability-metrics?tabs=Cilium#pod-level-metrics-hubble-metrics
| Metric | Label | Exists |
|---|---|---|
| hubble_dns_queries_total | cluster | ✅ |
| hubble_dns_queries_total | instance | ✅ |
| hubble_dns_queries_total | ips_returned | ✅ |
| hubble_dns_queries_total | job | ✅ |
| hubble_dns_queries_total | microsoft.resourceid | ✅ |
| hubble_dns_queries_total | qtypes | ✅ |
| hubble_dns_queries_total | source | ✅ |
| hubble_dns_queries_total | destination | ❌ Missing, should contain the CoreDNS `<namespace>/<pod>` |
| hubble_dns_responses_total | cluster | ✅ |
| hubble_dns_responses_total | instance | ✅ |
| hubble_dns_responses_total | ips_returned | ✅ |
| hubble_dns_responses_total | job | ✅ |
| hubble_dns_responses_total | microsoft.resourceid | ✅ |
| hubble_dns_responses_total | qtypes | ✅ |
| hubble_dns_responses_total | source | ✅ |
| hubble_dns_responses_total | destination | ❌ Missing, should contain the client `<namespace>/<pod>` |
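A quick way to confirm the empty label across all DNS samples (again assuming the port-forward from the reproduction steps is active):

```bash
# Count DNS metric samples whose destination label is empty or absent;
# a non-zero result reproduces the issue
curl -s http://localhost:9965/metrics \
  | grep -E '^hubble_dns_(queries|responses)_total' \
  | grep -vc 'destination="[^"]'
```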