
Support for Ambient Mode metrics for istio integration

Open Yufeireal opened this issue 1 year ago • 11 comments

Hi team,

Istio Ambient Mode has been GA since version 1.24. The Datadog Istio integration appears to be explicitly configured for sidecar mode. Is there a workaround, or a plan to add configuration for scraping Istio ambient components such as ztunnel and istio-cni? Thanks!

Yufeireal avatar Nov 30 '24 16:11 Yufeireal

Any update on this question? I am having the same issue. Service monitoring is not working anymore after migrating from Sidecar to Ambient Mesh.

DaanGilissen avatar Dec 17 '24 15:12 DaanGilissen

I'm wondering this as well. I'm exploring using the Datadog Agent to monitor Istio, but since we are targeting ambient mode it seems that this won't work for us. Some indication of when we could expect to see this would be helpful.

stephenbuchanan-scout avatar Feb 08 '25 00:02 stephenbuchanan-scout

Any update on this? Would love to see Istio ambient mode integrated with Datadog!

linsun avatar Mar 13 '25 16:03 linsun

Hi, just checking if there’s any update here, this would be really valuable for a lot of current and new Ambient Mesh/DD users.

asayah avatar May 15 '25 16:05 asayah

Hello there, Any update on this? For users in ambient mode of Istio, not having metrics makes using Datadog less useful.

kirann-hegde avatar May 28 '25 05:05 kirann-hegde

Hi, it seems there are no updates here. We are planning to migrate from Istio sidecar mode to Istio ambient mode, but apparently Datadog does not support ambient mode well. Please give us an update.

leelee3264 avatar May 29 '25 07:05 leelee3264

Hello, is there any update on this? It would be great to have metrics for Istio in ambient mode in Datadog.

vinmarco avatar May 30 '25 09:05 vinmarco

After a couple of days of experimenting and going through the Datadog Agent's debug logs, I figured out a way to get some extra metrics. I looked at ztunnel first. Control plane metrics seem to be getting into Datadog, as do at least some waypoint metrics. For ztunnel, however, we got nothing.

I came up with these two working examples. However, even with the second example, which goes through the Istio integration, the metrics do not seem to be treated as integration metrics and might be billed as custom.

If you configure openmetrics with .* for metrics, it complains that the type is not set, so you have to specify the type explicitly. This happens because the metric is called istio_tcp_sent_bytes in the HELP and TYPE lines, but the metric actually exposed is istio_tcp_sent_bytes_total.
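To see which metrics are affected by this naming mismatch, a scrape of the endpoint can be checked offline. The following is only a rough sketch: `find_suffix_mismatches` is a hypothetical helper, and the `SAMPLE` text is made up to mirror the mismatch described above, not copied from a real ztunnel scrape.

```python
import re

# Illustrative exposition text modeled on the mismatch described above.
# Help strings, labels, and values are invented for this example.
SAMPLE = """\
# HELP istio_tcp_sent_bytes Bytes sent (illustrative help text).
# TYPE istio_tcp_sent_bytes counter
istio_tcp_sent_bytes_total{reporter="source"} 1024
# HELP example_gauge A made-up gauge whose names agree.
# TYPE example_gauge gauge
example_gauge 1
"""


def find_suffix_mismatches(text):
    """Return sample names ending in _total whose TYPE line uses the bare name."""
    declared = set()
    samples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# TYPE "):
            parts = line.split()
            if len(parts) >= 4:
                declared.add(parts[2])  # "# TYPE <name> <type>"
        elif not line.startswith("#"):
            match = re.match(r"[A-Za-z_:][A-Za-z0-9_:]*", line)
            if match:
                samples.add(match.group(0))
    suffix = "_total"
    return sorted(
        name
        for name in samples
        if name not in declared
        and name.endswith(suffix)
        and name[: -len(suffix)] in declared
    )


print(find_suffix_mismatches(SAMPLE))
```

Each name it reports is a candidate for an explicit entry with a type in the `metrics` list.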

The other thing is this warning on openmetrics configuration:

Starting in Datadog Agent v7.32.0, in adherence to the OpenMetrics specification standard, counter names ending in _total must be specified without the _total suffix. For example, to collect promhttp_metric_handler_requests_total, specify the metric name promhttp_metric_handler_requests. This submits to Datadog the metric name appended with .count, promhttp_metric_handler_requests.count.

I believe that is why we have to remove _total and rename the metric (to whatever you want).
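Writing these rename entries by hand gets tedious with more than a few counters, so they could be generated. This is a minimal sketch that assumes every listed metric is a counter and that stripping `_total` is the rename you want. `counter_mapping` and `build_annotation` are hypothetical helper names, not part of any Datadog tooling.

```python
import json

SUFFIX = "_total"


def counter_mapping(raw_names):
    """Build one {raw: {"name": ..., "type": "counter"}} entry per metric."""
    entries = []
    for raw in raw_names:
        # Drop the _total suffix for the submitted name, as described above.
        name = raw[: -len(SUFFIX)] if raw.endswith(SUFFIX) else raw
        entries.append({raw: {"name": name, "type": "counter"}})
    return entries


def build_annotation(endpoint, namespace, raw_names):
    """Return the JSON body for an ad.datadoghq.com/<container>.checks annotation."""
    check = {
        "openmetrics": {
            "instances": [
                {
                    "openmetrics_endpoint": endpoint,
                    "namespace": namespace,
                    "metrics": counter_mapping(raw_names),
                }
            ]
        }
    }
    return json.dumps(check, indent=2)


print(
    build_annotation(
        "http://%%host%%:15020/stats/prometheus",
        "ztunnel",
        ["istio_tcp_sent_bytes_total", "istio_tcp_received_bytes_total"],
    )
)
```

The printed JSON can be pasted as the value of the pod annotation.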

Openmetrics (just one example metric, you can add more)

        ad.datadoghq.com/istio-proxy.checks: |
          {
            "openmetrics": {
              "instances": [
                {
                  "openmetrics_endpoint": "http://%%host%%:15020/stats/prometheus",
                  "namespace": "ztunnel",
                  "metrics":[{"istio_tcp_sent_bytes_total":{"name":"istio_tcp_sent_bytes","type":"counter"}}]
                }
              ]
            }
          }  

All 4 TCP ztunnel metrics through istio integration (still probably billed as custom)

        ad.datadoghq.com/istio-proxy.checks: |
          {
            "istio": {
              "instances": [
                {
                  "istio_mesh_endpoint": "http://%%host%%:15020/stats/prometheus",
                  "use_openmetrics": "true",
                  "send_histograms_buckets": "true",
                  "extra_metrics": [
                  {"istio_tcp_sent_bytes_total":{"name":"istio_tcp_sent_bytes","type":"counter"}},
                  {"istio_tcp_received_bytes_total":{"name":"istio_tcp_received_bytes","type":"counter"}},
                  {"istio_tcp_connections_opened_total":{"name":"istio_tcp_connections_opened","type":"counter"}},
                  {"istio_tcp_connections_closed_total":{"name":"istio_tcp_connections_closed","type":"counter"}}
                  ]
                }
              ]
            }
          }   

This shows up under istio.mesh

Image

jan-ludvik avatar Jun 03 '25 12:06 jan-ludvik

It seems that extra_metrics coming through the Istio integration might be counted as integration metrics rather than custom ones. I don't see them in the Volume tab in Datadog.

jan-ludvik avatar Jun 04 '25 06:06 jan-ludvik

Yesterday I had a call with Datadog support, and for ztunnel metrics we added this:

podAnnotations:
  ad.datadoghq.com/istio-proxy.checks: |
    {
      "openmetrics": {
        "instances": [
          {
            "namespace": "ztunnel",
            "openmetrics_endpoint": "http://%%host%%:15020/metrics",
            "metrics": [
            {
              "istio_xds_message_total" : {
                "name": "istio_xds_message_total", 
                "type": "gauge"
              },
              "istio_xds_message_bytes_total" : {
                "name": "istio_xds_message_bytes_total", 
                "type": "gauge"
              },
              "istio_xds_connection_terminations_total": {
                "name": "istio_xds_connection_terminations_total", 
                "type": "gauge"
              },
              "istio_tcp_connections_opened_total" : {
                "name": "istio_tcp_connections_opened_total", 
                "type": "gauge"
              },
              "istio_tcp_received_bytes_total" : {
                "name": "istio_tcp_received_bytes_total", 
                "type": "gauge"
              },
              "istio_tcp_sent_bytes_total" : {
                "name": "istio_tcp_sent_bytes_total", 
                "type": "gauge"
              },
              "istio_dns_requests_total" : {
                "name": "istio_dns_requests_total", 
                "type": "gauge"
              },
              "istio_dns_upstream_requests_total" : {
                "name": "istio_dns_upstream_requests_total", 
                "type": "gauge"
              },
              "workload_manager_proxies_started_total" : {
                "name": "workload_manager_proxies_started_total", 
                "type": "gauge"
              },
              "workload_manager_proxies_stopped_total" : {
                "name": "workload_manager_proxies_stopped_total", 
                "type": "gauge"
              },
              "istio_tcp_connections_closed_total" : {
                "name": "istio_tcp_connections_closed_total", 
                "type": "gauge"
              },
              "istio_dns_upstream_failures_total" : {
                "name": "istio_dns_upstream_failures_total", 
                "type": "gauge"
              }  
              },
              ".*"
            ]
          }
        ]
      }
    }

In this snippet, ".*" means "include all metrics". Even with this wildcard, some metrics didn't show up, so I had to explicitly add the ones defined in the list. In Datadog I can now see them:

Image
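Because the annotation is JSON embedded in YAML, it can help to parse it before deploying and check which metrics are mapped explicitly and which patterns are used. This is a small sketch with a hypothetical helper, `summarize_metrics`; `EXAMPLE` is a shortened copy of the annotation above with only two of its metrics.

```python
import json


def summarize_metrics(annotation_json):
    """Split each instance's "metrics" list into explicit names and string patterns."""
    check = json.loads(annotation_json)  # raises ValueError on malformed JSON
    explicit, patterns = [], []
    for instance in check["openmetrics"]["instances"]:
        for entry in instance.get("metrics", []):
            if isinstance(entry, dict):
                explicit.extend(entry.keys())  # explicit name/type mappings
            else:
                patterns.append(entry)  # plain strings such as ".*"
    return sorted(explicit), patterns


EXAMPLE = """
{
  "openmetrics": {
    "instances": [
      {
        "namespace": "ztunnel",
        "openmetrics_endpoint": "http://%%host%%:15020/metrics",
        "metrics": [
          {
            "istio_tcp_sent_bytes_total": {"name": "istio_tcp_sent_bytes_total", "type": "gauge"},
            "istio_dns_requests_total": {"name": "istio_dns_requests_total", "type": "gauge"}
          },
          ".*"
        ]
      }
    ]
  }
}
"""

explicit, patterns = summarize_metrics(EXAMPLE)
print(explicit)
print(patterns)
```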

vinmarco avatar Jun 04 '25 07:06 vinmarco

Another update! If you also want metrics for waypoints, this is the snippet I added in Datadog:

podAnnotations:
  ad.datadoghq.com/discovery.checks: |
    {
      "openmetrics": {
        "instances": [
          {
            "openmetrics_endpoint": "http://%%host%%:15014/metrics",
            "namespace": "waypoint",
            "metrics": [".*"]
          }
        ]
      }
    }

And in Datadog you can see the 154 metrics for waypoint:

Image

vinmarco avatar Jun 04 '25 08:06 vinmarco

Thanks @jan-ludvik and @vinmarco . Much appreciated.

kirann-hegde avatar Jul 02 '25 05:07 kirann-hegde

@vinmarco did Datadog count these as custom metrics?

zruchi avatar Jul 28 '25 09:07 zruchi

@vinmarco For the waypoint, where did you add those annotations? We are using the Helm charts for Istio 1.25.1 and I do not see an option for pod annotations in them. I would appreciate it if you could let us know.

kirann-hegde avatar Sep 25 '25 12:09 kirann-hegde

@kirann-hegde Does the following work for you?

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: gw-options
  namespace: httpbin
data:
  deployment: |
    spec:
      template:
        metadata:
          annotations:
            ad.datadoghq.com/istio-proxy.checks: |
              {
                "openmetrics": {
                  "instances": [
                    {
                      "openmetrics_endpoint": "http://%%host%%:15020/stats/prometheus",
                      "namespace": "httpbin",
                      "metrics": [".*"]
                    }
                  ]
                }
              }
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  labels:
    istio.io/waypoint-for: service
  name: waypoint
  namespace: httpbin
spec:
  gatewayClassName: istio-waypoint
  listeners:
  - name: mesh
    port: 15008
    protocol: HBONE
  infrastructure:
    parametersRef:
      group: ""
      kind: ConfigMap
      name: gw-options

I tried this based on https://istio.io/latest/docs/tasks/traffic-management/ingress/gateway-api/#automated-deployment and it created the waypoint Pod for me with the annotation in the Pod metadata.

find-arka avatar Sep 25 '25 22:09 find-arka

@find-arka Thanks. I will try it and let you know.

I'm in the process of setting up monitoring and alerting for Istio in ambient mode, and I want to make sure it's as effective as possible. With so many metrics available, I'm reaching out for your valuable insights!

Here are a few key questions I have:

  •  From the End User & Engineer Perspective: What metrics and alerts do you find most relevant for both end users and engineers managing Istio?
    
  •  Performance Metrics: What specific performance metrics are you currently monitoring?
    
  •  Common Alerts: What alerts do you typically configure?
    

I have some initial ideas, but your feedback would be invaluable in helping me create impactful dashboards and robust monitoring solutions.

Thanks in advance for your help!

kirann-hegde avatar Sep 30 '25 16:09 kirann-hegde