[CONTINT-1148] ignore ad tags adv2 and pod annotations

Open AliDatadog opened this issue 1 year ago • 1 comments

What does this PR do?

This PR fixes the parsing of ignore_autodiscovery_tags with pod annotations and/or adv2.

Motivation

This option is already supported when it is set as a field in a check's conf.d configuration, and with adv1 for service and endpoints checks. It should also be supported with adv2 and with pod annotations. Public doc.
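As a rough illustration of the intended behavior, the sketch below reads ignore_autodiscovery_tags from an adv2 checks annotation. It is written in Python for brevity, whereas the Agent itself is written in Go; the function name parse_adv2_checks and the shape of the returned dictionary are assumptions made for this example and do not mirror the Agent's actual code.

```python
import json


def parse_adv2_checks(annotation_value):
    """Parse the value of an adv2 `<prefix>.checks` annotation (a JSON object).

    Hypothetical helper: returns a dict mapping each check name to its
    init_config, instances, and ignore_autodiscovery_tags flag.
    """
    checks = json.loads(annotation_value)
    parsed = {}
    for check_name, config in checks.items():
        parsed[check_name] = {
            "init_config": config.get("init_config", {}),
            "instances": config.get("instances", []),
            # Assumption: a missing field means autodiscovery tags are kept.
            "ignore_autodiscovery_tags": bool(
                config.get("ignore_autodiscovery_tags", False)
            ),
        }
    return parsed


annotation = """
{
  "http_check": {
    "init_config": {},
    "ignore_autodiscovery_tags": true,
    "instances": [
      {"url": "http://%%host%%:80", "name": "my-adv2-nginx-service", "timeout": 1}
    ]
  }
}
"""

result = parse_adv2_checks(annotation)
print(result["http_check"]["ignore_autodiscovery_tags"])
```

The flag is read per check, next to init_config and instances, which is where the adv2 examples in this PR place it.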

Additional Notes

We should deprecate hybrid setups that use v2 for the check specification and v1 for ignore_autodiscovery_tags, such as the following:

    ad.datadoghq.com/service.checks: |
        {
          "http_check": {
            "init_config": {},
            "instances": [
              {
                "url": "http://%%host%%:80",
                "name": "my-semi-adv1-nginx-service",
                "timeout": 1
              }
            ]
          }
        }
    ad.datadoghq.com/service.ignore_autodiscovery_tags: "true"

Since some customers might be using it, we added it back in the last commit, with warning logs and behind the config flag cluster_checks.support_hybrid_ignore_ad_tags (disabled by default).
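One way to picture the hybrid handling is the Python sketch below. It is only an illustration under stated assumptions: the function name resolve_ignore_ad_tags, the precedence of the v2 field over the v1 annotation, and the wording of the warning are all hypothetical, and the support_hybrid argument merely stands in for cluster_checks.support_hybrid_ignore_ad_tags.

```python
import logging

logger = logging.getLogger("hybrid_ad_tags_sketch")


def resolve_ignore_ad_tags(v2_flag, v1_annotation, support_hybrid):
    """Decide whether autodiscovery tags are ignored for a v2-defined check.

    v2_flag: ignore_autodiscovery_tags from the v2 checks JSON, or None if absent.
    v1_annotation: raw string value of the v1 ignore_autodiscovery_tags
        annotation, or None if the annotation is absent.
    support_hybrid: stand-in for cluster_checks.support_hybrid_ignore_ad_tags.
    """
    if v2_flag is not None:
        # Assumption: the v2 field wins when it is present.
        return bool(v2_flag)
    if v1_annotation is None or not support_hybrid:
        # No v1 annotation, or hybrid support disabled (the default).
        return False
    # Hypothetical warning text; the real log message is not shown in this PR.
    logger.warning(
        "v1 ignore_autodiscovery_tags used with a v2 checks annotation; "
        "this hybrid setup is deprecated"
    )
    return v1_annotation.strip().lower() == "true"


print(resolve_ignore_ad_tags(None, "true", support_hybrid=False))
print(resolve_ignore_ad_tags(None, "true", support_hybrid=True))
```

With the flag disabled, the v1 annotation is ignored in this sketch; with it enabled, the annotation is honored and a warning is logged.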

Possible Drawbacks / Trade-offs

N/A

Describe how to test/QA your changes

Deploy the agent on Kubernetes. We need to check several setups:

  • Service checks
  • Endpoints checks
  • Checks scheduled with pod annotations

For all of them, we need to test setting ignore_autodiscovery_tags with adv1 and adv2. Then, we can check from the UI that the metrics are not tagged with autodiscovery tags (typically kube_namespace). We also need to check that the tags aren't attached to the check with agent configcheck.
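The configcheck comparison can also be scripted. The Python sketch below extracts the entries listed under tags: from one configcheck block; the helper name extract_tags is hypothetical, and the sample text is an abridged copy of the service check output captured for this PR.

```python
def extract_tags(configcheck_block):
    """Return the entries listed under `tags:` in one configcheck block."""
    tags = []
    in_tags = False
    for raw_line in configcheck_block.splitlines():
        line = raw_line.strip()
        if line == "tags:":
            in_tags = True
        elif in_tags and line.startswith("- "):
            tags.append(line[2:])
        else:
            # Any other line ends the tags list.
            in_tags = False
    return tags


WITHOUT_FLAG = """\
=== http_check cluster check ===
Configuration provider: kubernetes-services
Config for instance ID: http_check:my-adv2-nginx-service:5ea2dccbba608a6e
name: my-adv2-nginx-service
tags:
- kube_namespace:workload-nginx
- kube_service:nginx
timeout: 1
url: http://10.96.83.141:80
"""

WITH_FLAG = """\
=== http_check cluster check ===
Configuration provider: kubernetes-services
Config for instance ID: http_check:my-adv2-nginx-service:d2cf521cdc222208
name: my-adv2-nginx-service
timeout: 1
url: http://10.96.83.141:80
"""

print(extract_tags(WITHOUT_FLAG))
print(extract_tags(WITH_FLAG))
```

An empty list for the second block is what we expect once ignore_autodiscovery_tags is honored.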

ADv2

  • Service:
        ad.datadoghq.com/service.checks: |
        {
          "http_check": {
            "init_config": {},
            "ignore_autodiscovery_tags": true,
            "instances": [
              {
                "url": "http://%%host%%:80",
                "name": "my-adv2-nginx-service",
                "timeout": 1
              }
            ]
          }
        }

Without ignore_autodiscovery_tags, we can find a list of tags:

❯ k -n datadog-agent-helm exec datadog-agent-linux-cluster-agent-6c5689879b-kwmwb -- agent configcheck | grep -i my-adv2-nginx -C 15
...
=== http_check cluster check ===
Configuration provider: kubernetes-services
Configuration source: kube_services:kube_service://workload-nginx/nginx
Config for instance ID: http_check:my-adv2-nginx-service:5ea2dccbba608a6e
name: my-adv2-nginx-service
tags:
- kube_namespace:workload-nginx
- kube_service:nginx
timeout: 1
url: http://10.96.83.141:80
~
Init Config:
{}
Auto-discovery IDs:
* kube_service://workload-nginx/nginx
===

With ignore_autodiscovery_tags, tags aren't there anymore:

❯ k -n datadog-agent-helm exec datadog-agent-linux-cluster-agent-6c5689879b-kwmwb -- agent configcheck | grep -i my-adv2-nginx -C 15
...
=== http_check cluster check ===
Configuration provider: kubernetes-services
Configuration source: kube_services:kube_service://workload-nginx/nginx
Config for instance ID: http_check:my-adv2-nginx-service:d2cf521cdc222208
name: my-adv2-nginx-service
timeout: 1
url: http://10.96.83.141:80
~
Init Config:
{}
Auto-discovery IDs:
* kube_service://workload-nginx/nginx
===
  • Endpoints:
        ad.datadoghq.com/endpoints.checks: |
        {
          "http_check": {
            "init_config": {},
            "ignore_autodiscovery_tags": true,
            "instances": [
              {
                "url": "http://%%host%%:80",
                "name": "my-adv2-nginx-service",
                "timeout": 1
              }
            ]
          }
        }

Same as before. Without ignore_autodiscovery_tags, we can find tags:

❯ k -n datadog-agent-helm exec datadog-agent-linux-cluster-agent-6c5689879b-kwmwb -- agent configcheck | grep -i my-adv2-nginx -C 15
...
=== http_check cluster check ===
Configuration provider: kubernetes-endpoints
Configuration source: kube_endpoints:kube_endpoint_uid://workload-nginx/nginx/
Config for instance ID: http_check:my-adv2-nginx-service:93cf1829256ea3b1
name: my-adv2-nginx-service
tags:
- kube_endpoint_ip:10.244.1.17
- kube_namespace:workload-nginx
- kube_service:nginx
timeout: 1
url: http://10.244.1.17:80
~
Init Config:
{}
Auto-discovery IDs:
* kube_endpoint_uid://workload-nginx/nginx/10.244.1.17
* kubernetes_pod://fc0c3e2b-6485-4ff8-89d3-d48e908756e5
State: dispatched to ali-cluster-worker
===

With ignore_autodiscovery_tags:

❯ k -n datadog-agent-helm exec datadog-agent-linux-cluster-agent-6c5689879b-kwmwb -- agent configcheck | grep -i my-adv2-nginx -C 15
...
=== http_check cluster check ===
Configuration provider: kubernetes-endpoints
Configuration source: kube_endpoints:kube_endpoint_uid://workload-nginx/nginx/
Config for instance ID: http_check:my-adv2-nginx-service:da201cfe91da182c
name: my-adv2-nginx-service
timeout: 1
url: http://10.244.1.17:80
~
Init Config:
{}
Auto-discovery IDs:
* kube_endpoint_uid://workload-nginx/nginx/10.244.1.17
* kubernetes_pod://fc0c3e2b-6485-4ff8-89d3-d48e908756e5
State: dispatched to ali-cluster-worker
===
  • Pod annotations:
        ad.datadoghq.com/redis.checks: |
          {
            "redisdb": {
              "ignore_autodiscovery_tags": true,
              "instances": [
                {
                  "host": "%%host%%",
                  "port": "6379"
                }
              ]
            }
          }

Without:

❯ k -n datadog-agent-helm exec datadog-agent-linux-tcmk5 -- agent configcheck | grep -i redisdb -C 10
...
=== redisdb check ===
Configuration provider: kubernetes-container-allinone
Configuration source: container:containerd://54f7743eb31f6e253a6abf1bf9ea9394e862ba74f03143c4d193c3e16ba63de0
Config for instance ID: redisdb:b3aff7f0711ca64e
host: 10.244.1.116
port: "6379"
tags:
- container_id:54f7743eb31f6e253a6abf1bf9ea9394e862ba74f03143c4d193c3e16ba63de0
- container_name:redis
- display_container_name:redis_redis-6d75f65b9c-6vdrs
- image_id:********@sha256:11c3e418c29672341be9a8e3015d96f05b88e5ad58829885d36f8342b4da13c2
- image_name:redis
- image_tag:latest
- kube_container_name:redis

With:

❯ k -n datadog-agent-helm exec datadog-agent-linux-tcmk5 -- agent configcheck | grep -i redisdb -C 10
...
=== redisdb check ===
Configuration provider: kubernetes-container-allinone
Configuration source: container:containerd://de90e592d2512f2a21d7d18c3a693414bb9992af8350e84f0110eb9d436f47e3
Config for instance ID: redisdb:b93c61e0dc8a649c
host: 10.244.1.9
port: "6379"
~
Init Config:
{}
Auto-discovery IDs:
* containerd://de90e592d2512f2a21d7d18c3a693414bb9992af8350e84f0110eb9d436f47e3
===

ADv1

  • Service:
    ad.datadoghq.com/service.check_names: '["http_check"]'
    ad.datadoghq.com/service.init_configs: '[{}]'
    ad.datadoghq.com/service.instances: '[{"name": "my-adv1-nginx-service", "url": "http://%%host%%/nginx_status", "timeout": 1}]'
    ad.datadoghq.com/service.ignore_autodiscovery_tags: "true" # or false

Without:

=== http_check cluster check ===
Configuration provider: kubernetes-services
Configuration source: kube_services:kube_service://workload-nginx/nginx
Config for instance ID: http_check:my-adv1-nginx-service:5d219b67ebcf7da0
name: my-adv1-nginx-service
tags:
- kube_namespace:workload-nginx
- kube_service:nginx
timeout: 1
url: http://10.96.83.141/nginx_status
~
Init Config:
{}
Auto-discovery IDs:
* kube_service://workload-nginx/nginx
===

With:

=== http_check cluster check ===
Configuration provider: kubernetes-services
Configuration source: kube_services:kube_service://workload-nginx/nginx
Config for instance ID: http_check:my-adv1-nginx-service:9fec5771230650c6
name: my-adv1-nginx-service
timeout: 1
url: http://10.96.83.141/nginx_status
~
Init Config:
{}
Auto-discovery IDs:
* kube_service://workload-nginx/nginx
===
  • Endpoints:
    ad.datadoghq.com/endpoints.check_names: '["http_check"]'
    ad.datadoghq.com/endpoints.init_configs: '[{}]'
    ad.datadoghq.com/endpoints.instances: '[{"name": "my-adv1-nginx-service", "url": "http://%%host%%/nginx_status", "timeout": 1}]'
    ad.datadoghq.com/endpoints.ignore_autodiscovery_tags: "false" # or true

Without

=== http_check cluster check ===
Configuration provider: kubernetes-endpoints
Configuration source: kube_endpoints:kube_endpoint_uid://workload-nginx/nginx/
Config for instance ID: http_check:my-adv1-nginx-service:fb4d7f9d082f986f
name: my-adv1-nginx-service
tags:
- kube_endpoint_ip:10.244.1.17
- kube_namespace:workload-nginx
- kube_service:nginx
timeout: 1
url: http://10.244.1.17/nginx_status
~
Init Config:
{}
Auto-discovery IDs:
* kube_endpoint_uid://workload-nginx/nginx/10.244.1.17
* kubernetes_pod://fc0c3e2b-6485-4ff8-89d3-d48e908756e5
State: dispatched to ali-cluster-worker
===

With

=== http_check cluster check ===
Configuration provider: kubernetes-endpoints
Configuration source: kube_endpoints:kube_endpoint_uid://workload-nginx/nginx/
Config for instance ID: http_check:my-adv1-nginx-service:2f5beff94d73ef18
name: my-adv1-nginx-service
timeout: 1
url: http://10.244.1.17/nginx_status
~
Init Config:
{}
Auto-discovery IDs:
* kube_endpoint_uid://workload-nginx/nginx/10.244.1.17
* kubernetes_pod://fc0c3e2b-6485-4ff8-89d3-d48e908756e5
State: dispatched to ali-cluster-worker
===
  • Pod annotations:
        ad.datadoghq.com/nginx.check_names: '["nginx"]'
        ad.datadoghq.com/nginx.init_configs: '[{}]'
        ad.datadoghq.com/nginx.instances: '[{"name": "my-nginx-check", "nginx_status_url": "http://%%host%%/nginx_status"}]'
        ad.datadoghq.com/nginx.logs: '[{"type": "docker","image": "nginx","service": "nginx","source": "nginx"}]'
        ad.datadoghq.com/nginx.ignore_autodiscovery_tags: "true" # or false

Without

=== nginx check ===
Configuration provider: kubernetes-container-allinone
Configuration source: container:containerd://c879011752d0476c355b413500a52d89d88626f73de4f2c24afd1b9feffb2e0e
Config for instance ID: nginx:my-nginx-check:807b93caedb0fcca
name: my-nginx-check
nginx_status_url: http://10.244.1.163/nginx_status
tags:
- container_id:c879011752d0476c355b413500a52d89d88626f73de4f2c24afd1b9feffb2e0e
- container_name:nginx
- display_container_name:nginx_nginx-78fc4b9cf8-qqh8r
- image_id:********@sha256:0a86f70acfc0f140499babf9efc68a8ba0e5f4c3c9cdaaec7269b2f787639d0c
- image_name:docker.io/alidatadog/nginx-custom
- image_tag:1.12.0
- kube_container_name:nginx
- kube_deployment:nginx
- kube_namespace:workload-nginx
- kube_ownerref_kind:replicaset
- kube_ownerref_name:nginx-78fc4b9cf8
- kube_qos:BestEffort
- kube_replica_set:nginx-78fc4b9cf8
- pod_name:nginx-78fc4b9cf8-qqh8r
- pod_phase:running
- service:nginx
- short_image:nginx-custom
~
Init Config:
{}
Auto-discovery IDs:
* containerd://c879011752d0476c355b413500a52d89d88626f73de4f2c24afd1b9feffb2e0e
===

With

=== nginx check ===
Configuration provider: kubernetes-container-allinone
Configuration source: container:containerd://5b8f5e366b3efa50e5b2a71e9a9a8ac062c9d83a11062cc195942f9fcac04752
Config for instance ID: nginx:my-nginx-check:92c50f7217ef66f0
name: my-nginx-check
nginx_status_url: http://10.244.1.154/nginx_status
~
Init Config:
{}
Auto-discovery IDs:
* containerd://5b8f5e366b3efa50e5b2a71e9a9a8ac062c9d83a11062cc195942f9fcac04752
===
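For comparison with adv2, the ADv1 annotations used above can be read as in the following Python sketch. It assumes the usual ADv1 convention that check_names, init_configs and instances are JSON arrays matched by index, and that ignore_autodiscovery_tags is a separate annotation holding the string "true" or "false". The function name parse_adv1_annotations and the returned structure are hypothetical.

```python
import json


def parse_adv1_annotations(annotations, ad_identifier, prefix="ad.datadoghq.com/"):
    """Build check configs from ADv1 annotations for one identifier.

    Hypothetical helper: `annotations` is a dict of annotation key to value,
    and `ad_identifier` is e.g. "service", "endpoints" or a container name.
    """
    base = f"{prefix}{ad_identifier}."
    names = json.loads(annotations.get(base + "check_names", "[]"))
    init_configs = json.loads(annotations.get(base + "init_configs", "[]"))
    instances = json.loads(annotations.get(base + "instances", "[]"))
    if not (len(names) == len(init_configs) == len(instances)):
        raise ValueError("check_names, init_configs and instances must have the same length")
    # In ADv1 the flag lives in its own annotation and applies to every check.
    raw_flag = annotations.get(base + "ignore_autodiscovery_tags", "false")
    ignore = raw_flag.strip().lower() == "true"
    return [
        {
            "name": name,
            "init_config": init_config,
            "instance": instance,
            "ignore_autodiscovery_tags": ignore,
        }
        for name, init_config, instance in zip(names, init_configs, instances)
    ]


service_annotations = {
    "ad.datadoghq.com/service.check_names": '["http_check"]',
    "ad.datadoghq.com/service.init_configs": "[{}]",
    "ad.datadoghq.com/service.instances": '[{"name": "my-adv1-nginx-service", "url": "http://%%host%%/nginx_status", "timeout": 1}]',
    "ad.datadoghq.com/service.ignore_autodiscovery_tags": "true",
}

configs = parse_adv1_annotations(service_annotations, "service")
print(configs[0]["name"], configs[0]["ignore_autodiscovery_tags"])
```

The difference from adv2 is where the flag lives: one annotation per identifier here, versus one field per check inside the checks JSON.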

Hybrid

Set cluster_checks.support_hybrid_ignore_ad_tags to true and test only with service/endpoints check:

    ad.datadoghq.com/service.checks: | # or endpoints
        {
          "http_check": {
            "init_config": {},
            "instances": [
              {
                "url": "http://%%host%%:80",
                "name": "my-hybrid-nginx-service",
                "timeout": 1
              }
            ]
          }
        }
    ad.datadoghq.com/service.ignore_autodiscovery_tags: "true" # or endpoints

Reviewer's Checklist

  • [ ] If known, an appropriate milestone has been selected; otherwise the Triage milestone is set.
  • [ ] Use the major_change label if your change has a major impact on the code base, impacts multiple teams, or changes important well-established internals of the Agent. This label will be used during QA to make sure each team pays extra attention to the changed behavior. For any customer-facing change, use a releasenote.
  • [ ] A release note has been added or the changelog/no-changelog label has been applied.
  • [ ] Changed code has automated tests for its functionality.
  • [ ] Adequate QA/testing plan information is provided, unless the qa/skip-qa label is applied together with the required qa/done or qa/no-code-change label.
  • [ ] At least one team/.. label has been applied, indicating the team(s) that should QA this change.
  • [ ] If applicable, docs team has been notified or an issue has been opened on the documentation repo.
  • [ ] If applicable, the need-change/operator and need-change/helm labels have been applied.
  • [ ] If applicable, the k8s/<min-version> label, indicating the lowest Kubernetes version compatible with this feature.
  • [ ] If applicable, the config template has been updated.

AliDatadog avatar Feb 07 '24 20:02 AliDatadog

Bloop Bleep... Dogbot Here

Regression Detector Results

Run ID: 06181b29-05cb-4e70-b450-e8330186f6f0
Baseline: 15cf63aa1e85d62bb6cb5b18824b98da77facb25
Comparison: 746d7ec1853798a76768acce19b11a9ed3226b88
Total CPUs: 7

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf  experiment                               goal                Δ mean %  Δ mean % CI
      file_to_blackhole                        % cpu utilization   +0.75     [-5.87, +7.36]

Fine details of change detection per experiment

perf  experiment                               goal                Δ mean %  Δ mean % CI
      file_to_blackhole                        % cpu utilization   +0.75     [-5.87, +7.36]
      idle                                     memory utilization  +0.20     [+0.18, +0.23]
      file_tree                                memory utilization  +0.14     [+0.06, +0.23]
      trace_agent_json                         ingress throughput  +0.02     [-0.01, +0.05]
      tcp_dd_logs_filter_exclude               ingress throughput  +0.00     [-0.00, +0.00]
      trace_agent_msgpack                      ingress throughput  +0.00     [-0.00, +0.00]
      uds_dogstatsd_to_api                     ingress throughput  +0.00     [-0.00, +0.00]
      process_agent_real_time_mode             memory utilization  -0.14     [-0.16, -0.11]
      otel_to_otel_logs                        ingress throughput  -0.18     [-0.77, +0.42]
      tcp_syslog_to_blackhole                  ingress throughput  -0.42     [-0.49, -0.36]
      process_agent_standard_check_with_stats  memory utilization  -0.75     [-0.79, -0.72]
      process_agent_standard_check             memory utilization  -0.90     [-0.95, -0.84]
      uds_dogstatsd_to_api_cpu                 % cpu utilization   -1.19     [-2.60, +0.23]

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".
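The three criteria above can be written as a small predicate. The Python sketch below is only an illustration of the stated rule, not the regression detector's implementation; the function name is_regression and its parameter names are assumptions.

```python
def is_regression(delta_mean_pct, ci_low, ci_high, erratic=False, tolerance_pct=5.0):
    """Apply the three stated criteria to one experiment.

    delta_mean_pct: estimated change in the mean, in percent.
    ci_low, ci_high: bounds of the confidence interval, in percent.
    erratic: True when the experiment is configured as erratic.
    tolerance_pct: effect size tolerance (5.00% in this report).
    """
    big_enough = abs(delta_mean_pct) >= tolerance_pct
    ci_excludes_zero = not (ci_low <= 0.0 <= ci_high)
    return big_enough and ci_excludes_zero and not erratic


# Values taken from the table in this report.
print(is_regression(0.75, -5.87, 7.36, erratic=True))  # file_to_blackhole
print(is_regression(-0.90, -0.95, -0.84))              # process_agent_standard_check
```

The second experiment has a confidence interval that excludes zero, but its effect size is well under the 5.00% tolerance, so it is not flagged either.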

pr-commenter[bot] avatar Feb 15 '24 15:02 pr-commenter[bot]

/merge

clamoriniere avatar Feb 16 '24 17:02 clamoriniere

:steam_locomotive: MergeQueue

Pull request added to the queue.

There are 5 builds ahead! (estimated merge in less than 1h)

Use /merge -c to cancel this operation!

dd-devflow[bot] avatar Feb 16 '24 17:02 dd-devflow[bot]