datadog-agent
[ASCII-1023] Render cluster agent using the status component
What does this PR do?
Moves the cluster agent status to the status component. To do that, I had to create several status providers, one for each cluster agent status section.
I tested these changes locally, following this guide, on my local kind cluster.
Since I'm migrating one status command at a time, I need to keep the other templates for now. So the existing templates at pkg/status/render/templates/* cannot be deleted yet, as I want the other status commands to work in isolation from the agent status command. I will remove them once I migrate all status commands 🔥
When running `agent status` inside a cluster agent pod, I get this text output:
============================================
Cluster Agent (v7.51.0-rc.1+git.546.5e087ba)
============================================
Status date: 2024-02-14 17:28:37.456 UTC (1707931717456)
Agent start: 2024-02-14 17:28:37.396 UTC (1707931717396)
Pid: 1
Go Version: go1.21.5
Python Version: n/a
Build arch: arm64
Agent flavor: cluster_agent
Log Level: INFO
Paths
=====
Config File: /etc/datadog-agent/datadog-cluster.yaml
conf.d: /etc/datadog-agent/conf.d
checks.d: /etc/datadog-agent/checks.d
========
Hostname
========
hostname: agent-cluster-control-plane-agent-cluster
socket-fqdn: datadog-agent-cluster-agent-5b55c4cc87-9sk7b
socket-hostname: datadog-agent-cluster-agent-5b55c4cc87-9sk7b
hostname provider: container
unused hostname providers:
'hostname' configuration/environment: hostname is empty
'hostname_file' configuration/environment: 'hostname_file' configuration is not enabled
aws: not retrieving hostname from AWS: the host is not an ECS instance and other providers already retrieve non-default hostnames
azure: azure_hostname_style is set to 'os'
fargate: agent is not runnning on Fargate
fqdn: FQDN hostname is not usable
gce: unable to retrieve hostname from GCE: GCE metadata API error: Get "http://169.254.169.254/computeMetadata/v1/instance/hostname": dial tcp 169.254.169.254:80: connect: connection refused
os: OS hostname is not usable
=========
Collector
=========
Running Checks
==============
kubernetes_apiserver
--------------------
Instance ID: kubernetes_apiserver [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/kubernetes_apiserver.d/conf.yaml.default
Total Runs: 9
Metric Samples: Last Run: 0, Total: 0
Events: Last Run: 1, Total: 11
Service Checks: Last Run: 3, Total: 21
Average Execution Time : 1.465s
Last Execution Date : 2024-02-14 17:30:40 UTC (1707931840000)
Last Successful Execution Date : 2024-02-14 17:30:40 UTC (1707931840000)
kubernetes_state_core
---------------------
Instance ID: kubernetes_state_core:f0ece86b2bc4e82e [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/kubernetes_state_core.yaml.default
Total Runs: 9
Metric Samples: Last Run: 389, Total: 2,723
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 3, Total: 21
Average Execution Time : 3ms
Last Execution Date : 2024-02-14 17:30:45 UTC (1707931845000)
Last Successful Execution Date : 2024-02-14 17:30:45 UTC (1707931845000)
orchestrator
------------
Instance ID: orchestrator:c640d4e943da6c1d [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/orchestrator.d/conf.yaml.default
Total Runs: 14
Metric Samples: Last Run: 0, Total: 0
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 5ms
Last Execution Date : 2024-02-14 17:30:51 UTC (1707931851000)
Last Successful Execution Date : 2024-02-14 17:30:51 UTC (1707931851000)
====================
Admission Controller
====================
Webhooks info
-------------
MutatingWebhookConfigurations name: datadog-webhook
Created at: 2024-02-14 17:09:19 +0000 UTC
---------
Name: datadog.webhook.auto.instrumentation
CA bundle digest: 49e58c003c325ecb
Object selector: &LabelSelector{MatchLabels:map[string]string{admission.datadoghq.com/enabled: true,},MatchExpressions:[]LabelSelectorRequirement{},}
Rule 1: Operations: [CREATE] - APIGroups: [] - APIVersions: [v1] - Resources: [pods]
Service: default/datadog-agent-cluster-agent-admission-controller - Port: 443 - Path: /injectlib
---------
Name: datadog.webhook.config
CA bundle digest: 49e58c003c325ecb
Object selector: &LabelSelector{MatchLabels:map[string]string{admission.datadoghq.com/enabled: true,},MatchExpressions:[]LabelSelectorRequirement{},}
Rule 1: Operations: [CREATE] - APIGroups: [] - APIVersions: [v1] - Resources: [pods]
Service: default/datadog-agent-cluster-agent-admission-controller - Port: 443 - Path: /injectconfig
---------
Name: datadog.webhook.tags
CA bundle digest: 49e58c003c325ecb
Object selector: &LabelSelector{MatchLabels:map[string]string{admission.datadoghq.com/enabled: true,},MatchExpressions:[]LabelSelectorRequirement{},}
Rule 1: Operations: [CREATE] - APIGroups: [] - APIVersions: [v1] - Resources: [pods]
Service: default/datadog-agent-cluster-agent-admission-controller - Port: 443 - Path: /injecttags
Secret info
-----------
Secret name: webhook-certificate
Secret namespace: default
Created at: 2024-02-14 17:09:19 +0000 UTC
CA bundle digest: 49e58c003c325ecb
Duration before certificate expiration: 8759h38m27.516535667s
=============
Autodiscovery
=============
Enabled Features
================
kubernetes
orchestratorexplorer
==========================
Cluster Checks Dispatching
==========================
Status: Leader, serving requests
Active agents: 1
Check Configurations: 0
- Dispatched: 0
- Unassigned: 0
=====================
Custom Metrics Server
=====================
Disabled: The external metrics provider is not enabled on the Cluster Agent
===============
Leader Election
===============
Leader Election Status: Running
Leader Name is: datadog-agent-cluster-agent-5b55c4cc87-9sk7b
Last Acquisition of the lease: Wed, 14 Feb 2024 17:29:08 UTC
Renewed leadership: Wed, 14 Feb 2024 17:30:38 UTC
Number of leader transitions: 3 transitions
=====================
Orchestrator Explorer
=====================
Collection Status: The collection is at least partially running since the cache has been populated.
Cluster Name: agent-cluster
Cluster ID: 177e8363-cd5d-46bc-9190-af292581b872
Container scrubbing: enabled
Manifest collection: enabled
======================
Orchestrator Endpoints
======================
https://orchestrator.datadoghq.com - API Key ending with: 72724
===========
Cache Stats
===========
Elements in the cache: 240
ClusterRoleBinding
Last Run: (Hits: 56 Miss: 0) | Total: (Hits: 560 Miss: 56)
ClusterRole
Last Run: (Hits: 70 Miss: 0) | Total: (Hits: 700 Miss: 70)
Cluster
Last Run: (Hits: 0 Miss: 1) | Total: (Hits: 0 Miss: 11)
CronJob
Last Run: (Hits: 0 Miss: 0) | Total: (Hits: 0 Miss: 0)
CustomResourceDefinition
Last Run: (Hits: 0 Miss: 0) | Total: (Hits: 0 Miss: 0)
DaemonSet
Last Run: (Hits: 3 Miss: 0) | Total: (Hits: 30 Miss: 3)
Deployment
Last Run: (Hits: 3 Miss: 0) | Total: (Hits: 30 Miss: 3)
HorizontalPodAutoscaler
Last Run: (Hits: 0 Miss: 0) | Total: (Hits: 0 Miss: 0)
Ingress
Last Run: (Hits: 0 Miss: 0) | Total: (Hits: 0 Miss: 0)
Job
Last Run: (Hits: 0 Miss: 0) | Total: (Hits: 0 Miss: 0)
Namespace
Last Run: (Hits: 4 Miss: 1) | Total: (Hits: 40 Miss: 15)
Node
Last Run: (Hits: 1 Miss: 0) | Total: (Hits: 9 Miss: 2)
PersistentVolumeClaim
Last Run: (Hits: 0 Miss: 0) | Total: (Hits: 0 Miss: 0)
PersistentVolume
Last Run: (Hits: 0 Miss: 0) | Total: (Hits: 0 Miss: 0)
Pod
Last Run: (Hits: 0 Miss: 0) | Total: (Hits: 0 Miss: 0)
ReplicaSet
Last Run: (Hits: 5 Miss: 0) | Total: (Hits: 50 Miss: 5)
RoleBinding
Last Run: (Hits: 13 Miss: 0) | Total: (Hits: 130 Miss: 13)
Role
Last Run: (Hits: 13 Miss: 0) | Total: (Hits: 130 Miss: 13)
ServiceAccount
Last Run: (Hits: 44 Miss: 0) | Total: (Hits: 440 Miss: 44)
Service
Last Run: (Hits: 5 Miss: 0) | Total: (Hits: 50 Miss: 5)
StatefulSet
Last Run: (Hits: 0 Miss: 0) | Total: (Hits: 0 Miss: 0)
=====================
Manifest Buffer Stats
=====================
Buffer Flushed : 13 times
Last Time Flushed Manifests : 1
==============================
Manifests Flushed Per Resource
==============================
ClusterRole : 70
ClusterRoleBinding : 56
DaemonSet : 3
Deployment : 3
Namespace : 15
Node : 2
ReplicaSet : 5
Role : 13
RoleBinding : 13
Service : 5
ServiceAccount : 44
==========
Aggregator
==========
Checks Metric Sample: 2,771
Dogstatsd Metric Sample: 1
Event: 12
Events Flushed: 11
Number Of Flushes: 8
Series Flushed: 2,351
Service Check: 42
Service Checks Flushed: 44
=========
Endpoints
=========
https://app.datadoghq.com - API Key ending with:
- 72724
=========
Forwarder
=========
Transactions
============
Cluster: 11
ClusterRole: 1
ClusterRoleBinding: 1
CronJob: 0
CustomResource: 0
CustomResourceDefinition: 0
DaemonSet: 1
Deployment: 1
Dropped: 44
HighPriorityQueueFull: 0
HorizontalPodAutoscaler: 0
Ingress: 0
Job: 0
Namespace: 11
Node: 2
OrchestratorManifest: 11
PersistentVolume: 0
PersistentVolumeClaim: 0
Pod: 0
ReplicaSet: 1
Requeued: 0
Retried: 0
RetryQueueSize: 0
Role: 1
RoleBinding: 1
Service: 1
ServiceAccount: 1
StatefulSet: 0
VerticalPodAutoscaler: 0
Transaction Successes
=====================
Total number: 23
Successes By Endpoint:
check_run_v1: 8
intake: 7
series_v2: 8
HTTP Errors
==================
Total number: 44
HTTP Errors By Code:
403: 44
On-disk storage
===============
On-disk storage is disabled. Configure `forwarder_storage_max_size_in_bytes` to enable it.
There is one question I would like answered:
- ~~The current cluster agent displays the logs agent information: https://github.com/DataDog/datadog-agent/blob/77336caf87eee833a9e872b21ea30040ee0d1cc7/pkg/status/clusteragent/clusteragent.go#L39-L41. The logs agent component exposes the status provider automatically using FX. The run command for the cluster agent does not include the comp/logs/agent dependency. Should we add it to display the logs information as well? Or should we not add the logs agent to the cluster agent?~~
- There is the `cluster-agent-cloudfoundry` command. It does not have a status subcommand, but it requires passing the same components as `cluster-agent`. Does this command actually display any status output?
Motivation
Additional Notes
Possible Drawbacks / Trade-offs
Describe how to test/QA your changes
Validate that the `cluster-agent status` output displays correctly for both the text and JSON versions.
There are a few noticeable changes in the cluster agent status:
- The `Check Runners: 4` information is not displayed. I'm working on a separate PR to add it to the collector section.
- The Python version was previously not displayed; it now shows as `Python Version: n/a`.
- The Logs Agent section is no longer displayed.
- The order of the sections has changed. It is not ordered alphabetically.
Reviewer's Checklist
- [ ] If known, an appropriate milestone has been selected; otherwise the `Triage` milestone is set.
- [ ] Use the `major_change` label if your change either has a major impact on the code base, is impacting multiple teams or is changing important well-established internals of the Agent. This label will be used during QA to make sure each team pays extra attention to the changed behavior. For any customer facing change use a releasenote.
- [ ] A release note has been added or the `changelog/no-changelog` label has been applied.
- [ ] Changed code has automated tests for its functionality.
- [ ] Adequate QA/testing plan information is provided. Except if the `qa/skip-qa` label, with required either `qa/done` or `qa/no-code-change` labels, are applied.
- [ ] At least one `team/..` label has been applied, indicating the team(s) that should QA this change.
- [ ] If applicable, docs team has been notified or an issue has been opened on the documentation repo.
- [ ] If applicable, the `need-change/operator` and `need-change/helm` labels have been applied.
- [ ] If applicable, the `k8s/<min-version>` label, indicating the lowest Kubernetes version compatible with this feature.
- [ ] If applicable, the config template has been updated.