Consul NodeName (NodeId) in Service Checks
Note: If you have a feature request, you should contact support so the request can be properly tracked.
Output of the info page
Getting the status from the agent.
===============
Agent (v7.37.1)
===============
Status date: 2022-07-27 15:16:34.997 UTC (1658934994997)
Agent start: 2022-07-22 18:34:00.519 UTC (1658514840519)
Pid: 18465
Go Version: go1.17.11
Python Version: 3.8.11
Build arch: amd64
Agent flavor: agent
Check Runners: 4
Log Level: info
Paths
=====
Config File: /etc/datadog-agent/datadog.yaml
conf.d: /etc/datadog-agent/conf.d
checks.d: /etc/datadog-agent/checks.d
Clocks
======
NTP offset: 385µs
System time: 2022-07-27 15:16:34.997 UTC (1658934994997)
Host Info
=========
bootTime: 2021-09-23 20:11:50 UTC (1632427910000)
hostId: <redacted>
kernelArch: x86_64
kernelVersion: 4.9.0-16-amd64
os: linux
platform: debian
platformFamily: debian
platformVersion: 9.13
procs: 121
uptime: 7246h22m12s
Hostnames
=========
<redacted>
Metadata
========
agent_version: 7.37.1
cloud_provider: AWS
config_apm_dd_url:
config_dd_url: https://app.datadoghq.com
config_logs_dd_url:
config_logs_socks5_proxy_address:
config_no_proxy: []
config_process_dd_url:
config_proxy_http:
config_proxy_https:
config_site:
feature_apm_enabled: true
feature_cspm_enabled: false
feature_cws_enabled: false
feature_logs_enabled: false
feature_networks_enabled: false
feature_networks_http_enabled: false
feature_networks_https_enabled: false
feature_otlp_enabled: false
feature_process_enabled: false
feature_processes_container_enabled: true
flavor: agent
hostname_source: os
install_method_installer_version: deb_package
install_method_tool: dpkg
install_method_tool_version: dpkg-1.18.26
=========
Collector
=========
Running Checks
==============
consul (2.1.0)
--------------
Instance ID: consul:default:b77b05cc5a5351d9 [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/consul.d/conf.yaml
Total Runs: 28,011
Metric Samples: Last Run: 1, Total: 28,011
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 2, Total: 57,359
Average Execution Time : 4ms
Last Execution Date : 2022-07-27 15:16:32 UTC (1658934992000)
Last Successful Execution Date : 2022-07-27 15:16:32 UTC (1658934992000)
metadata:
version.major: 1
version.minor: 8
version.patch: 4
version.raw: 1.8.4
version.scheme: semver
<redacted>
=========
Forwarder
=========
Transactions
============
Cluster: 0
ClusterRole: 0
ClusterRoleBinding: 0
CronJob: 0
DaemonSet: 0
Deployment: 0
Dropped: 0
HighPriorityQueueFull: 0
Ingress: 0
Job: 0
Node: 0
PersistentVolume: 0
PersistentVolumeClaim: 0
Pod: 0
ReplicaSet: 0
Requeued: 0
Retried: 0
RetryQueueSize: 0
Role: 0
RoleBinding: 0
Service: 0
ServiceAccount: 0
StatefulSet: 0
Transaction Successes
=====================
Total number: 59055
Successes By Endpoint:
check_run_v1: 28,010
intake: 2,335
metadata_v1: 700
series_v1: 28,010
On-disk storage
===============
On-disk storage is disabled. Configure `forwarder_storage_max_size_in_bytes` to enable it.
API Keys status
===============
API key ending with 91d2c: API Key valid
==========
Endpoints
==========
https://app.datadoghq.com - API Key ending with:
- 91d2c
==========
Logs Agent
==========
Logs Agent is not running
=============
Process Agent
=============
Version: 7.37.1
Status date: 2022-07-27 15:16:44.135 UTC (1658935004135)
Process Agent Start: 2022-07-22 18:34:00.573 UTC (1658514840573)
Pid: 18466
Go Version: go1.17.11
Build arch: amd64
Log Level: info
Enabled Checks: [process_discovery]
Allocated Memory: 13,024,816 bytes
Hostname: <redacted> # consul-server-host (leader)
=================
Process Endpoints
=================
https://process.datadoghq.com - API Key ending with:
- 91d2c
=========
Collector
=========
Last collection time: 2022-07-27 14:34:01
Docker socket:
Number of processes: 0
Number of containers: 0
Process Queue length: 0
RTProcess Queue length: 0
Pod Queue length: 0
Process Bytes enqueued: 0
RTProcess Bytes enqueued: 0
Pod Bytes enqueued: 0
Drop Check Payloads: []
=========
APM Agent
=========
<redacted>
=========
Aggregator
=========
Checks Metric Sample: 6,678,903
Dogstatsd Metric Sample: 4,565,629
Event: 1
Events Flushed: 1
Number Of Flushes: 28,010
Series Flushed: 8,834,639
Service Check: 310,384
Service Checks Flushed: 338,390
=========
DogStatsD
=========
Event Packets: 0
Event Parse Errors: 0
Metric Packets: 4,565,628
Metric Parse Errors: 0
Service Check Packets: 0
Service Check Parse Errors: 0
Udp Bytes: 406,735,605
Udp Packet Reading Errors: 0
Udp Packets: 2,650,131
Uds Bytes: 0
Uds Origin Detection Errors: 0
Uds Packet Reading Errors: 0
Uds Packets: 0
Unterminated Metric Errors: 0
====
OTLP
====
Status: Not enabled
Collector status: Not running
Additional environment details (Operating System, Cloud provider, etc):
Steps to reproduce the issue:
- Install the consul.d check/integration
- ????
- Non-profit
Describe the results you received:
The Datadog service checks emitted for Consul service health checks do not include a node, node_name, or node_id tag, or any other node information.
Describe the results you expected:
A node, node_name, or node_id tag (or equivalent information) should exist on the Datadog service check, since that information is available and already retrieved from the Consul API.
Additional information you deem important (e.g. issue happens only occasionally):
The problem is as follows. A Consul service check reports a status (OK, Warning, or Critical) for a given service, check (ID), and node (the host the check is failing for). However, the Datadog Consul integration does not seem to collect the node name or ID. So when a Datadog Consul service check such as consul.check is in the Critical state, it only identifies the Consul service and the check name/ID. That is not particularly useful for a Consul service with 50 nodes: which node has the failing check?
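To make the question concrete, here is a minimal Python sketch of what the raw Consul data can answer. It assumes health entries shaped like the Consul /v1/health/state/any response (with `Node`, `CheckID`, `ServiceName`, and `Status` fields). The helper name `nodes_by_status` and the sample data are hypothetical and not part of the integration.

```python
from collections import defaultdict


def nodes_by_status(health_state, service_name):
    """Group node names by check status for one Consul service.

    health_state is assumed to be the decoded JSON list returned by
    /v1/health/state/any. This helper is hypothetical.
    """
    grouped = defaultdict(set)
    for check in health_state:
        if check.get("ServiceName") == service_name:
            status = check.get("Status", "unknown")
            grouped[status].add(check.get("Node", ""))
    # Sort for stable, readable output.
    return {status: sorted(nodes) for status, nodes in grouped.items()}


# Made-up sample data: one service registered on three nodes.
sample = [
    {"Node": "node-a", "CheckID": "service:web", "ServiceName": "web", "Status": "passing"},
    {"Node": "node-b", "CheckID": "service:web", "ServiceName": "web", "Status": "critical"},
    {"Node": "node-c", "CheckID": "service:web", "ServiceName": "web", "Status": "passing"},
]

result = nodes_by_status(sample, "web")
print(result["critical"])  # ['node-b']
```

Consul itself can say that node-b is the failing node. A service check tagged only with the service and check ID cannot.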
The tag should be added here: https://github.com/DataDog/integrations-core/blob/f8c50c779dc836e9419326a5d2d64524f3216821/consul/datadog_checks/consul/consul.py#L367-L375
Specifically on/after line 373:
if check["Node"]:
    tags.append("consul_node:{}".format(check["Node"]))
sc[sc_id] = {'status': status, 'tags': tags}
The data is already returned by the Consul API endpoint /v1/health/state/any, which is called on line 356: https://github.com/DataDog/integrations-core/blob/f8c50c779dc836e9419326a5d2d64524f3216821/consul/datadog_checks/consul/consul.py#L356
See: https://www.consul.io/api-docs/health#sample-response-3
Example Response:
[
  {
    "Node": "foobar",
    "CheckID": "serfHealth",
    "Name": "Serf Health Status",
    "Status": "passing",
    "Notes": "",
    "Output": "",
    "ServiceID": "",
    "ServiceName": "",
    "ServiceTags": [],
    "Namespace": "default"
  },
  [...]
]
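Applied to the sample response above, the proposed change could look roughly like the following sketch. It is not the integration's actual code: `build_tags` is a hypothetical helper, and the `check`, `consul_service`, and `consul_service_id` tag names are assumed for illustration. Only `consul_node` is the tag proposed in this issue. The sketch uses `.get()` so that an entry without a `Node` field does not raise a `KeyError`.

```python
import json

# First entry of the sample response from the Consul health API docs.
SAMPLE = """
[
  {
    "Node": "foobar",
    "CheckID": "serfHealth",
    "Name": "Serf Health Status",
    "Status": "passing",
    "Notes": "",
    "Output": "",
    "ServiceID": "",
    "ServiceName": "",
    "ServiceTags": [],
    "Namespace": "default"
  }
]
"""


def build_tags(check):
    """Build service-check tags for one health entry (hypothetical helper)."""
    tags = ["check:{}".format(check["CheckID"])]
    # Assumed tag names; empty strings are skipped.
    if check.get("ServiceName"):
        tags.append("consul_service:{}".format(check["ServiceName"]))
    if check.get("ServiceID"):
        tags.append("consul_service_id:{}".format(check["ServiceID"]))
    # The tag proposed in this issue.
    if check.get("Node"):
        tags.append("consul_node:{}".format(check["Node"]))
    return tags


checks = json.loads(SAMPLE)
tags = build_tags(checks[0])
print(tags)  # ['check:serfHealth', 'consul_node:foobar']
```

With a tag like this, a monitor on the service check could be grouped or filtered by node.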
Note: I was unable to open a Datadog Support ticket for this issue/feature request because the provided link and support center did not have an option available for such tickets/questions/requests.
If you can point me in the right direction, I'll be happy to open a ticket.
Alternatively, if this feature seems low-hanging enough, I am also happy to submit a PR to add this tag information (even behind a flag/option if desired).
Additionally, this node name information is available in the Telegraf consul plugin. That plugin is not ideal, however, because metrics collected from Consul this way and submitted to the Datadog API are considered custom metrics (and are thus billed differently).
See:
- Metrics Collected: https://github.com/influxdata/telegraf/blob/master/plugins/inputs/consul/README.md#metric_version--1
- Implementation: https://github.com/influxdata/telegraf/blob/master/plugins/inputs/consul/consul.go#L111
Hi @hjkatz, thanks for opening this issue and the great description!
I created a card in our backlog to work on this. However, we would also be happy to review your PR if you want to take care of this.
@FlorentClarret Thanks for responding, here's the PR: https://github.com/DataDog/integrations-core/pull/12675