[agent_metrics] add metrics for num_metrics and num_events

Open cberry777 opened this issue 9 years ago • 5 comments

What does this PR do?

Adds additional metrics for num_metrics and num_events to agent_metrics

Motivation

It is very important to monitor the number of metrics and events emitted from each agent. It allows us to 1) keep track of the total number of metrics sent to Datadog (to monitor billing), and 2) locate rogue agents emitting above some threshold.
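As a rough illustration of the two use cases above, the sketch below assumes per-agent counts have already been gathered into a plain dict keyed by hostname. The function names, hostnames, and threshold are hypothetical and are not part of this PR.

```python
def total_metrics(counts_by_host):
    """Sum the per-agent metric counts (use case 1: overall volume)."""
    return sum(counts_by_host.values())


def find_rogue_agents(counts_by_host, threshold):
    """Return hosts whose count is strictly above the threshold (use case 2)."""
    return sorted(
        host for host, count in counts_by_host.items() if count > threshold
    )


# Hypothetical sample data: num_metrics reported by three agents.
sample_counts = {"web-1": 420, "web-2": 380, "batch-1": 12500}

total = total_metrics(sample_counts)
rogues = find_rogue_agents(sample_counts, threshold=5000)
```

In practice the same comparison would more likely be expressed as a monitor on the emitted metric, not as code; the sketch only shows the logic.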

Testing Guidelines

A test is provided: /tests/checks/mock/test_agent_metrics.py (# test_num_metrics)

Additional Notes

An optional switch is provided (in the init_config) that allows one to log the number of metrics and events for each collection run.
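A minimal sketch of the idea, not the code in this PR: it assumes the collection payload is a dict with "metrics" and "events" lists, and that the switch is a boolean key in init_config. The key name `log_counts`, the helper names, and the `gauge` callback are all hypothetical stand-ins.

```python
import logging

log = logging.getLogger("agent_metrics_sketch")


def count_payload(payload):
    """Return (num_metrics, num_events) for one collection run.

    Assumes the payload is a dict holding "metrics" and "events" lists;
    missing keys are treated as empty.
    """
    num_metrics = len(payload.get("metrics", []))
    num_events = len(payload.get("events", []))
    return num_metrics, num_events


def report_counts(payload, init_config, gauge):
    """Emit both counts as gauges and optionally log them.

    `gauge` is any callable taking (name, value); in a real check it would
    be the check's own gauge method. `log_counts` is a hypothetical
    init_config key standing in for the optional switch.
    """
    num_metrics, num_events = count_payload(payload)
    gauge("num_metrics", num_metrics)
    gauge("num_events", num_events)
    if init_config.get("log_counts", False):
        log.info("collection run: %d metrics, %d events",
                 num_metrics, num_events)
    return num_metrics, num_events


# Usage with a dict standing in for the metric sink.
recorded = {}
counts = report_counts(
    {"metrics": ["m1", "m2", "m3"], "events": ["e1"]},
    {"log_counts": True},
    lambda name, value: recorded.__setitem__(name, value),
)
```

Keeping the switch off by default avoids adding a log line to every collection run for users who only want the gauges.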

cberry777 avatar Oct 08 '16 17:10 cberry777

Thanks @cberry777. FYI, I don't think it will be really useful billing-wise, as:

  • This just counts metrics coming from checks.d and not old style checks (most of system metrics)
  • It doesn't count dogstatsd metrics
  • It doesn't differentiate between integration metrics and custom metrics

However, it might be useful to track that, so we'll get it merged for our 5.11 release.

Can you have a look at the failing tests, please?

remh avatar Oct 27 '16 12:10 remh

Again, the failing tests have nothing to do with this code. I think the test suite is unstable?

FAIL: Support SNMP scalar objects
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/DataDog/dd-agent/tests/checks/integration/test_snmp.py", line 267, in test_scalar
    self.assertMetric(metric_name, tags=self.CHECK_TAGS, count=1)
  File "/home/travis/build/DataDog/dd-agent/tests/checks/common.py", line 350, in assertMetric
    self._candidates_size_assert(candidates, count=count, at_least=at_least)
  File "/home/travis/build/DataDog/dd-agent/tests/checks/common.py", line 320, in _candidates_size_assert
    "Needed exactly %d candidates, got %d" % (count, len(candidates))

All tests pass when I run "rake":

Ran 176 tests in 20.422s

OK (SKIP=1)
Cleaning up

cberry777 avatar Dec 02 '16 21:12 cberry777

Is there a way to "re-fire" the test suite?? (without forcing a bogus commit)

cberry777 avatar Dec 02 '16 21:12 cberry777

It was a flaky test, all green now.

masci avatar Dec 03 '16 16:12 masci

Hey @cberry777! Thanks a lot for your contribution.

I think I missed this one when we went through our SDK move. This should be moved to our Integrations Core repo and closed here. I looked it over and don't see anything that stands out as needing to be changed. If you move it, I see no reason it couldn't be merged easily!

gmmeyer avatar Jul 06 '17 16:07 gmmeyer