dd-agent
[ceph] add metric for active monitors #2719
What does this PR do?
This PR adds code to create the ceph.num_mons.active metric. This metric will return the number of active mon processes as seen in the number of quorum members. This differs from the existing ceph.num_mons metric, which returns the number of configured mon processes.
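As a rough illustration of the difference between the two metrics, here is a minimal sketch, not the code from this PR. It assumes the mon status JSON exposes a `quorum` list and a `monmap.mons` list (as `ceph mon_status -f json` does on the versions we have seen); the sample data and the `mon_counts` helper are hypothetical.

```python
import json

# Hypothetical, heavily trimmed sample of mon status JSON: three mons are
# configured in the monmap, but only ranks 0 and 1 are in the quorum.
SAMPLE_MON_STATUS = json.loads("""
{
  "quorum": [0, 1],
  "monmap": {"mons": [{"name": "a"}, {"name": "b"}, {"name": "c"}]}
}
""")


def mon_counts(mon_status):
    """Return configured and active mon counts from a mon status dict.

    Assumption: configured mons are the entries of monmap.mons and
    active mons are the members of the quorum list.
    """
    configured = len(mon_status.get("monmap", {}).get("mons", []))
    active = len(mon_status.get("quorum", []))
    return {"ceph.num_mons": configured, "ceph.num_mons.active": active}


print(mon_counts(SAMPLE_MON_STATUS))
```

With this sample, ceph.num_mons is 3 and ceph.num_mons.active is 2, so one mon is outside the quorum.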
Motivation
In Ceph, a quorum of mon processes in the cluster is required. Datadog monitors cannot alert when the quorum is at risk unless the number of active mons (compared to the number configured) is known.
Testing Guidelines
The test changes add a check for the ceph.num_mons.active metric alongside ceph.num_mons. No changes to the test JSON input are expected.
Additional Notes
It's a pretty simple change and is working fine in a test environment. There, we altered the base Ceph dashboard to show the number of down mon processes (ceph.num_mons - ceph.num_mons.active), changing from green to yellow if > 0 and to red if > 1, the point at which the quorum of that cluster would be in danger.
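The dashboard rule described above can be sketched as a small function. This is only an illustration of the thresholds, not the actual dashboard definition; the name `mon_health_color` is hypothetical, and the red threshold of more than one down mon is specific to the test cluster mentioned here.

```python
def mon_health_color(num_mons, num_mons_active):
    """Map the number of down mons to a dashboard color.

    down = ceph.num_mons - ceph.num_mons.active
    green if no mons are down, yellow if > 0, red if > 1.
    """
    down = num_mons - num_mons_active
    if down > 1:
        return "red"
    if down > 0:
        return "yellow"
    return "green"


# Example: three mons configured, two in the quorum.
print(mon_health_color(3, 2))
```

For a cluster with a different number of mons, the red threshold would need to be adjusted to match the point at which quorum is actually lost.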
Thank you very much for your contribution, @ccocsas! We'll review it soon and either merge it or get back to you with some feedback!
Hey @ccocsas, which version of Ceph are you using?
Hammer .94.9 (which is probably .94.8, but that's what the dpkg got tagged as in the repositories).
Planning a migration to Jewel.