
[ceph] add metric for active monitors #2719

Open ccocsas opened this issue 9 years ago • 3 comments

What does this PR do?

This PR adds code to create the ceph.num_mons.active metric. The metric returns the number of active mon processes, measured as the number of quorum members. This differs from the existing ceph.num_mons metric, which returns the number of configured mon processes.
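The difference between the two counts can be sketched as below. This is an illustration, not the PR's actual diff. It assumes the check has already parsed the JSON output of `ceph mon_status`, where configured monitors are listed under `monmap.mons` and the ranks currently in quorum under `quorum`. The helper name `count_mons` and the sample payload are hypothetical.

```python
def count_mons(mon_status):
    """Return (configured, active) monitor counts from a parsed mon_status dict.

    Assumption: `mon_status` mirrors the JSON printed by `ceph mon_status`.
    Missing keys are treated as empty rather than raising.
    """
    configured = len(mon_status.get("monmap", {}).get("mons", []))
    active = len(mon_status.get("quorum", []))
    return configured, active


# Hypothetical payload: three monitors configured, one of them out of quorum.
sample = {
    "quorum": [0, 1],
    "monmap": {"mons": [{"name": "a"}, {"name": "b"}, {"name": "c"}]},
}
configured, active = count_mons(sample)
```

In a check, the two values would presumably be reported as the ceph.num_mons and ceph.num_mons.active gauges respectively.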

Motivation

Ceph requires a quorum of mon processes in the cluster. Datadog monitors cannot alert on a quorum at risk unless the number of active mons is known alongside the number configured.

Testing Guidelines

The tests now check for the ceph.num_mons.active metric in addition to ceph.num_mons. No changes to the test JSON input are expected.

Additional Notes

The change is pretty simple and is working fine in a test environment. There we altered the base Ceph dashboard to show the number of down mon processes (ceph.num_mons - ceph.num_mons.active), changing from green to yellow if > 0 and to red if > 1, the point at which that cluster's quorum would be in danger.
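The threshold logic described here can be written out as a short sketch. The thresholds come from the note above and apply to that particular test cluster. The function name `mon_health_color` is hypothetical, and a real Datadog dashboard would express this as widget conditional formatting rather than Python.

```python
def mon_health_color(num_mons, num_mons_active):
    """Map the number of down mon processes to a dashboard color.

    Thresholds follow the PR note: yellow if any mon is down,
    red if more than one is down.
    """
    down = num_mons - num_mons_active
    if down > 1:
        return "red"
    if down > 0:
        return "yellow"
    return "green"
```

For a cluster with three configured mons, two active mons would map to yellow and one active mon to red.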

ccocsas, Aug 15 '16 19:08

Thank you very much for your contribution, @ccocsas! We'll review it soon and either merge it or get back to you with some feedback!

gmmeyer, Aug 19 '16 20:08

Hey @ccocsas, which version of Ceph are you using?

vagelim, Aug 31 '16 13:08

Hammer .94.9 (which is probably .94.8, but that's what the dpkg got tagged as in the repositories).

Planning a migration to Jewel.

ccocsas, Sep 06 '16 13:09