vigil Disk usage reporting

Disk space usage seems to be a really basic metric to probe.

Some services often fall victim to a full disk, and that risk could be mitigated by letting Vigil report such situations as unhealthy once disk usage goes above some percentage.

Is it worth handling?
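A minimal sketch of the requested check, in Python (chosen here only for illustration; Vigil does not run this code). It reads usage with the standard library's `shutil.disk_usage` and flags the node as unhealthy once usage passes a threshold. The 90% default and the function names are hypothetical, since the request only says "some percentage".

```python
import shutil

# Hypothetical default: the request only says "some percentage of disk usage".
DEFAULT_THRESHOLD_PERCENT = 90.0


def disk_usage_percent(path: str = "/") -> float:
    """Return the percentage of space used on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple of bytes: total, used, free
    return usage.used / usage.total * 100.0


def is_disk_healthy(used_percent: float,
                    threshold_percent: float = DEFAULT_THRESHOLD_PERCENT) -> bool:
    """Healthy while usage stays at or below the threshold, unhealthy above it."""
    return used_percent <= threshold_percent


if __name__ == "__main__":
    percent = disk_usage_percent("/")
    print(f"/ is {percent:.1f}% full, healthy: {is_disk_healthy(percent)}")
```

The percentage here is `used / total`; `df` divides by `used + available` instead, so its figure can be slightly higher on filesystems that reserve blocks for root.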

Jun 25 '19 09:06 grucha

This would only apply to "active" probes (i.e. push probes), while passive poll probes are not able to acquire any system information.

I'm open to PRs if you're willing to add that feature!

Jun 25 '19 14:06 valeriansaliou

> This would only apply to "active" (eg. push probes then), while passive poll probes will not be able to acquire any system information.

Yes, but tools like django-health-check can extend what poll probes are able to detect.
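To illustrate how a poll probe can benefit: the application exposes an HTTP endpoint that fails when the disk is too full, and an HTTP poll against it then sees the failure. The sketch below is not django-health-check's API; it is a hypothetical standard-library endpoint with an assumed `/health` path and 90% threshold, returning 503 when any check fails.

```python
import json
import shutil
from http.server import BaseHTTPRequestHandler

MAX_DISK_PERCENT = 90.0  # hypothetical threshold


def check_disk(path: str = "/", max_percent: float = MAX_DISK_PERCENT) -> dict:
    """Run one disk check and report whether usage is within bounds."""
    usage = shutil.disk_usage(path)
    percent = usage.used / usage.total * 100.0
    return {"path": path, "used_percent": round(percent, 1), "ok": percent <= max_percent}


def health_response(checks: list) -> tuple:
    """Return (status_code, body): 200 if every check passes, 503 otherwise."""
    status = 200 if all(c["ok"] for c in checks) else 503
    return status, json.dumps({"checks": checks}).encode("utf-8")


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":  # hypothetical endpoint path
            self.send_error(404)
            return
        status, body = health_response([check_disk("/")])
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

It can be served with `HTTPServer(("127.0.0.1", 8000), HealthHandler).serve_forever()` (from `http.server`), after which an HTTP poll probe pointed at `http://127.0.0.1:8000/health` would see a 503 once the disk fills up.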

> I'm open to PRs if you're willing to add that feature!

Yes, I will make a PR, but could you let me know your thoughts on generalising this feature a bit, so that Vigil accepts a list of metrics in the incoming JSON? It could look like:

{
    "metrics": [
        {
            "label": "Disk usage in /",
            "value": "50",
            "max": "90",
            "unit": "%"
        },
        {
            "label": "Idle workers",
            "value": "7",
            "min": "2"
        },
        {
            "label": "Available RAM",
            "value": "190",
            "min": "50",
            "unit": "MB"
        },
        {
            "label": "Time since last backup",
            "value": "4",
            "max": "24",
            "unit": "h"
        }
    ]
}
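One way a server could evaluate such a list, assuming each metric is compared against its optional `min`/`max` bounds. This is a hypothetical sketch of the proposal only (Vigil does not accept a `metrics` field). Values arrive as strings, as in the example, so they are parsed as floats.

```python
from typing import Dict, List


def metric_violations(metrics: List[Dict[str, str]]) -> List[str]:
    """Describe every metric whose value falls outside its optional min/max bounds."""
    violations = []
    for m in metrics:
        value = float(m["value"])  # values are strings in the proposed JSON
        unit = m.get("unit", "")
        if "max" in m and value > float(m["max"]):
            violations.append(f'{m["label"]}: {value:g}{unit} > max {m["max"]}{unit}')
        if "min" in m and value < float(m["min"]):
            violations.append(f'{m["label"]}: {value:g}{unit} < min {m["min"]}{unit}')
    return violations
```

With the four metrics above, `metric_violations` returns an empty list; changing the disk value to `"95"` yields the single violation `Disk usage in /: 95% > max 90%`.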

Jul 06 '19 07:07 grucha

No, the metrics object needs to carry data that is known in advance to Vigil. Generic data is not planned to be supported, as Vigil cannot easily act on it in order to show status colors.

See https://github.com/valeriansaliou/node-vigil-reporter/blob/master/lib/vigil_reporter.js#L176 for the data format.

Jul 08 '19 06:07 valeriansaliou

Are you still interested in PRs for disk usage? I may be able to help.

Apr 10 '20 20:04 L1Cafe

Yes! That would be an additional load metric reported to Vigil by a Vigil Reporter library, as disk usage cannot be probed directly by Vigil itself.
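A sketch of what a reporter could send, assuming the push payload shape used by the Vigil Reporter libraries is a `replica` identifier, an `interval` in seconds, and a `load` object with `cpu` and `ram` (the vigil_reporter.js link earlier in the thread is the authoritative format). The `disk` key and the `build_report` name are hypothetical and not part of Vigil's current format.

```python
import json
import shutil


def build_report(replica_id: str, interval_secs: int, cpu: float, ram: float,
                 disk_path: str = "/") -> str:
    """Serialize a push report, adding a hypothetical disk load figure."""
    usage = shutil.disk_usage(disk_path)
    payload = {
        "replica": replica_id,
        "interval": interval_secs,
        "load": {
            "cpu": cpu,
            "ram": ram,
            # Hypothetical new field: fraction of the disk used (0.0 to 1.0),
            # expressed on the same 0..1 scale assumed for cpu and ram.
            "disk": round(usage.used / usage.total, 2),
        },
    }
    return json.dumps(payload)
```

Keeping disk usage inside `load` means the server could treat it like the existing load figures rather than as generic data, which is the constraint stated earlier in the thread.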

Apr 11 '20 10:04 valeriansaliou

Shouldn't Vigil reporters and the API have a versioning system? That way it's easier to add new features without breaking backward compatibility with older servers.

Something like "version": 1 in the same JSON POST body.

Otherwise, the Vigil server would kind of have to guess and ignore blank fields.
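A sketch of the versioning idea, assuming a top-level `"version"` field that older reporters simply omit, so a missing field is read as version 1. The supported version numbers, the `parse_report` name, and the version 2 `disk` field are all hypothetical.

```python
import json
from typing import Any, Dict

SUPPORTED_VERSIONS = {1, 2}  # hypothetical


def parse_report(raw: str) -> Dict[str, Any]:
    """Parse a push report, dispatching on its declared version."""
    data = json.loads(raw)
    # Reporters that predate versioning send no field: treat them as version 1.
    version = data.get("version", 1)
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported report version: {version}")
    load = data.get("load", {})
    report = {"version": version, "cpu": load.get("cpu"), "ram": load.get("ram")}
    if version >= 2:
        # Hypothetical v2 addition: per-mount-point disk usage.
        report["disk"] = load.get("disk", {})
    return report
```

An explicit version lets the server reject what it cannot understand instead of guessing from which fields happen to be blank.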

Also: should we make this one storage metric per mount point? Maybe something like a dictionary of mount points, each with its storage usage reported as a percentage? Vigil could then be triggered whenever any of them exceeds 90%, for example. Configurable, of course.
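A sketch of the per-mount-point variant: collect a dictionary mapping each mount point to its usage percentage, then report every mount above a configurable threshold (90% by default here, matching the example). The function names are hypothetical.

```python
import shutil
from typing import Dict, Iterable, List


def collect_usage(mount_points: Iterable[str]) -> Dict[str, float]:
    """Map each mount point to its used space as a percentage."""
    result = {}
    for mount in mount_points:
        usage = shutil.disk_usage(mount)
        result[mount] = round(usage.used / usage.total * 100.0, 1)
    return result


def mounts_over_threshold(disk_usage: Dict[str, float],
                          threshold_percent: float = 90.0) -> List[str]:
    """Return, sorted, the mount points whose usage exceeds the threshold."""
    return sorted(m for m, pct in disk_usage.items() if pct > threshold_percent)
```

For example, `mounts_over_threshold({"/": 50.0, "/var": 95.2, "/home": 91.0})` returns `["/home", "/var"]`; a non-empty result would mark the node as unhealthy.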

Apr 11 '20 10:04 L1Cafe

IMHO, this kind of server-specific check could be exposed as an HTTP endpoint through goss running in server mode, testing the mount. That endpoint could then be probed externally by an HTTP service monitor that turns red on any non-200 response. This way the two tools stay decoupled and complement each other, each doing the job it was actually built for.
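On the monitoring side, "red on any non-200 response" amounts to a rule like the one below. It is a hypothetical poller written with Python's `urllib.request`, not Vigil's or goss's code; the URL would be whatever endpoint `goss serve` exposes on the host.

```python
import urllib.error
import urllib.request


def status_for_code(code: int) -> str:
    """Map an HTTP status code to a status color: anything but 200 is red."""
    return "green" if code == 200 else "red"


def poll(url: str, timeout: float = 5.0) -> str:
    """Fetch `url` once and return the resulting status color."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return status_for_code(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; it still carries the code.
        return status_for_code(err.code)
    except OSError:
        # URLError (refused connection, DNS failure) and timeouts are OSErrors:
        # an unreachable endpoint is treated as down.
        return "red"
```

Because all host-specific logic lives in the goss tests, the poller stays generic: it only needs a URL and the 200 rule.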

Oct 07 '20 19:10 richnusgeeks
