vigil
Disk usage reporting
Disk space usage seems to be a really basic metric to probe.
Some services often fall victim to a full disk, and that risk could be mitigated by allowing Vigil to report such situations as unhealthy above some percentage of disk usage.
Probably worth handling that?
This would only apply to "active" probes (i.e. push probes), while passive poll probes will not be able to acquire any system information.
I'm open to PRs if you're willing to add that feature!
> This would only apply to "active" probes (i.e. push probes), while passive poll probes will not be able to acquire any system information.
Yes, but tools like django-health-check can widen what polling is able to detect.
> I'm open to PRs if you're willing to add that feature!
Yes, I will make a PR, but can you please let me know your thoughts on generalising this feature a bit, so that Vigil accepts a metrics list in the incoming JSON? It could look like:
"metrics": [
{
"label": "Disk usage in /",
"value": "50",
"max": "90",
"unit": "%"
},
{
"label": "Idle workers",
"value": "7",
"min": "2"
},
{
"label": "Available RAM",
"value": "190",
"min": "50",
"unit": "MB"
},
{
"label": "Time since last backup",
"value": "4",
"max": "24",
"unit": "h"
}
]
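To make the proposal concrete, here is a minimal sketch of how a receiver could evaluate such a list. It is not something Vigil supports; the function names `metric_is_healthy` and `unhealthy_labels` are hypothetical, and the string-to-number conversion simply follows the quoted example, where every value is sent as a string.

```python
from typing import Dict, List


def metric_is_healthy(metric: Dict[str, str]) -> bool:
    """Return True when the value sits inside its optional min/max bounds."""
    value = float(metric["value"])
    if "min" in metric and value < float(metric["min"]):
        return False
    if "max" in metric and value > float(metric["max"]):
        return False
    return True


def unhealthy_labels(metrics: List[Dict[str, str]]) -> List[str]:
    """Collect the labels of all metrics that are out of bounds."""
    return [m["label"] for m in metrics if not metric_is_healthy(m)]


# Hypothetical payload, modelled on the proposal above.
metrics = [
    {"label": "Disk usage in /", "value": "95", "max": "90", "unit": "%"},
    {"label": "Idle workers", "value": "7", "min": "2"},
]
print(unhealthy_labels(metrics))
```

The `unit` field plays no role in the check; it would only matter for display.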
No, the metrics object needs to carry data that is known in advance to Vigil. Generic data is not planned to be supported, as Vigil cannot easily act on it to show status colors.
See https://github.com/valeriansaliou/node-vigil-reporter/blob/master/lib/vigil_reporter.js#L176 for the data format.
Are you still interested in PRs for disk usage? I may be able to help.
Yes! That would be an additional load metric reported to Vigil by a Vigil Reporter library, as disk usage cannot be probed directly by Vigil itself.
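As a rough sketch of what a reporter could compute on its side, the snippet below reads disk usage with Python's standard `shutil.disk_usage` and attaches it to a report. The payload layout is an assumption loosely modelled on the reporter linked above, and the `disk` key in particular is hypothetical: nothing in this thread confirms that Vigil accepts it.

```python
import shutil


def disk_usage_ratio(path: str = "/") -> float:
    """Return the used fraction (0.0 to 1.0) of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def build_report(replica: str, interval: int, cpu: float, ram: float) -> dict:
    """Build a report dictionary; the `disk` field is a hypothetical addition."""
    return {
        "replica": replica,
        "interval": interval,
        "load": {
            "cpu": cpu,
            "ram": ram,
            "disk": round(disk_usage_ratio("/"), 2),  # assumed field name
        },
    }


print(build_report("192.168.1.10", 30, 0.30, 0.80))
```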
Shouldn't Vigil reporters and the API have a versioning system? That would make it easier to add new features without breaking backward compatibility with older servers.
Something like version: 1 in the same JSON POST.
Otherwise Vigil Server would kinda have to guess and ignore blank fields.
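A minimal sketch of how a receiver could use such a field, assuming a missing `version` means the oldest format. The version numbers, the `disk` field and the `parse_report` function are all hypothetical and only illustrate the idea; they do not describe Vigil's actual behaviour.

```python
import json

SUPPORTED_VERSIONS = {1, 2}  # assumed: version 2 adds a `disk` load metric


def parse_report(body: str) -> dict:
    """Parse a report, choosing the expected fields from its version."""
    payload = json.loads(body)
    # Older reporters send no version at all, so default to 1.
    version = payload.get("version", 1)
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported report version: {version}")

    load = payload.get("load", {})
    report = {"version": version, "cpu": load.get("cpu"), "ram": load.get("ram")}
    if version >= 2:
        # Only newer reporters are expected to send disk usage.
        report["disk"] = load.get("disk")
    return report


print(parse_report('{"version": 2, "load": {"cpu": 0.3, "ram": 0.8, "disk": 0.5}}'))
```

With an explicit version the server no longer has to infer the format from which fields happen to be blank.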
Also: should we make this one storage metric per mountpoint? Maybe something like a dictionary of mountpoints, each with its storage usage reported as a percentage? Vigil could then be triggered whenever any of them exceeds 90%, for example. Configurable, of course.
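The per-mountpoint idea could be sketched roughly as follows. The dictionary shape, the `full_mountpoints` name and the 90% default are assumptions taken from the suggestion above, not an existing Vigil feature.

```python
from typing import Dict, List


def full_mountpoints(usage_by_mount: Dict[str, float], threshold: float = 90.0) -> List[str]:
    """Return the mountpoints whose usage percentage exceeds the threshold."""
    return sorted(
        mount for mount, percent in usage_by_mount.items() if percent > threshold
    )


# Hypothetical report: mountpoint -> used percentage.
usage = {"/": 50.0, "/var": 93.5, "/home": 90.0}
print(full_mountpoints(usage))
```

Here a mountpoint at exactly the threshold is still treated as healthy, since the suggestion speaks of exceeding 90%.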
IMHO this kind of server-specific check could be exposed as an HTTP endpoint by running goss in server mode with a test for the mount. That endpoint could then be probed externally by an HTTP service monitor, which turns red on any non-200 response. This way the two are decoupled and work in a complementary mode, with each framework doing the job it was actually created for.
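For illustration, the polling side of that setup could look like the sketch below, which treats anything other than HTTP 200 as unhealthy. The URL is a placeholder and the function names are hypothetical; the actual host, port and path depend on how the health endpoint is served.

```python
import urllib.error
import urllib.request


def status_is_healthy(status: int) -> bool:
    """Only a plain 200 counts as healthy."""
    return status == 200


def endpoint_is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Poll a health endpoint; errors and non-200 responses count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return status_is_healthy(response.status)
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections and timeouts.
        return False


# Placeholder URL: adjust to wherever the health endpoint is exposed.
print(endpoint_is_healthy("http://127.0.0.1:1/healthz", timeout=0.5))
```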