Dispatched job meta parameters in metrics
Hello,
I'm using Nomad to run various workloads, such as services and batch jobs, and I feel a feature is missing for batch jobs: I would like meta parameters to be exposed as labels in the allocation metrics API.
Proposal
The task runner already has a special case for dispatched jobs: https://github.com/hashicorp/nomad/blob/ff1a30fe8dd4a0da38ddd34915b0be2210d39614/client/allocrunner/taskrunner/task_runner.go#L470-L475
Would it be possible to iterate over the meta entries and add each one as a label, like this?
Name: "meta_META_NAME",
Value: meta_value
I don't know whether enabling this by default would be too costly, but it could be gated behind a telemetry parameter (as publish_allocation_metrics already is).
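A minimal sketch of the proposed mapping, written in Go like the task runner itself. Everything here is hypothetical and for illustration only: the local `Label` type (assumed to mirror the Name/Value label shape used in the task runner), the `metaLabels` function, the `publishMeta` switch standing in for an opt-in telemetry parameter, and the replacement of characters that are invalid in Prometheus label names. It is not Nomad's actual code or API.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// Label is a local stand-in for a metrics label (hypothetical; assumed to
// match the Name/Value shape used by the task runner's base labels).
type Label struct {
	Name  string
	Value string
}

// invalidLabelChars matches characters not allowed in a Prometheus label
// name; sanitizing meta keys this way is an assumption of this sketch.
var invalidLabelChars = regexp.MustCompile(`[^a-zA-Z0-9_]`)

// metaLabels turns a dispatched job's meta map into "meta_<key>" labels.
// publishMeta represents a hypothetical opt-in telemetry parameter: when it
// is false, no labels are produced and existing metrics are unchanged.
func metaLabels(meta map[string]string, publishMeta bool) []Label {
	if !publishMeta || len(meta) == 0 {
		return nil
	}

	// Sort the keys so labels are emitted in a deterministic order.
	keys := make([]string, 0, len(meta))
	for k := range meta {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	labels := make([]Label, 0, len(keys))
	for _, k := range keys {
		labels = append(labels, Label{
			Name:  "meta_" + invalidLabelChars.ReplaceAllString(k, "_"),
			Value: meta[k],
		})
	}
	return labels
}

func main() {
	// Example base label, as already added for jobs with a parent.
	base := []Label{{Name: "parent_id", Value: "my-batch-job"}}
	meta := map[string]string{"customer-id": "42", "region": "eu"}

	labels := append(base, metaLabels(meta, true)...)
	for _, l := range labels {
		fmt.Printf("%s=%q\n", l.Name, l.Value)
	}
}
```

Running it prints `parent_id="my-batch-job"`, `meta_customer_id="42"` and `meta_region="eu"`, one per line. Every distinct meta value creates a new time series, so an opt-in switch like this keeps the default cost unchanged.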
Use-cases
My Prometheus server monitors all my jobs through the Nomad metrics API. I have Grafana dashboards that show what is happening in the jobs, with filters (on parent_id, exported_job, alloc_id, ...) that let me select the data I need to see.
I also wrote Alertmanager rules that alert me when a job fails (among other things).
The goal is to filter dispatched jobs by their meta parameters and to include those details in my Alertmanager alerts.
Attempted Solutions
I enabled all telemetry options and read some articles and parts of the Nomad code; as far as I can tell, the feature doesn't exist yet.
Thank you!
Hi @BDelacour! As you've noted, this probably isn't the sort of thing we'd want to do by default, because emitting metrics (particularly ones that can be quite large) can get expensive for cluster administrators. I don't have a solid idea of a good UX for this configuration, but I'm going to mark this for roadmapping and further discussion.