Dispatched job meta parameters in metrics

Open • BDelacour opened this issue 2 years ago • 1 comment

Hello,

I'm using Nomad to run various workloads, such as services and batch jobs, and I feel a feature is missing for batch jobs: I would like to have Meta parameters exposed as labels in the allocation metrics API.

Proposal

In the task runner, there is already a special case for dispatched jobs: https://github.com/hashicorp/nomad/blob/ff1a30fe8dd4a0da38ddd34915b0be2210d39614/client/allocrunner/taskrunner/task_runner.go#L470-L475

Would it be possible to loop over the job's meta and add one label per entry, like

 		Name:  "meta_META_NAME", 
 		Value: meta_value

?

I don't know whether emitting them by default would be too heavy, but we could add a telemetry parameter to enable this feature (as we already have publish_allocation_metrics).
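To make the idea concrete, here is a minimal sketch of what I have in mind. It is not Nomad code: the `Label` type is redefined locally with the same Name/Value shape as the labels used in the task runner, and `metaLabels` and its `enabled` argument (standing in for a telemetry opt-in) are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Label is a local stand-in with the same Name/Value shape as a metrics
// label. It is redefined here so the sketch is self-contained.
type Label struct {
	Name  string
	Value string
}

// metaLabels (hypothetical helper) returns the base labels plus one
// "meta_<key>" label per meta entry. The enabled flag stands in for a
// telemetry opt-in; when it is false, the base labels are returned unchanged.
func metaLabels(base []Label, meta map[string]string, enabled bool) []Label {
	if !enabled || len(meta) == 0 {
		return base
	}

	// Sort the keys so the label order is deterministic across runs.
	keys := make([]string, 0, len(meta))
	for k := range meta {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	// Copy the base labels so the caller's slice is never modified.
	out := make([]Label, 0, len(base)+len(keys))
	out = append(out, base...)
	for _, k := range keys {
		out = append(out, Label{Name: "meta_" + k, Value: meta[k]})
	}
	return out
}

func main() {
	// Example values only; they do not come from a real job.
	base := []Label{{Name: "parent_id", Value: "example-parent"}}
	meta := map[string]string{"customer": "acme", "batch_size": "100"}

	for _, l := range metaLabels(base, meta, true) {
		fmt.Printf("%s=%s\n", l.Name, l.Value)
	}
}
```

A real implementation would probably also need to sanitize meta keys, since Prometheus label names only allow letters, digits and underscores.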

Use-cases

My Prometheus server monitors all my jobs through the Nomad Metrics API. I have Grafana dashboards that give me details about what is happening on the jobs, with some filters (on parent_id, exported_job, alloc_id, ...) that let me select the data I need to see. I also wrote Alertmanager rules to send me alerts when a job fails (and other things).

The goal is to be able to filter dispatched jobs by their meta parameters (and to include those details in my Alertmanager alerts).

Attempted Solutions

I enabled all the telemetry options and read some articles and some of the Nomad code; as far as I can tell, the feature doesn't exist yet.

Thank you!

BDelacour • Oct 03 '22 13:10

Hi @BDelacour! As you've noted, this probably isn't the sort of thing we'd want to do by default, because emitting metrics (particularly ones that can be quite large) can get expensive for cluster administrators. I don't have a solid idea of a good UX for this configuration, but I'm going to mark this for roadmapping and further discussion.

tgross • Oct 03 '22 14:10