
persist reconciler metrics in raft

Open tgross opened this issue 2 years ago • 0 comments

As noted in Architecture: Eval Lifecycle, the scheduler has 3 phases: reconciling, feasibility checking, and scoring. When a plan is submitted, it includes metrics for feasibility checking and scoring, but the only data we get for the reconciliation step is emitted in debug-level logs. This forces cluster administrators to run with debug-level logging if they want any visibility into what the reconciler has done (especially around stopping allocations), which can increase operating costs.

We want to expose metrics from the reconcile step in the plan results so they're persisted in raft like the rest of the plan. I'm opening this issue to discuss with the team (and community!) what metrics we should try to collect.
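To make the discussion concrete, here is a minimal sketch of what per-task-group reconciler metrics might look like if they were carried alongside the plan. The `ReconcileMetrics` type, its fields, and `RecordStop` are hypothetical names chosen for illustration, not part of Nomad's API; the counters (place, ignore, migrate, stop) and the per-reason stop tally are assumptions about what the reconciler could usefully report.

```go
package main

import "fmt"

// ReconcileMetrics is a HYPOTHETICAL shape for per-task-group reconciler
// results that could be attached to a plan. None of these names come from
// Nomad; they only illustrate the kind of counts the reconciler could record.
type ReconcileMetrics struct {
	Place       int            // new allocations to create
	Ignore      int            // allocations left unchanged
	Migrate     int            // allocations moved off draining nodes
	Stop        int            // allocations to stop, in total
	StopReasons map[string]int // stop count keyed by a free-form reason
}

// RecordStop increments the total stop counter and tallies the reason,
// so the reason survives in the persisted plan rather than only in logs.
func (m *ReconcileMetrics) RecordStop(reason string) {
	if m.StopReasons == nil {
		m.StopReasons = make(map[string]int)
	}
	m.Stop++
	m.StopReasons[reason]++
}

func main() {
	// One entry per task group, keyed by task group name.
	metrics := map[string]*ReconcileMetrics{"web": {Place: 2}}
	metrics["web"].RecordStop("job scaled down")
	metrics["web"].RecordStop("job scaled down")
	metrics["web"].RecordStop("node lost")

	// Prints the total stops and the count for one reason: "3 2".
	fmt.Println(metrics["web"].Stop, metrics["web"].StopReasons["job scaled down"])
}
```

Keying stops by reason is the part aimed at the visibility problem described above: it would let an operator answer "why was this allocation stopped?" from the plan persisted in raft instead of from debug-level logs.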

tgross, Dec 16 '22