nomad
persist reconciler metrics in raft
As noted in Architecture: Eval Lifecycle, the scheduler has three phases: reconciling, feasibility checking, and scoring. When a plan is submitted, it includes metrics for feasibility checking and scoring, but the only data we get for the reconciliation step is emitted in debug-level logs. This forces cluster administrators to run with debug-level logging if they want any visibility into what the reconciler has done (especially around stopping allocations), which can increase operating costs.
We want to expose metrics from the reconcile step in the plan results so they're persisted in raft like the rest of the plan. I'm opening this issue to discuss with the team (and community!) what metrics we should try to collect.