"computed class ineligible" message is unclear and difficult to debug
Nomad version
1.5.6
Operating system and Environment details
Issue
The message "Constraint computed class ineligible filtered 1 node" is not helpful.
Reproduction steps
Here are three completely different ways to reproduce this error message
- https://github.com/hashicorp/nomad/issues/8411
- https://stackoverflow.com/questions/56660651/nomad-job-failed-to-place-all-allocations
- https://discuss.hashicorp.com/t/how-to-find-out-why-a-job-placement-is-failing-with-a-constraint/32209
The third post led me to the correct solution, but it still required reading between the lines.
Expected Result
A human readable sentence. "Constraint computed class ineligible" is not that.
The docs for attribute suggest that every acceptable value must start with a Nomad interpolated value, something like /\$\{(node|attr|meta)/, though I suspect it's more complicated than that. A message indicating that attribute = "${platform.aws.placement.availability-zone}" was unacceptable because it did not start with attr would have been better.
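To illustrate the kind of message I mean, here is a rough sketch. It is not Nomad code: explainAttribute is a hypothetical helper, and the pattern is just the guess quoted above, so the real rules for valid attributes may well differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// interpolatedPrefix is the rough pattern guessed at in this issue.
// It is an assumption, not Nomad's actual validation logic.
var interpolatedPrefix = regexp.MustCompile(`^\$\{(node|attr|meta)`)

// explainAttribute is a hypothetical helper. It returns an empty string when
// the attribute starts with a recognized prefix, and a readable explanation
// otherwise.
func explainAttribute(attribute string) string {
	if interpolatedPrefix.MatchString(attribute) {
		return ""
	}
	return fmt.Sprintf(
		"constraint attribute %q was not recognized: expected it to start with ${node, ${attr, or ${meta",
		attribute,
	)
}

func main() {
	// The attribute from this report, missing the attr prefix.
	fmt.Println(explainAttribute("${platform.aws.placement.availability-zone}"))
}
```

Even a hint this small would have pointed me at the missing attr prefix right away.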
Resurrecting this issue a bit with a unit test inspired by a test case provided by @maksimnosal:
func TestServiceScheduler_ComputedClassReporting(t *testing.T) {
	h := tests.NewHarness(t)

	// Set a hostname attribute on one node.
	node0 := mock.Node()
	node0.Attributes["attr.unique.hostname"] = "foo.example.com"
	must.NoError(t, h.State.UpsertNode(structs.MsgTypeTestSetup, h.NextIndex(), node0))

	// Register four more default mock nodes.
	for range 4 {
		node := mock.Node()
		must.NoError(t, h.State.UpsertNode(structs.MsgTypeTestSetup, h.NextIndex(), node))
	}

	// Create a job constrained to the hostname foo.example.com.
	job := mock.Job()
	job.TaskGroups[0].Count = 1
	job.TaskGroups[0].Tasks[0].Driver = "docker"
	job.Constraints = append(job.Constraints, &structs.Constraint{
		LTarget: "${attr.unique.hostname}",
		RTarget: "foo.example.com",
		Operand: "=",
	})
	must.NoError(t, h.State.UpsertJob(structs.MsgTypeTestSetup, h.NextIndex(), nil, job))

	// Create and store the job-register evaluation.
	eval := &structs.Evaluation{
		Namespace:   structs.DefaultNamespace,
		ID:          uuid.Generate(),
		Priority:    job.Priority,
		TriggeredBy: structs.EvalTriggerJobRegister,
		JobID:       job.ID,
		Status:      structs.EvalStatusPending,
	}
	must.NoError(t, h.State.UpsertEvals(structs.MsgTypeTestSetup, h.NextIndex(), []*structs.Evaluation{eval}))

	// Process the evaluation; no plan should be submitted.
	err := h.Process(NewServiceScheduler, eval)
	must.NoError(t, err)
	must.Len(t, 0, h.Plans, must.Sprint("expected no plan"))
	must.Len(t, 1, h.Evals)

	// Dump the placement failure metrics for each task group.
	eval = h.Evals[0]
	spew.Dump(eval.FailedTGAllocs)
}
Output:
=== RUN   TestServiceScheduler_ComputedClassReporting
...
(map[string]*structs.AllocMetric) (len=1) {
 (string) (len=3) "web": (*structs.AllocMetric)(0xc0003fec00)({
  NodesEvaluated: (int) 5,
  NodesFiltered: (int) 5,
  NodesInPool: (int) 5,
  NodePool: (string) (len=7) "default",
  NodesAvailable: (map[string]int) (len=1) {
   (string) (len=3) "dc1": (int) 5
  },
  ClassFiltered: (map[string]int) (len=1) {
   (string) (len=16) "linux-medium-pci": (int) 5
  },
  ConstraintFiltered: (map[string]int) (len=1) {
   (string) (len=41) "${attr.unique.hostname} = foo.example.com": (int) 5
  },
  NodesExhausted: (int) 0,
  ClassExhausted: (map[string]int) <nil>,
  DimensionExhausted: (map[string]int) <nil>,
  QuotaExhausted: ([]string) <nil>,
  ResourcesExhausted: (map[string]*structs.Resources) <nil>,
  Scores: (map[string]float64) <nil>,
  ScoreMetaData: ([]*structs.NodeScoreMeta) <nil>,
  nodeScoreMeta: (*structs.NodeScoreMeta)(<nil>),
  topScores: (*kheap.ScoreHeap)(<nil>),
  AllocationTime: (time.Duration) 2.591µs,
  CoalescedFailures: (int) 0
 })
}
--- PASS: TestServiceScheduler_ComputedClassReporting (0.00s)
PASS
ok      github.com/hashicorp/nomad/scheduler    0.008s
See also https://github.com/hashicorp/nomad/issues/13898, https://github.com/hashicorp/nomad/issues/11228, and https://github.com/hashicorp/nomad/issues/8411 for other examples of how painful this is.
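For what it's worth, the metrics already contain enough to build a readable sentence. Below is a rough sketch of what that could look like. The allocMetric type is a local stand-in that copies only a few field names from the dump above (it is not the real structs.AllocMetric), and describe is a hypothetical formatter, not an existing Nomad function.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// allocMetric is a local stand-in with a few of the fields seen in the dump.
// It is an assumption for illustration, not the real structs.AllocMetric.
type allocMetric struct {
	NodesEvaluated     int
	NodesFiltered      int
	ClassFiltered      map[string]int
	ConstraintFiltered map[string]int
}

// sortedKeys returns the map keys in sorted order so output is deterministic.
func sortedKeys(m map[string]int) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// describe is a hypothetical formatter that names the constraint first and
// the affected node classes second.
func describe(group string, m allocMetric) string {
	var b strings.Builder
	fmt.Fprintf(&b, "Task group %q: %d of %d evaluated nodes were filtered out.\n",
		group, m.NodesFiltered, m.NodesEvaluated)
	for _, c := range sortedKeys(m.ConstraintFiltered) {
		fmt.Fprintf(&b, "  constraint %q rejected %d node(s)\n", c, m.ConstraintFiltered[c])
	}
	for _, c := range sortedKeys(m.ClassFiltered) {
		fmt.Fprintf(&b, "  node class %q accounted for %d filtered node(s)\n", c, m.ClassFiltered[c])
	}
	return b.String()
}

func main() {
	// Values copied from the test output above.
	m := allocMetric{
		NodesEvaluated:     5,
		NodesFiltered:      5,
		ClassFiltered:      map[string]int{"linux-medium-pci": 5},
		ConstraintFiltered: map[string]int{"${attr.unique.hostname} = foo.example.com": 5},
	}
	fmt.Print(describe("web", m))
}
```

Listing the failing constraint before the node class keeps the actionable part of the message up front.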