
"computed class ineligible" message is unclear and difficult to debug

Open · josh-m-sharpe opened this issue 2 years ago · 1 comment

Nomad version

1.5.6

Operating system and Environment details

Issue

The placement failure message "Constraint computed class ineligible filtered 1 node" is not helpful.

Reproduction steps

Here are three quite different ways to reproduce this error message:

  1. https://github.com/hashicorp/nomad/issues/8411
  2. https://stackoverflow.com/questions/56660651/nomad-job-failed-to-place-all-allocations
  3. https://discuss.hashicorp.com/t/how-to-find-out-why-a-job-placement-is-failing-with-a-constraint/32209

The third link led me to the correct solution, but it still required reading between the lines.

Expected Result

A human-readable sentence. "Constraint computed class ineligible" is not that.

The docs for the constraint "attribute" parameter suggest that every acceptable value must start with a Nomad interpolated value, something like /\$\{(node|attr|meta)/, though I suspect the real rule is more complicated than that. A message saying that attribute = "${platform.aws.placement.availability-zone}" was rejected because it did not start with attr would have been far more helpful.

josh-m-sharpe, Jun 06 '23 00:06

Resurrecting this issue a bit with a unit test inspired by a test case provided by @maksimnosal:


func TestServiceScheduler_ComputedClassReporting(t *testing.T) {

	h := tests.NewHarness(t)
	node0 := mock.Node()
	node0.Attributes["attr.unique.hostname"] = "foo.example.com"
	must.NoError(t, h.State.UpsertNode(structs.MsgTypeTestSetup, h.NextIndex(), node0))

	for range 4 {
		node := mock.Node()
		must.NoError(t, h.State.UpsertNode(structs.MsgTypeTestSetup, h.NextIndex(), node))
	}

	job := mock.Job()
	job.TaskGroups[0].Count = 1
	job.TaskGroups[0].Tasks[0].Driver = "docker"
	job.Constraints = append(job.Constraints, &structs.Constraint{
		LTarget: "${attr.unique.hostname}",
		RTarget: "foo.example.com",
		Operand: "=",
	})
	must.NoError(t, h.State.UpsertJob(structs.MsgTypeTestSetup, h.NextIndex(), nil, job))

	eval := &structs.Evaluation{
		Namespace:   structs.DefaultNamespace,
		ID:          uuid.Generate(),
		Priority:    job.Priority,
		TriggeredBy: structs.EvalTriggerJobRegister,
		JobID:       job.ID,
		Status:      structs.EvalStatusPending,
	}

	must.NoError(t, h.State.UpsertEvals(structs.MsgTypeTestSetup, h.NextIndex(), []*structs.Evaluation{eval}))
	err := h.Process(NewServiceScheduler, eval)
	must.NoError(t, err)

	must.Len(t, 0, h.Plans, must.Sprint("expected no plan"))
	must.Len(t, 1, h.Evals)

	eval = h.Evals[0]
	spew.Dump(eval.FailedTGAllocs)
}

Output:

=== RUN   TestServiceScheduler_ComputedClassReporting
...
(map[string]*structs.AllocMetric) (len=1) {
 (string) (len=3) "web": (*structs.AllocMetric)(0xc0003fec00)({
  NodesEvaluated: (int) 5,
  NodesFiltered: (int) 5,
  NodesInPool: (int) 5,
  NodePool: (string) (len=7) "default",
  NodesAvailable: (map[string]int) (len=1) {
   (string) (len=3) "dc1": (int) 5
  },
  ClassFiltered: (map[string]int) (len=1) {
   (string) (len=16) "linux-medium-pci": (int) 5
  },
  ConstraintFiltered: (map[string]int) (len=1) {
   (string) (len=41) "${attr.unique.hostname} = foo.example.com": (int) 5
  },
  NodesExhausted: (int) 0,
  ClassExhausted: (map[string]int) <nil>,
  DimensionExhausted: (map[string]int) <nil>,
  QuotaExhausted: ([]string) <nil>,
  ResourcesExhausted: (map[string]*structs.Resources) <nil>,
  Scores: (map[string]float64) <nil>,
  ScoreMetaData: ([]*structs.NodeScoreMeta) <nil>,
  nodeScoreMeta: (*structs.NodeScoreMeta)(<nil>),
  topScores: (*kheap.ScoreHeap)(<nil>),
  AllocationTime: (time.Duration) 2.591µs,
  CoalescedFailures: (int) 0
 })
}
--- PASS: TestServiceScheduler_ComputedClassReporting (0.00s)
PASS
ok      github.com/hashicorp/nomad/scheduler    0.008s
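To illustrate the kind of human-readable summary this issue asks for, here is a minimal Go sketch that turns the filter counters from the dump above into plain sentences. The allocMetric type is a hypothetical, trimmed-down stand-in that copies only four field names from structs.AllocMetric; explain and sortedKeys are illustrative helpers, not Nomad code. The sketch assumes ClassFiltered is keyed by node class (here "linux-medium-pci") rather than by computed class.

```go
package main

import (
	"fmt"
	"sort"
)

// allocMetric is a hypothetical subset of the fields shown in the dump.
type allocMetric struct {
	NodesEvaluated     int
	NodesFiltered      int
	ClassFiltered      map[string]int // filtered nodes, keyed by node class
	ConstraintFiltered map[string]int // filtered nodes, keyed by constraint string
}

// explain renders the counters as one sentence per filter reason.
func explain(m allocMetric) []string {
	out := []string{fmt.Sprintf("%d of %d evaluated nodes were filtered out",
		m.NodesFiltered, m.NodesEvaluated)}
	for _, c := range sortedKeys(m.ConstraintFiltered) {
		out = append(out, fmt.Sprintf("constraint %q was not satisfied by %d node(s)",
			c, m.ConstraintFiltered[c]))
	}
	for _, c := range sortedKeys(m.ClassFiltered) {
		out = append(out, fmt.Sprintf("%d filtered node(s) belong to node class %q",
			m.ClassFiltered[c], c))
	}
	return out
}

// sortedKeys returns map keys in sorted order so the output is deterministic.
func sortedKeys(m map[string]int) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// Values copied from the spew output of the test above.
	m := allocMetric{
		NodesEvaluated:     5,
		NodesFiltered:      5,
		ClassFiltered:      map[string]int{"linux-medium-pci": 5},
		ConstraintFiltered: map[string]int{"${attr.unique.hostname} = foo.example.com": 5},
	}
	for _, line := range explain(m) {
		fmt.Println(line)
	}
}
```

Sorting the map keys is a deliberate choice: Go map iteration order is randomized, and an error message that reorders itself between runs would be harder to compare across evaluations.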

See also https://github.com/hashicorp/nomad/issues/13898, https://github.com/hashicorp/nomad/issues/11228, and https://github.com/hashicorp/nomad/issues/8411 for other examples of how painful this is.

tgross, Sep 29 '25 19:09