worker.service_sched: processing eval panicked scheduler - please report this as a bug
Nomad version
Nomad v1.7.5 BuildDate 2024-02-13T15:10:13Z Revision 5f5d4646198d09b8f4f6cb90fb5d50b53fa328b8
Operating system and Environment details
RHEL 9.3
Issue
Evaluations for the job are failing:
{
"priority": 50,
"type": "service",
"triggeredBy": "failed-follow-up",
"status": "failed",
"statusDescription": "evaluation reached delivery limit (3)",
"failedTGAllocs": [],
"previousEval": "34ab318b-a04d-a62b-48cc-604e265e4573",
"nextEval": "15f7b8cf-8091-5abf-f3ed-28517af63b7a",
"blockedEval": null,
"modifyIndex": 39197810,
"modifyTime": "2024-03-18T15:41:53.513Z",
"createIndex": 39197798,
"createTime": "2024-03-18T15:38:30.948Z",
"waitUntil": null,
"namespace": "default",
"plainJobId": "exec-job",
"relatedEvals": [
"15f7b8cf-8091-5abf-f3ed-28517af63b7a",
"34ab318b-a04d-a62b-48cc-604e265e4573",
"70e22606-6d2c-b44f-8062-b3a7b5f7ca69",
"6fa29ce2-9e16-2420-039e-7b5f8a4cd466",
"9f1bfb6c-4984-9b6d-384e-2defd5f1a574",
"7d362010-c877-74c9-56fc-b7b842688409",
"cc10963b-9de1-8d6b-ec1c-eaabf0f3497a",
"6503ee4a-5b86-d742-4597-cea96f18582e"
],
"job": "[\"exec-job\",\"default\"]",
"node": null
}
Reproduction steps
Nomad cluster was updated to 1.7.5
Expected Result
Jobs are evaluated and running
Actual Result
Jobs are never started
Job file (if appropriate)
Pretty much any job won't start.
Nomad Server logs (if appropriate)
2024-03-18T15:25:14.668Z [ERROR] worker.service_sched: processing eval panicked scheduler - please report this as a bug!: eval_id=9f1bfb6c-4984-9b6d-384e-2defd5f1a574 job_id=exec-job namespace=default worker_id=0c9215c7-515a-eb81-7b10-a11f8abda944 eval_id=9f1bfb6c-4984-9b6d-384e-2defd5f1a574 error="runtime error: invalid memory address or nil pointer dereference"
stack_trace=
| goroutine 83 [running]:
| runtime/debug.Stack()
| \truntime/debug/stack.go:24 +0x5e
| github.com/hashicorp/nomad/scheduler.(*GenericScheduler).Process.func1()
| \tgithub.com/hashicorp/nomad/scheduler/generic_sched.go:153 +0x58
| panic({0x2a88140?, 0x4f5ea50?})
| \truntime/panic.go:914 +0x21f
| github.com/hashicorp/nomad/client/lib/numalib.(*Topology).UsableCores(...)
| \tgithub.com/hashicorp/nomad/client/lib/numalib/topology.go:258
| github.com/hashicorp/nomad/nomad/structs.(*NodeResources).Comparable(0xc001108c80)
| \tgithub.com/hashicorp/nomad/nomad/structs/structs.go:3185 +0xcc
| github.com/hashicorp/nomad/scheduler.(*Preemptor).SetNode(0xc0029c48f0, 0xc00cc18000)
| \tgithub.com/hashicorp/nomad/scheduler/preemption.go:139 +0x36
| github.com/hashicorp/nomad/scheduler.(*BinPackIterator).Next(0xc00c176a80)
| \tgithub.com/hashicorp/nomad/scheduler/rank.go:274 +0x74d
| github.com/hashicorp/nomad/scheduler.(*JobAntiAffinityIterator).Next(0xc00b367bd0)
| \tgithub.com/hashicorp/nomad/scheduler/rank.go:624 +0x6b
| github.com/hashicorp/nomad/scheduler.(*NodeReschedulingPenaltyIterator).Next(0xc00e4384e0)
| \tgithub.com/hashicorp/nomad/scheduler/rank.go:685 +0x28
| github.com/hashicorp/nomad/scheduler.(*NodeAffinityIterator).Next(0xc00b367c20)
| \tgithub.com/hashicorp/nomad/scheduler/rank.go:757 +0x30
| github.com/hashicorp/nomad/scheduler.(*SpreadIterator).Next(0xc00c176af0)
| \tgithub.com/hashicorp/nomad/scheduler/spread.go:131 +0x33
| github.com/hashicorp/nomad/scheduler.(*PreemptionScoringIterator).Next(0xc02e7cace0)
| \tgithub.com/hashicorp/nomad/scheduler/rank.go:852 +0x28
| github.com/hashicorp/nomad/scheduler.(*ScoreNormalizationIterator).Next(0xc02e7cad20)
| \tgithub.com/hashicorp/nomad/scheduler/rank.go:816 +0x28
| github.com/hashicorp/nomad/scheduler.(*LimitIterator).nextOption(0xc008a79aa0)
| \tgithub.com/hashicorp/nomad/scheduler/select.go:63 +0x24
| github.com/hashicorp/nomad/scheduler.(*LimitIterator).Next(0xc008a79aa0)
| \tgithub.com/hashicorp/nomad/scheduler/select.go:42 +0x26
| github.com/hashicorp/nomad/scheduler.(*MaxScoreIterator).Next(0xc00e438570)
| \tgithub.com/hashicorp/nomad/scheduler/select.go:105 +0x3e
| github.com/hashicorp/nomad/scheduler.(*GenericStack).Select(0xc0262d92b0, 0xc00c062b40, 0xc0029c5530)
| \tgithub.com/hashicorp/nomad/scheduler/stack.go:192 +0xe8f
| github.com/hashicorp/nomad/scheduler.(*GenericScheduler).selectNextOption(0xc00985c000, 0x38264a0?, 0xc0029c5530)
| \tgithub.com/hashicorp/nomad/scheduler/generic_sched.go:898 +0x2d
| github.com/hashicorp/nomad/scheduler.(*GenericScheduler).computePlacements(0xc00985c000, {0x526ef20, 0x0, 0x0}, {0xc00b5c5740, 0x1, 0x1}, 0x0?)
| \tgithub.com/hashicorp/nomad/scheduler/generic_sched.go:602 +0xa47
| github.com/hashicorp/nomad/scheduler.(*GenericScheduler).computeJobAllocs(0xc00985c000)
| \tgithub.com/hashicorp/nomad/scheduler/generic_sched.go:469 +0x14da
| github.com/hashicorp/nomad/scheduler.(*GenericScheduler).process(0xc00985c000)
| \tgithub.com/hashicorp/nomad/scheduler/generic_sched.go:289 +0x49a
| github.com/hashicorp/nomad/scheduler.retryMax(0x5, 0xc0029c5d20, 0xc0029c5d10)
| \tgithub.com/hashicorp/nomad/scheduler/util.go:96 +0x49
| github.com/hashicorp/nomad/scheduler.(*GenericScheduler).Process(0xc00985c000, 0xc01c1d7680)
| \tgithub.com/hashicorp/nomad/scheduler/generic_sched.go:188 +0x55f
| github.com/hashicorp/nomad/nomad.(*Worker).invokeScheduler(0xc008e70ee0, 0xc0110d1e60, 0xc01c1d7680, {0xc02005da10, 0x24})
| \tgithub.com/hashicorp/nomad/nomad/worker.go:634 +0x353
| github.com/hashicorp/nomad/nomad.(*Worker).run(0xc008e70ee0, 0x12a05f200)
| \tgithub.com/hashicorp/nomad/nomad/worker.go:463 +0x5a5
| created by github.com/hashicorp/nomad/nomad.(*Worker).Start in goroutine 1
| \tgithub.com/hashicorp/nomad/nomad/worker.go:162 +0x59
Nomad Client logs (if appropriate)
N/A
@dpotapov what version of Nomad are you upgrading from?
And can you describe more about the runtime environment (like are you running clients in a VM? or what architecture? etc.)
From v1.1.4. Servers and clients are amd64 VMs.
I guess updating the Nomad version on the clients should help...
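For anyone reading the trace: the top frames show `(*NodeResources).Comparable` calling `(*Topology).UsableCores`, and the panic is a nil pointer dereference. One plausible reading, and it is only an assumption, is that nodes registered by older clients carry no topology data, so the scheduler calls a method on a nil pointer. Below is a minimal Go sketch of that failure mode and of a nil-receiver guard. The types, fields, and the `SafeUsableCores` and `tryUsableCores` helpers are hypothetical stand-ins and are not taken from the Nomad source.

```go
package main

import "fmt"

// Topology is a hypothetical stand-in, NOT Nomad's numalib.Topology.
type Topology struct {
	Cores []int
}

// UsableCores reads a field through the receiver, so calling it on a
// nil *Topology panics with a nil pointer dereference.
func (t *Topology) UsableCores() []int {
	return t.Cores
}

// SafeUsableCores is a hypothetical variant that tolerates a nil receiver.
func (t *Topology) SafeUsableCores() []int {
	if t == nil {
		return nil
	}
	return t.Cores
}

// NodeResources is a hypothetical stand-in for a node record whose
// topology was never populated (assumed here for an older client).
type NodeResources struct {
	Topology *Topology
}

// tryUsableCores calls the unguarded method and converts a panic into
// an error, similar in spirit to the recover seen in the trace.
func tryUsableCores(n *NodeResources) (cores []int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return n.Topology.UsableCores(), nil
}

func main() {
	old := &NodeResources{} // Topology left nil

	// The unguarded call panics; the deferred recover reports it.
	_, err := tryUsableCores(old)
	fmt.Println("unguarded:", err)

	// The guarded call returns an empty result instead of panicking.
	fmt.Println("guarded core count:", len(old.Topology.SafeUsableCores()))
}
```

If that reading is right, it would fit with every evaluation that ranks such a node hitting the same panic until the eval reaches its delivery limit, and with upgrading the clients being a reasonable thing to try.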