scheduler: feasibility check that memory_max fits in total
The resources.memory_max field is intended to allow memory oversubscription, so we don't check it in the AllocsFit method, where we total up the requested memory for all allocs on a node. But we never check that the value can even fit in the total amount of memory on the node, which can result in nonsensical placements.
When iterating over nodes in the feasibility check phase, check that the memory_max field doesn't exceed the total amount of memory on the node. Note that this deliberately ignores "reserved memory", as the feature is intended to allow oversubscription.
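The check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the Node and TaskResources types, the memoryMaxFeasible function, and the 1024 MB reserved value are all hypothetical stand-ins, and the sketch assumes the comparison is made per task against the node's total memory.

```go
package main

import "fmt"

// Node is a hypothetical, simplified stand-in for a client node.
type Node struct {
	ID               string
	MemoryMB         int64 // total memory on the node
	ReservedMemoryMB int64 // reserved memory; deliberately not used by the check
}

// TaskResources is a hypothetical stand-in for the memory fields of a
// task's resources block.
type TaskResources struct {
	MemoryMB    int64 // memory
	MemoryMaxMB int64 // memory_max; 0 means unset
}

// memoryMaxFeasible reports whether every task's memory_max fits within the
// node's total memory. Reserved memory is not subtracted, because memory_max
// exists to allow oversubscription.
func memoryMaxFeasible(node Node, tasks []TaskResources) bool {
	for _, t := range tasks {
		if t.MemoryMaxMB > node.MemoryMB {
			return false
		}
	}
	return true
}

func main() {
	// 31529 MB matches the node in the reproduction; the reserved value is
	// an assumption for illustration only.
	node := Node{ID: "node-1", MemoryMB: 31529, ReservedMemoryMB: 1024}

	tooBig := []TaskResources{{MemoryMB: 100, MemoryMaxMB: 32000}}
	fits := []TaskResources{{MemoryMB: 100, MemoryMaxMB: 200}}
	// Exceeds total minus reserved, but not the total itself.
	overReserved := []TaskResources{{MemoryMB: 100, MemoryMaxMB: 31000}}

	fmt.Println(memoryMaxFeasible(node, tooBig))
	fmt.Println(memoryMaxFeasible(node, fits))
	fmt.Println(memoryMaxFeasible(node, overReserved))
}
```

In this sketch only the first case is rejected. The third case stays feasible even though it exceeds total memory minus the reserved amount, which mirrors the decision to ignore reserved memory.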
Fixes: https://github.com/hashicorp/nomad/issues/26360
Configure oversubscription and check the memory available:
$ nomad node status -self -json | jq '.NodeResources.Memory.MemoryMB'
31529
$ nomad operator scheduler set-config -memory-oversubscription=true
Scheduler configuration updated!
Specify a job with memory_max that exceeds that memory:
job "example" {
  group "group" {
    task "task" {
      driver = "docker"

      config {
        image   = "busybox:1"
        command = "httpd"
        args    = ["-vv", "-f", "-p", "8001", "-h", "/local"]
      }

      resources {
        cpu        = 100
        memory     = 100
        memory_max = 32000
      }
    }
  }
}
Plan the job:
$ nomad job plan ./jobs/minimal.nomad.hcl
+/- Job: "example"
+/- Stop: "true" => "false"
Task Group: "group" (1 create)
Task: "task"
Scheduler dry-run:
- WARNING: Failed to place all allocations.
Task Group "group" (failed to place 1 allocation):
* Constraint "task memory_max exceeds maximum available memory": 1 nodes excluded by filter
Job Modify Index: 18
To submit the job with version verification run:
nomad job run -check-index 18 ./jobs/minimal.nomad.hcl
When running the job with the check-index flag, the job will only be run if the
job modify index given matches the server-side version. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
Update to memory_max = 200, or simply remove the memory_max config:
$ nomad job plan ./jobs/minimal.nomad.hcl
+ Job: "example"
+ Task Group: "group" (1 create)
+ Task: "task" (forces create)
Scheduler dry-run:
- All tasks successfully allocated.
Job Modify Index: 0
To submit the job with version verification run:
nomad job run -check-index 0 ./jobs/minimal.nomad.hcl
When running the job with the check-index flag, the job will only be run if the
job modify index given matches the server-side version. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
Contributor Checklist
- [ ] Changelog Entry If this PR changes user-facing behavior, please generate and add a changelog entry using the make cl command.
- [x] Testing Please add tests to cover any new functionality or to demonstrate bug fixes and ensure regressions will be caught.
- [x] Documentation If the change impacts user-facing functionality such as the CLI, API, UI, and job configuration, please update the Nomad website documentation to reflect this. Refer to the website README for docs guidelines. Please also consider whether the change requires notes within the upgrade guide.
Reviewer Checklist
- [ ] Backport Labels Please add the correct backport labels as described by the internal backporting document.
- [ ] Commit Type Ensure the correct merge method is selected which should be "squash and merge" in the majority of situations. The main exceptions are long-lived feature branches or merges where history should be preserved.
- [ ] Enterprise PRs If this is an enterprise only PR, please add any required changelog entry within the public repository.