
memory_max to take client total memory into consideration with placement

Open dmclf opened this issue 7 months ago • 3 comments

Nomad version

Nomad v1.9.3
BuildDate 2024-11-11T16:35:41Z
Revision d92bf1014886c0ff9f882f4a2691d5ae8ad8131c

Operating system and Environment details

Ubuntu 22

Issue

When defining a job with a memory_max higher than any client can provide, the job still gets allocated:

+/- Job: "job"
+/- Task Group: "group" ( 1 create/destroy update )
+/- Task: "task" ( forces create/destroy update )
+/- Resources {
CPU:	"300"
Cores:	"0"
DiskMB:	"0"
IOPS:	"0"
MemoryMB:	"300"
+/- MemoryMaxMB:	"3000" => "3221225472000"
SecretsMB:	"0"
}

Scheduler dry-run
All tasks successfully allocated.

Reproduction steps

Schedule a job with an arbitrarily high memory_max.

Expected Result

Job placement failure, as I do not have clients with 3221225472000 MiB of memory (yet); that is roughly 3.4 × 10^18 bytes, or about 3 EiB (exbibytes).
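For scale, the requested value converts directly to bytes (Nomad sizes memory in MiB; this is just the arithmetic, nothing Nomad-specific):

```python
# The jobspec's memory_max, in MiB as Nomad sizes memory.
memory_max_mib = 3_221_225_472_000

# Convert MiB -> bytes.
memory_max_bytes = memory_max_mib * 1024 * 1024

print(memory_max_bytes)          # 3377699720527872000 bytes
print(memory_max_bytes / 2**60)  # 2.9296875, i.e. ~3 EiB
```

Note that 3377699720527872000 is exactly the `"Memory"` value that shows up in the docker inspect output below.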

Some kind of basic protection would be nice.

For example, suppose some clients have 128G of memory and others have 256G: a job with a memory_max of 160G should then only be placed on the 256G machines, never on the 128G ones.

Yes, constraints and the base memory value can also affect placement, but it would be nice if Nomad took memory_max into consideration as well.
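The placement rule being asked for can be sketched in a few lines. This is a minimal Python illustration, not Nomad's actual scheduler code; the node list, field names, and the best-fit tie-break are all assumptions for the example:

```python
def feasible_nodes(nodes, memory_mb, memory_max_mb):
    """Keep only nodes whose allocatable memory covers both the
    reserved memory and the burst ceiling (memory_max)."""
    required = max(memory_mb, memory_max_mb)
    return [n for n in nodes if n["memory_mb"] >= required]

def best_fit(nodes, memory_mb, memory_max_mb):
    """Among feasible nodes, prefer the smallest one that fits
    (illustrative tie-break; Nomad's real ranking differs)."""
    candidates = feasible_nodes(nodes, memory_mb, memory_max_mb)
    return min(candidates, key=lambda n: n["memory_mb"], default=None)

nodes = [
    {"name": "small", "memory_mb": 128 * 1024},  # 128G client
    {"name": "big",   "memory_mb": 256 * 1024},  # 256G client
]

# memory_max of 160G only fits on the 256G node:
print(best_fit(nodes, 300, 160 * 1024)["name"])          # big

# memory_max of 3221225472000 MiB fits nowhere -> placement failure:
print(best_fit(nodes, 300, 3_221_225_472_000))           # None
```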

Actual Result

Scheduler dry-run
All tasks successfully allocated.

The job gets placed fine:

# head -n1 /proc/meminfo  
MemTotal:        8091536 kB

docker inspect

# docker inspect a49784f6486d |grep -i memory
            "Memory": 3377699720527872000,
            "MemoryReservation": 314572800,
            "MemorySwap": 3377699720527872000,
            "MemorySwappiness": null,
                "NOMAD_MEMORY_LIMIT=300",
                "NOMAD_MEMORY_MAX_LIMIT=3221225472000",
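The docker inspect numbers line up with the jobspec values once converted from MiB to bytes (a quick consistency check of the output above):

```python
MIB = 1024 * 1024

memory_reservation_mib = 300            # NOMAD_MEMORY_LIMIT
memory_max_mib = 3_221_225_472_000      # NOMAD_MEMORY_MAX_LIMIT

print(memory_reservation_mib * MIB)  # 314572800, matches "MemoryReservation"
print(memory_max_mib * MIB)          # 3377699720527872000, matches "Memory"
```

So the container is genuinely configured with an ~3 EiB hard limit; nothing in the pipeline sanity-checks the value against the host.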

dmclf avatar Jul 28 '25 03:07 dmclf

Hi @dmclf! That seems like a reasonable thing to do. I've got a draft PR up here: https://github.com/hashicorp/nomad/pull/26383

However, I have a concern that users may be relying on the existing behavior. For example, if you want to make sure that your allocations always have a soft-limit but aren't worried about a hard limit, you might try to set a very high value for resources.memory_max across all your jobs. That doesn't sound like a very good idea, but it represents a backwards compatibility concern at least. I'll get some of my colleagues to check out the PR and we'll have a discussion about it.

(https://github.com/hashicorp/nomad/issues/25939 would give those users an alternative, so maybe this needs to wait until we've done that work?)

tgross avatar Jul 28 '25 19:07 tgross

@tgross thank you!

<2cents>

If you do want to preserve backwards compatibility, maybe consider a sentinel value such as -1 to route to the current behavior, so that people who really depend on it still have a way to enforce it. A value of -1 is also more likely to read as a special flag, signalling to users that they are doing something that may not be a good idea.

Then, if memory_max has any positive value, it would already be sufficient from my end to just check that it fits within the client's Nomad-allocatable memory.

Optionally, find a best fit, but that can be based on the regular memory logic. In my eyes memory_max should be for (temporary) bursting only: if multiple processes allowed to burst beyond their reservation trigger a host OOM, that would be my problem and I wouldn't blame Nomad. If you also want to weigh the number of jobs with a given memory_max on a node in some way, that is a decision on your end.

For simplicity, just checking that memory_max can fit at all would be sufficient for my use case, because oversubscription (1) is opt-in and (2) already puts you in proceed-at-own-risk territory. </2cents>

dmclf avatar Jul 29 '25 01:07 dmclf

Yeah setting resources.memory_max = -1 is almost certainly how we'd implement https://github.com/hashicorp/nomad/issues/25939 in the jobspec. The only reason we haven't just gone ahead and done that already is there are questions about how it interacts with Enterprise features like quotas. Let me see if I can nudge along the discussion of #25939 a bit; if we decide we don't care about the quota interaction (ex. we document that soft-only limits aren't compatible with quotas or are disallowed if quotas are enabled on the namespace), then the implementation would be pretty trivial and we could knock out this feature and that in one go.

tgross avatar Jul 29 '25 12:07 tgross