On migrate block add option to keep old allocation until new healthy one available

Open cberescu opened this issue 7 months ago • 2 comments

Proposal

Add a new parameter to the migrate block that keeps the old job allocations running until the new ones are healthy.
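As a rough illustration of what is being asked for, a jobspec fragment might look like the sketch below. `max_parallel`, `health_check`, `min_healthy_time`, and `healthy_deadline` are existing `migrate` parameters; `keep_old_until_healthy` is a hypothetical name invented here for the proposed option and does not exist in Nomad. The job and group names are placeholders as well.

```hcl
job "example" {
  group "web" {
    # A single allocation is the case where a drain currently causes downtime.
    count = 1

    migrate {
      # Existing migrate parameters.
      max_parallel     = 1
      health_check     = "checks"
      min_healthy_time = "10s"
      healthy_deadline = "5m"

      # HYPOTHETICAL parameter (not part of Nomad): keep the old allocation
      # running until its replacement on another node is reported healthy.
      keep_old_until_healthy = true
    }
  }
}
```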

Use-cases

When draining nodes, allocations should be kept until the new ones are healthy. This already works if there are multiple allocations on the node, but not if there is only one allocation.

Attempted Solutions

Nothing found.

cberescu, Nov 28 '23 13:11

Hi @cberescu, and thanks for raising this issue. This makes sense as something to achieve; however, I think it will require some investigation into the reconciliation process to understand what is possible here and what needs to change. I'll mark this for roadmapping.

jrasell, Nov 29 '23 10:11

I'll add my use case to this as well.

We have a less critical environment where we allow our services in Nomad to auto-scale from 1 instance up to 4 based on simple metrics like CPU and memory usage. A lot of the services sit around 1 instance even when they are getting constant, routine traffic.

When a migration is kicked off, a service that is being hit with requests every few seconds can see many seconds of downtime while its single instance migrates to another node.

Auto-scaling would be much more helpful for us in this environment if migrations ensured instance counts stayed within the scaling min and max thresholds. The instance count shouldn't drop to 0 healthy allocations when we could have burst to 2 allocations temporarily, leaving the original, single healthy allocation undisturbed until the new one on the other node was ready.

So for us, a new setting in the migrate block to allow this would be great if Nomad isn't tweaked to just allow this behavior by default.
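For context, the kind of setup described above might look roughly like the following group-level `scaling` block for the Nomad Autoscaler. This is only a sketch: the group name, the check, the query, and every numeric value are assumptions chosen for illustration, not the actual configuration from this environment.

```hcl
group "api" {
  # Services usually sit at a single instance under routine traffic.
  count = 1

  scaling {
    enabled = true
    min     = 1 # lower bound; a migration currently dips below this to 0 healthy
    max     = 4 # upper bound; a temporary burst to 2 would still fit within it

    policy {
      # Illustrative timings, not taken from the thread.
      cooldown            = "1m"
      evaluation_interval = "30s"

      check "cpu" {
        source = "nomad-apm"
        query  = "avg_cpu"

        strategy "target-value" {
          target = 70
        }
      }
    }
  }
}
```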

sluebbert, Apr 09 '24 16:04