
Relocate API replicas when other API replicas of a higher priority are getting scheduled

RobertLucian opened this issue on Mar 16, 2021 · 0 comments

Description

Assume we have these node groups:

  • A CPU node group with a live instance. Each instance can fit 10 iris-classifier replicas (which only request CPU). `max_instances` is set to 1.
  • A GPU node group with a live instance. Each instance can fit either the single replica of the text-generator example (which requests a GPU) or 10 iris-classifier replicas (which only request CPU). `max_instances` is set to 1.

Let's say we deploy the iris-classifier and scale it up to 10 replicas. For the sake of the example, say that 5 replicas end up on the CPU node group and 5 of them end up on the GPU node group. When the text-generator is then deployed, its replica won't get scheduled anywhere, because the iris-classifier pods have already taken up the CPU/memory resources on both nodes.

Solution

The solution is to relocate the iris-classifier pods from the GPU node group onto the CPU node group. This can be achieved with a combination of pod priorities and preemption: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass. Once the iris-classifier pods have been relocated, the text-generator replica can be scheduled and initialize.
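Roughly, this could be done by creating two `PriorityClass` tiers and assigning them to the API pods. The sketch below uses client-go; the class names (`cpu-only-workload`, `gpu-workload`) and priority values are illustrative assumptions, not anything Cortex defines today.

```go
package main

import (
	"context"

	schedulingv1 "k8s.io/api/scheduling/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Lower priority for CPU-only API replicas (e.g. iris-classifier);
	// these can be preempted off the GPU node when a GPU workload needs room.
	cpuOnly := &schedulingv1.PriorityClass{
		ObjectMeta:  metav1.ObjectMeta{Name: "cpu-only-workload"},
		Value:       100,
		Description: "CPU-only API replicas; preemptible by GPU workloads",
	}

	// Higher priority for GPU API replicas (e.g. text-generator). With the
	// default PreemptLowerPriority policy, a pending pod of this class lets
	// the scheduler evict lower-priority pods from the GPU node.
	gpu := &schedulingv1.PriorityClass{
		ObjectMeta:  metav1.ObjectMeta{Name: "gpu-workload"},
		Value:       1000,
		Description: "GPU API replicas; preempt lower-priority pods if needed",
	}

	for _, pc := range []*schedulingv1.PriorityClass{cpuOnly, gpu} {
		if _, err := client.SchedulingV1().PriorityClasses().Create(context.TODO(), pc, metav1.CreateOptions{}); err != nil {
			panic(err)
		}
	}

	// Each API deployment's pod template would then set
	// spec.priorityClassName accordingly, e.g. "cpu-only-workload" for
	// iris-classifier pods and "gpu-workload" for text-generator pods.
}
```

With that in place, a pending text-generator pod would trigger preemption of iris-classifier pods on the GPU node, and the evicted pods would be rescheduled onto the CPU node group (subject to capacity there, as discussed below).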

One concern is that we need to verify that the text-generator replica actually gets scheduled once the relocation kicks off; otherwise the cluster-autoscaler would step in and try to add a new node (which is not possible in this case, since both node groups are already at max_instances).

Alternative example

Take the same scenario as above, but increase max_instances for the CPU node group to 2. This time, 15 iris-classifier replicas are deployed, with 5 of them sitting on the GPU node group's instance. As before, with those 5 on the GPU node, a text-generator replica cannot be scheduled.

Now, the text-generator is deployed with a single replica. The relocation process kicks off again, but this time the text-generator also needs to wait for a second CPU node group instance to be brought up (to host the relocated iris-classifier replicas). This adds some delay, which is acceptable.

This case becomes possible once https://github.com/cortexlabs/cortex/issues/1965 is addressed.
