Validate compute requests with the assumption that all APIs are maxed out

Open RobertLucian opened this issue 3 years ago • 1 comment

Description

When deploying a new API, we only validate against the scheduled workloads on the cluster. We're not taking into consideration the situation where all of the existing APIs are scaled up to their max number of replicas.

This can lead to a situation where APIs can't scale up beyond their current number of replicas because the cluster is overcommitted.
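To make the gap concrete, here is a minimal sketch of the two validation modes. It is not Cortex's actual validation code: the `API` class, the `fits` function, and the CPU-only, cluster-wide accounting are all assumptions made for illustration (a real check would also need to account for memory, GPUs, and per-node packing).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class API:
    # Hypothetical model of an API, for illustration only.
    name: str
    cpu_per_replica: float  # CPU cores requested by each replica
    current_replicas: int   # replicas scheduled right now
    max_replicas: int       # autoscaling upper bound


def fits(cluster_cpu: float, existing: List[API], new_api: API,
         assume_maxed_out: bool) -> bool:
    """Return True if new_api fits next to the existing APIs.

    With assume_maxed_out=False, only currently scheduled replicas count
    (the behavior described in this issue). With assume_maxed_out=True,
    every API, including the new one, is counted at max_replicas.
    """
    def replicas(api: API) -> int:
        return api.max_replicas if assume_maxed_out else api.current_replicas

    used = sum(a.cpu_per_replica * replicas(a) for a in existing)
    needed = new_api.cpu_per_replica * replicas(new_api)
    return used + needed <= cluster_cpu


existing_apis = [API("a", cpu_per_replica=1.0, current_replicas=2, max_replicas=10)]
new_api = API("b", cpu_per_replica=1.0, current_replicas=1, max_replicas=8)

# Passes today: 2 + 1 = 3 cores out of 16.
print(fits(16.0, existing_apis, new_api, assume_maxed_out=False))
# Fails under the proposed check: 10 + 8 = 18 cores out of 16.
print(fits(16.0, existing_apis, new_api, assume_maxed_out=True))
```

In this example the deployment is accepted today, yet the two APIs could never both reach their max replicas on a 16-core cluster.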

Solution

We can also provide an additional flag, --no-replica-guarantee, to the deploy command to let an API opt out of the guarantee. On the other hand, APIs that don't use the flag will be guaranteed their max number of replicas. We can achieve that by using priority classes: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass.

This approach doesn't cause compute resources to be wasted.
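A rough sketch of how the flag could map onto priority classes is below. The class names (`api-guaranteed`, `api-best-effort`), the priority values, and the helper functions are hypothetical; only the PriorityClass manifest fields and the pod spec's `priorityClassName` field come from the Kubernetes API. The idea is that pods of guaranteed APIs get a higher priority, so the scheduler can preempt pods of non-guaranteed APIs when a guaranteed API needs to scale up.

```python
def priority_class(name: str, value: int, description: str) -> dict:
    """Build a Kubernetes PriorityClass manifest as a plain dict."""
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        "value": value,  # higher value means higher scheduling priority
        "globalDefault": False,
        "description": description,
    }


# Hypothetical class names and values, chosen only for this sketch.
GUARANTEED = priority_class(
    "api-guaranteed", 1000,
    "APIs that are guaranteed their max number of replicas.",
)
BEST_EFFORT = priority_class(
    "api-best-effort", 100,
    "APIs deployed with --no-replica-guarantee.",
)


def pod_priority_class_name(no_replica_guarantee: bool) -> str:
    """Pick the value for the pod spec's priorityClassName field."""
    chosen = BEST_EFFORT if no_replica_guarantee else GUARANTEED
    return chosen["metadata"]["name"]


print(pod_priority_class_name(no_replica_guarantee=True))
print(pod_priority_class_name(no_replica_guarantee=False))
```

Because the lower-priority pods can still run on capacity that guaranteed APIs aren't currently using, that capacity doesn't sit idle.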

Mar 16 '21 14:03 RobertLucian

I think the behavior is fine as it is, and this change would probably create more headaches than advantages. The current behavior is also consistent with how Kubernetes resource limits behave.

Mar 16 '21 14:03 miguelvr