cortex
Validate compute requests with the assumption that all APIs are maxed out
Description
When deploying a new API, we only validate its compute request against the workloads currently scheduled on the cluster. We don't account for the case where all of the existing APIs are scaled up to their max number of replicas.
This can leave the cluster overcommitted, so that APIs can't scale up beyond their current number of replicas.
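To make the difference concrete, here is a minimal sketch of the two validation modes. It is not Cortex's actual validation code: the `API` fields, the `fits` helper, and the CPU-only accounting are assumptions made for illustration, as is the choice to always count the new API at its max replicas.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class API:
    # Hypothetical model of an API's compute footprint (CPU only, for brevity).
    name: str
    cpu_per_replica: float  # CPU cores requested by each replica
    replicas: int           # replicas currently scheduled
    max_replicas: int       # autoscaling upper bound


def fits(existing: List[API], new_api: API, cluster_cpu: float,
         assume_maxed_out: bool) -> bool:
    """Return True if new_api can be admitted without exceeding cluster_cpu."""
    def replica_count(api: API) -> int:
        # Current behavior counts scheduled replicas; the proposal counts the max.
        return api.max_replicas if assume_maxed_out else api.replicas

    committed = sum(replica_count(a) * a.cpu_per_replica for a in existing)
    # Assumption: the new API is validated at its max replicas in both modes.
    needed = new_api.max_replicas * new_api.cpu_per_replica
    return committed + needed <= cluster_cpu


existing = [API("classifier", cpu_per_replica=1.0, replicas=2, max_replicas=10)]
new_api = API("recommender", cpu_per_replica=1.0, replicas=0, max_replicas=8)

# Against scheduled workloads only: 2 + 8 = 10 CPUs fit in a 16-CPU cluster.
print(fits(existing, new_api, cluster_cpu=16, assume_maxed_out=False))
# Assuming every API is maxed out: 10 + 8 = 18 CPUs do not fit.
print(fits(existing, new_api, cluster_cpu=16, assume_maxed_out=True))
```

In this example the first check admits the new API even though `classifier` could later need 8 more CPUs than remain, which is the overcommit described above.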
Solution
We can also provide an additional flag, --no-replica-guarantee, to the deploy command to allow an API to not be guaranteed. OTOH, APIs that don't use the flag will be guaranteed their max number of replicas. We can achieve that by using priority classes https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass.
With this approach, compute resources are not wasted.
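As a rough sketch of how the flag could map onto priority classes, the snippet below builds two `PriorityClass` manifests as plain dictionaries and picks one based on the flag. The class names, the priority values, and the `priority_class_name` helper are hypothetical; only the manifest fields (`apiVersion`, `kind`, `metadata`, `value`, `globalDefault`, `description`) come from the Kubernetes `scheduling.k8s.io/v1` API.

```python
import json


def priority_class(name: str, value: int, description: str) -> dict:
    """Build a Kubernetes PriorityClass manifest as a plain dict."""
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        "value": value,  # higher value means higher scheduling priority
        "globalDefault": False,
        "description": description,
    }


# Hypothetical names and values, chosen only for illustration.
GUARANTEED = priority_class(
    "cortex-guaranteed", 1000,
    "APIs that are guaranteed their max number of replicas.")
NOT_GUARANTEED = priority_class(
    "cortex-no-replica-guarantee", 0,
    "APIs deployed with --no-replica-guarantee; may be preempted.")


def priority_class_name(no_replica_guarantee: bool) -> str:
    """Pick the value for the pod spec's priorityClassName field."""
    chosen = NOT_GUARANTEED if no_replica_guarantee else GUARANTEED
    return chosen["metadata"]["name"]


# kubectl accepts JSON manifests, so these can be written out and applied.
print(json.dumps(GUARANTEED, indent=2))
print(priority_class_name(no_replica_guarantee=True))
```

Pods of non-guaranteed APIs can then use spare capacity while it is idle, and the scheduler can preempt them when a guaranteed API needs to scale up.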
I think the behavior is fine as it is, and changing it will probably create more headaches than advantages. The current behavior is also consistent with how Kubernetes resource limits behave.