Add min_replicas setting to functions
When vacuuming, we need to simulate whether removing an FE would leave its function with fewer than min_replicas FEs available.
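Here is a minimal sketch of the vacuum-time check in Rust. It assumes the scheduler tracks running FE counts and min_replicas per function. `ReplicaState` and `can_vacuum_fe` are hypothetical names, and the string keys stand in for whatever identifier Indexify actually uses.

```rust
use std::collections::HashMap;

/// Hypothetical per-function bookkeeping. Keys are illustrative
/// "namespace/graph/version/function" strings, not Indexify's real types.
struct ReplicaState {
    running: HashMap<String, u32>,
    min_replicas: HashMap<String, u32>,
}

impl ReplicaState {
    /// Simulates removing one FE of `function` and reports whether the
    /// remaining count would still be at least min_replicas.
    fn can_vacuum_fe(&self, function: &str) -> bool {
        let running = self.running.get(function).copied().unwrap_or(0);
        // Functions without a min_replicas setting default to 0.
        let min = self.min_replicas.get(function).copied().unwrap_or(0);
        // Removing one FE leaves `running - 1`, which must be >= min.
        // That is the same as `running > min`, and it avoids underflow.
        running > min
    }
}
```

A vacuum pass would call this before each removal and update `running` afterwards. That way, removing several FEs of the same function is checked one FE at a time.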
When creating graphs whose functions set min_replicas, we need to make sure the cluster has enough capacity to run the min_replicas of every function.
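One way to do this admission check is to simulate placing each function's min_replicas FEs on the executors' free resources before accepting the graph. The sketch below uses first-fit placement. `Resources`, `FunctionSpec`, and `can_admit_graph` are hypothetical, and CPU and memory stand in for whatever resources FEs actually request. First-fit is a heuristic, so a rejection only means this simulation found no placement. It does not prove that none exists.

```rust
/// Hypothetical resource request or free capacity.
#[derive(Clone, Copy, Debug)]
struct Resources {
    cpu_millis: u64,
    memory_mb: u64,
}

impl Resources {
    fn fits_in(&self, free: &Resources) -> bool {
        self.cpu_millis <= free.cpu_millis && self.memory_mb <= free.memory_mb
    }
}

/// Hypothetical function spec: per-FE resources plus min_replicas.
struct FunctionSpec {
    name: String,
    per_replica: Resources,
    min_replicas: u32,
}

/// Returns Ok if every function's min_replicas FEs can be placed, or
/// Err(function name) for the first FE that found no room. Works on a
/// copy of the executors' free resources, so real state is untouched.
fn can_admit_graph(functions: &[FunctionSpec], executors_free: &[Resources]) -> Result<(), String> {
    let mut free: Vec<Resources> = executors_free.to_vec();
    for f in functions {
        for _ in 0..f.min_replicas {
            // First fit: take the first executor with room for this FE.
            match free.iter_mut().find(|e| f.per_replica.fits_in(e)) {
                Some(e) => {
                    e.cpu_millis -= f.per_replica.cpu_millis;
                    e.memory_mb -= f.per_replica.memory_mb;
                }
                None => return Err(f.name.clone()),
            }
        }
    }
    Ok(())
}
```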
When deploying a new graph version, we need to shut down the previous version's FEs so that we don't use 2x capacity running both the current and previous versions.
In the future, we want min_replicas to be configurable via the SDK.
I think the behavior we want is this: when a new version of a graph is deployed, we keep the old version running as long as it has in-flight invocations. We also spin up the new version if we have capacity. If we don't have enough resources, we don't bring up the new version at all.
Users have a mechanism to upgrade pending and in-flight invocations to the current version. They can use it to drain invocations off the existing graph version, and at that point Indexify can automatically bring up the new graph version.
It will take some extra work to implement, but we get consistent behavior across graph versions.
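The proposed rollout rule can be sketched as a small decision function. It would be re-run whenever the old version's in-flight count changes, for example after users upgrade invocations to the current version. The sketch assumes capacity is measured in interchangeable "slots". `RolloutPlan` and `plan_rollout` are hypothetical.

```rust
/// Hypothetical outcome of a rollout decision for one graph.
#[derive(Debug, PartialEq, Eq)]
struct RolloutPlan {
    keep_old_version: bool,
    start_new_version: bool,
}

/// Decides what to do with the old and new versions of a graph.
/// All capacities are in hypothetical interchangeable "slots".
fn plan_rollout(
    old_in_flight: u64,
    free_slots: u64,
    old_version_slots: u64,
    new_version_slots: u64,
) -> RolloutPlan {
    if old_in_flight > 0 {
        // The old version is still serving invocations, so keep it. Start
        // the new version only if it fits in the capacity free right now.
        RolloutPlan {
            keep_old_version: true,
            start_new_version: new_version_slots <= free_slots,
        }
    } else {
        // The old version is drained. Retiring it releases its slots,
        // and the new version may then use them.
        RolloutPlan {
            keep_old_version: false,
            start_new_version: new_version_slots <= free_slots.saturating_add(old_version_slots),
        }
    }
}
```

Because the new version is never started on capacity the old version still holds, the cluster never runs both versions beyond what is actually free.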
Some more thoughts on this feature:
- We don't bring up a new graph until there is enough capacity for all the min_replicas of its functions.
- Introduce the notion of graph priority. Users can assign priorities to the graphs in a namespace, and when capacity is constrained we always favor higher-priority graphs over lower-priority ones.
- Across namespaces, when capacity is constrained, we pick randomly among graphs with equal priority.
- We let operators (the people who run Indexify clusters, as opposed to users) control graph priority through the API and server config, so they can set the priority of some or all graphs. This makes it easy to run multi-tenant clusters where some namespaces, and the graphs within them, are more important than others.
- We introduce a State attribute on ComputeGraph (Active or Pending) and an `Option<PendingReason>` attribute that records why a compute graph is in that state. PendingReason could be things like Broken Graph, No Capacity, or BillingRequired, which operators or Indexify can set. This should probably be tracked as a separate feature, but I think this is the first point where we need it.
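Here is a sketch that ties these points together. It assumes capacity is measured in interchangeable slots and that each graph's `required_slots` already covers all of its min_replicas. Apart from `ComputeGraph`, `PendingReason`, Active, and Pending, every name is hypothetical, including the `operator_priority` override. Graphs are sorted by effective priority, and ties are broken randomly. Rust's standard library has no RNG, so a seeded xorshift generator stands in for a real one. This sketch randomizes all ties, not only ties across namespaces. Graphs held for reasons other than capacity are left untouched.

```rust
use std::cmp::Reverse;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PendingReason { BrokenGraph, NoCapacity, BillingRequired }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GraphState { Active, Pending }

#[derive(Debug, Clone)]
struct ComputeGraph {
    name: String,
    user_priority: u32,             // set by users within a namespace
    operator_priority: Option<u32>, // hypothetical operator override
    required_slots: u64,            // capacity for all min_replicas (assumed)
    state: GraphState,
    pending_reason: Option<PendingReason>,
}

impl ComputeGraph {
    /// Operator-set priority wins over the user's, when present.
    fn effective_priority(&self) -> u32 {
        self.operator_priority.unwrap_or(self.user_priority)
    }
}

/// xorshift64 step; a stand-in RNG, since std has none.
fn xorshift64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Admits graphs highest-priority first within `total_slots`; equal
/// priorities are ordered by a random key derived from `seed`.
fn admit_by_priority(graphs: &mut [ComputeGraph], total_slots: u64, seed: u64) {
    let mut rng = seed | 1; // xorshift state must be non-zero
    let mut order: Vec<(Reverse<u32>, u64, usize)> = graphs
        .iter()
        .enumerate()
        .map(|(i, g)| (Reverse(g.effective_priority()), xorshift64(&mut rng), i))
        .collect();
    order.sort(); // highest priority first, random key breaks ties
    let mut free_slots = total_slots;
    for (_, _, i) in order {
        let g = &mut graphs[i];
        // Graphs held for reasons other than capacity stay as they are.
        if matches!(g.pending_reason, Some(r) if r != PendingReason::NoCapacity) {
            continue;
        }
        if g.required_slots <= free_slots {
            free_slots -= g.required_slots;
            g.state = GraphState::Active;
            g.pending_reason = None;
        } else {
            g.state = GraphState::Pending;
            g.pending_reason = Some(PendingReason::NoCapacity);
        }
    }
}
```

Each pass starts from total capacity. As a result, a lower-priority graph that is currently Active can move to Pending with NoCapacity when a higher-priority graph needs its slots. Whether running graphs should be preempted this way is an open design choice.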