Improve scheduling latency
The scheduler is quite naive, it tries to allocate every unallocated task whenever the following things happen on the cluster -
- New executor joins
- Existing executor leaves
- New tasks are created
- Allocations fail or succeed
- FEs fail.
We need to make the scheduler figure out the least amount of work it can do to serve these events. When we receive a new graph, we need to figure out what type of executors it can fit, and then we create a reverse index of executor type -> functions. For (1) and (2) we can figure out the functions that can be scheduled, and so we can try to allocate only functions that matches those executor types. We need to also do early stopping, where once the executors are filled up there is no point in trying to allocate more.
The remaining types of events can use the same infrastructure.
We should also send out metrics for deficits of executors by "node type" so that the autoscaler can bring these machines up.
We already have executor catalog now, so this allows us to use the catalog entries as the "node types" and reverse indexes can be created for each catalog entry
Build a reverse index from executor-type (catalog entry) → set of function IDs that can run on that type. Rebuild / update it whenever catalog or function metadata changes.
cam help you do this if possible, allow me to work on this