# RFC: Metafora v2 - Leader-based task assignment; task dependency metadata
## Impetus
Metafora's current peer-based work-stealing approach has limitations:
- In a cluster with nodes A and B, each running 10 tasks, if you start node C, any tasks released (due to the fair balancer) from A or B are just as likely to be claimed by A or B as by C! The `CanClaim` method alone offers very few tools to mitigate this (see the sketch below).
- Starting a cluster is extremely resource intensive, as each node often tries to claim every task.
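For context, the balancer is the only per-node scheduling hook today. This is a rough sketch of its shape (names and signatures approximated, not authoritative); a node's only lever is a boolean veto per claim, so nodes can't see or coordinate with competing claimants:

```go
// Rough shape of metafora's current per-node balancer hook. Names and
// signatures here are approximate, for illustration only.
type Balancer interface {
	// CanClaim is consulted when a task appears or is released. Every
	// node in the cluster races to claim it; returning true merely
	// enters this node in the race -- it guarantees nothing.
	CanClaim(taskID string) bool

	// Balance lets a node release tasks (e.g. the fair balancer
	// shedding load), but nothing stops the releasing node, or another
	// already-loaded node, from immediately re-claiming them.
	Balance() (release []string)
}
```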
## New Scheduling
- Use something similar to the current work-stealing scheduling to elect a scheduling leader. Have that leader assign or offer tasks to followers.
- Task metadata used for scheduling decisions (affinities, anti-affinities, dependencies, resource utilization) could be declarative definitions, or tasks could implement a function that, given an `Offer`, determines whether or not it's sufficient for the task to run (assignment vs. ask/offer; sketched below).
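As a strawman for the ask/offer side, a handler could optionally implement something like the interface below. `Offer`, `Resources`, and `OfferAccepter` are hypothetical names, not existing metafora APIs:

```go
// Hypothetical scheduling metadata a leader could include in an offer.
// None of these types exist in metafora today; this is a strawman.
type Resources struct {
	CPUMillis int
	MemoryMB  int
}

type Offer struct {
	NodeID    string
	Labels    map[string]string // node labels, e.g. "zone": "us-east-1a"
	Available Resources
}

// OfferAccepter would be optionally implemented by task handlers that
// want to make their own scheduling decisions instead of (or on top of)
// declarative metadata like affinities and dependencies.
type OfferAccepter interface {
	// AcceptOffer reports whether the offered node is sufficient for
	// this task to run on.
	AcceptOffer(o Offer) bool
}
```

Declarative metadata keeps the leader in full control (pure assignment); an `AcceptOffer`-style escape hatch would cover constraints we fail to anticipate, at the cost of an extra round trip per offer.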
### Open Questions
- How simple and extensible can we keep metafora and still address scheduler scaling issues?
- What metadata and constraints are valuable for scheduling decisions?
## Topologies
- Metafora doesn't have them currently. Period.
- Super complicated stuff, especially if you want to generically handle inter-task communication.
### Open Questions
- Leave topologies up to another library/layer?
- Treat topologies as scheduling metadata and leave inter-task communication up to users?
These are my personal preferences, as I'd rather not try to compete with existing one-size-fits-all topology frameworks like Storm. -- @schmichael
My 2¢ on the topology topic is that there are already some very nice Go libraries for building those, both with brokers and without, like NATS or Mangos. I don't think metafora proper needs to be in that business.
I think it would be tremendous to have this idea of a leader scheduler that makes the schedule. Its first implementation could be quite simple, and it could add new features as needed. Just having a scheduler that avoids these "large land grabs" entirely would already be a big win, and would make metafora usable by a larger user base.
A couple of thoughts:
- I like the flexibility of how Kubernetes uses labels for scheduling metadata (sketched below).
- Is this eventually going to be dependent on Kubernetes? I.e., can we assume there is a process scheduler underneath it that can pass through additional labels? What differentiates their responsibilities?
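To make the label idea concrete, the leader's node-filtering step could be a simple selector match, in the spirit of Kubernetes label selectors. A minimal sketch; `matchesSelector` is hypothetical, not an existing metafora or k8s API:

```go
// matchesSelector reports whether a node's labels satisfy a task's
// label selector using exact-match semantics, similar in spirit to
// kubernetes labels. Purely illustrative; metafora has no such function.
func matchesSelector(nodeLabels, selector map[string]string) bool {
	for key, want := range selector {
		if got, ok := nodeLabels[key]; !ok || got != want {
			return false
		}
	}
	return true
}
```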
@araddon At least at first I'd like to avoid a hard dependency on k8s, as that would be a pretty huge thing to pull in. I should definitely get familiar with their scheduler's behavior first, though, to make sure.
That said, I think k8s might make a great source for the scheduler to learn about cluster resources. Maybe someday the scheduler could gain some sort of k8s plugin/sidecar/something it could use for autoscaling when resources aren't available, but that seems like an easy thing to build as an optional component down the road.