# RFC: Metafora v2 - Leader-based task assignment; task dependency metadata
## Impetus
Metafora's current peer-based work-stealing approach has limitations:
- In a cluster with nodes A and B, each running 10 tasks, if you start node C, any tasks released (due to the fair balancer) from A or B are just as likely to be claimed by A or B as by C! The `CanClaim` method alone offers very few tools to mitigate this (see the sketch below).
- Starting a cluster is extremely resource intensive, as each node often tries to claim every task.
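For context, the balancer is the only per-node scheduling hook today. This is a rough sketch of its shape (names and signatures approximated, not authoritative); a node's only lever is a boolean veto per claim, so nodes can't see or coordinate with competing claimants:

```go
// Rough shape of metafora's current per-node balancer hook. Names and
// signatures here are approximate, for illustration only.
type Balancer interface {
	// CanClaim is consulted when a task appears or is released. Every
	// node in the cluster races to claim it; returning true merely
	// enters this node in the race -- it guarantees nothing.
	CanClaim(taskID string) bool

	// Balance lets a node release tasks (e.g. the fair balancer
	// shedding load), but nothing stops the releasing node, or another
	// already-loaded node, from immediately re-claiming them.
	Balance() (release []string)
}
```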
## New Scheduling
- Use something similar to the current work-stealing scheduling to elect a scheduling leader. Have that leader assign or offer tasks to followers.
- Task metadata used for scheduling decisions (affinities, anti-affinities, dependencies, resource utilization) could be declarative definitions, or tasks could implement a function that, given an `Offer`, determines whether or not it's sufficient for the task to run (assignment vs. ask/offer; sketched below).
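As a strawman for the ask/offer side, a handler could optionally implement something like the interface below. `Offer`, `Resources`, and `OfferAccepter` are hypothetical names, not existing metafora APIs:

```go
// Hypothetical scheduling metadata a leader could include in an offer.
// None of these types exist in metafora today; this is a strawman.
type Resources struct {
	CPUMillis int
	MemoryMB  int
}

type Offer struct {
	NodeID    string
	Labels    map[string]string // node labels, e.g. "zone": "us-east-1a"
	Available Resources
}

// OfferAccepter would be optionally implemented by task handlers that
// want to make their own scheduling decisions instead of (or on top of)
// declarative metadata like affinities and dependencies.
type OfferAccepter interface {
	// AcceptOffer reports whether the offered node is sufficient for
	// this task to run on.
	AcceptOffer(o Offer) bool
}
```

Declarative metadata keeps the leader in full control (pure assignment); an `AcceptOffer`-style escape hatch would cover constraints we fail to anticipate, at the cost of an extra round trip per offer.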
### Open Questions
- How simple and extensible can we keep metafora and still address scheduler scaling issues?
- What metadata and constraints are valuable for scheduling decisions?
## Topologies
- Metafora doesn't have them currently. Period.
- Super complicated stuff, especially if you want to generically handle inter-task communication.
### Open Questions
- Leave topologies up to another library/layer?
- Treat topologies as scheduling metadata and leave inter-task communication up to users?
These are my personal preferences, as I'd rather not try to compete with existing one-size-fits-all topology frameworks like Storm. -- @schmichael
My 2¢ on the topology topic is that there are already some very nice Go libraries for building those, both with brokers and without, like NATS or Mangos. I don't think metafora proper needs to be in that business.
I think it would be tremendous to have this idea of a leader scheduler that makes the schedule. Its first implementation could be quite simple, and it could add new features as needed. Just having a scheduler that avoids these "large land grabs" entirely would already be a big win, and would make metafora usable by a larger user base.
A couple of thoughts:
- I like the flexibility of how Kubernetes uses labels for scheduling metadata (sketched below).
- Is this eventually going to be dependent on Kubernetes? I.e., can we assume there is a process scheduler underneath it that can pass through additional labels? What differentiates their responsibilities?
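To make the label idea concrete, the leader's node-filtering step could be a simple selector match, in the spirit of Kubernetes label selectors. A minimal sketch; `matchesSelector` is hypothetical, not an existing metafora or k8s API:

```go
// matchesSelector reports whether a node's labels satisfy a task's
// label selector using exact-match semantics, similar in spirit to
// kubernetes labels. Purely illustrative; metafora has no such function.
func matchesSelector(nodeLabels, selector map[string]string) bool {
	for key, want := range selector {
		if got, ok := nodeLabels[key]; !ok || got != want {
			return false
		}
	}
	return true
}
```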
@araddon At least at first I'd like to avoid a hard dependency on k8s, as that would be a pretty huge thing to pull in. I should definitely get familiar with their scheduler's behavior first, though, to make sure.
That said, I think k8s might make a great source for the scheduler to learn about cluster resources. Maybe someday the scheduler could gain some sort of k8s plugin/sidecar/something it could use for autoscaling when resources aren't available, but that seems like an easy thing to build as an optional component down the road.