Cook pools design
This is a discussion for the new Cook Scheduler feature called a "pool".
Motivation
We want to support scheduling jobs on heterogeneous clusters (e.g. some preemptible machines, some non-preemptible machines) with a single Cook instance while preserving the independence of scheduling provided by separate ranking and matching cycles for each of these types of machines.
Overview
Users will be able to submit jobs to one of a pre-defined set of pools by adding a field to the job specification:
"pool": "non-preemptible"
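For example, a submitted job specification might look like the following. The fields other than "pool" are illustrative placeholders, not a definitive Cook job spec:

```json
{
  "uuid": "00000000-0000-0000-0000-000000000000",
  "command": "echo hello",
  "cpus": 1.0,
  "mem": 1024,
  "pool": "non-preemptible"
}
```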
Additionally, Mesos agents can be tagged with the pool whose jobs they accept:
"pool": "non-preemptible"
Internally, Cook will schedule jobs independently between pools. Each pool will support different quotas, shares, etc. Jobs in different pools will be ranked and matched separately.
Implementation Details
Internally, Cook will run separate ranking and matching loops for each pool. The jobs and offers will be broken down by pool to speed matching and improve resiliency. For instance, if one pool is not receiving offers for an extended period and has a large number of jobs queued, it will not adversely impact scheduling on other pools. A separate Fenzo matcher will be started for each pool to avoid the need to synchronize on access to a shared Fenzo resource.
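The per-pool independence described above can be sketched as follows. This is a minimal hypothetical model, not Cook's actual internals: jobs and offers are partitioned by pool, and each pool runs its own matching cycle, so a pool with queued jobs but no offers produces no matches without blocking the others.

```python
from collections import defaultdict

def partition_by_pool(items, default_pool="non-preemptible"):
    # Group jobs or offers by their "pool" field, falling back to the default pool.
    by_pool = defaultdict(list)
    for item in items:
        by_pool[item.get("pool", default_pool)].append(item)
    return by_pool

def run_matching_cycles(jobs, offers):
    # One independent matching cycle per pool. In Cook, each pool would own
    # its own Fenzo matcher; here we just pair jobs with offers in order.
    jobs_by_pool = partition_by_pool(jobs)
    offers_by_pool = partition_by_pool(offers)
    return {
        pool: list(zip(pool_jobs, offers_by_pool.get(pool, [])))
        for pool, pool_jobs in jobs_by_pool.items()
    }
```

Note that a pool starved of offers (e.g. "non-preemptible" below, if no agents in that pool are reporting) simply yields an empty match list for that cycle; the other pools' cycles are unaffected.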
Decision Record
Title | Context | Decision Made By | Decision Made | Reason
---|---|---|---|---
Pools Definition | How are pools defined? | @dposada @pschorf | Pools will be pre-defined at startup in Datomic, and cannot be added or removed without a restart. | We don't need pools to be more dynamic than this, and this is simpler to implement.
Job Categories | What happens to the existing "normal" vs "gpu" categorization? | @pschorf @dposada | The existing categories of "normal" jobs and "gpu" jobs will be migrated to the new pools approach discussed here. | Pools are a natural generalization of categories.
Pools Configuration | What should the pool configuration look like? | @dposada @pschorf | `... :pools {:default "gamma"} ...` | All of the other metadata about pools is stored in Datomic, so the only piece of information we need in the config is the name of the default pool.
API Defaults | What should we return if no pool argument is passed? | @pschorf @scrosby @dposada | For `/list` and `/jobs`, we will return all jobs, regardless of pool. For `/usage`, `/share`, and `/quota`, we will return the default pool's data in the top-level keys, but add a sub-map containing the data for all pools. | It makes sense to continue to give users all of their jobs in the cluster when they call `/list` or `/jobs`, and it doesn't make sense to aggregate quotas, shares, or usage across pools.
Handling default pools with limits | What should the limit endpoints do with conflicting pools? | @pschorf @dposada | On `GET`, if a user has a limit defined both with no pool and for the configured default pool, the limit configured for the default pool should be returned. On `DELETE`, both limits should be removed. | On `GET`, one value has to trump the other, and the explicitly pooled value is a reasonable choice. On `DELETE`, since the two values mean the same thing, we can delete both.
Job Count Quota | How should we encode different job count quotas by pool? | @pschorf @dposada | As part of the pool migration, count was moved from a field on the share entity to a resource. | The share and quota entities had an identity-uniqueness constraint on the username attribute, and it was cleaner to keep that constraint and add the pool to the resource entities (as opposed to adding pool to the quota or share and making username non-unique). Since users can have different count quotas in different pools, count became a resource. There was already precedent for non-Mesos resource types (uri), so this seemed like a reasonable compromise.
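The `GET`/`DELETE` semantics for conflicting default-pool limits can be sketched as follows. This is a hypothetical in-memory model, not Cook's actual Datomic-backed implementation; the `None` key stands in for a legacy, pool-less limit:

```python
def effective_limit(limits, pool, default_pool):
    # limits maps pool name -> limit value; the key None holds a legacy,
    # pool-less limit. For the default pool, an explicitly pooled limit
    # trumps the legacy pool-less one (the GET behavior above).
    if pool in limits:
        return limits[pool]
    if pool == default_pool:
        return limits.get(None)
    return None

def delete_limit(limits, pool, default_pool):
    # DELETE on the default pool removes both entries, since the pooled
    # and pool-less limits mean the same thing for that pool.
    limits.pop(pool, None)
    if pool == default_pool:
        limits.pop(None, None)
```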
Open Questions
- Do we need to support "stopping" and "restarting" a pool, e.g. a particular type of machine becomes unavailable and we want to stop accepting requests for that pool temporarily?