Fenzo icon indicating copy to clipboard operation
Fenzo copied to clipboard

Resource ranges and "greedy" scheduling

Open zonybob opened this issue 7 years ago • 4 comments

UPFRONT DISCLAIMER: Nothing is broken - I am only asking if the Fenzo developer community would have any interest in pursuing or accepting a new feature.

Problem

I am working on a development effort where our team is adding new capabilities to a Mesos Framework that uses Fenzo. One of our requirements is "greedy" scheduling. Effectively, we work with a lot of scientific algorithms that can run sequentially or can run in parallel modes. The parallel modes run faster but need more resources, making them harder to schedule sometimes. Currently, these algorithms must be configured with high resource requirements, but sometimes they take forever to schedule. Alternatively, we can configure them with low resource requirements, and they run, but much more slowly.

Design space:

We are looking at introducing a data model on top of TaskRequest that takes "resource ranges" (min, max). We are then looking at two different design approaches:

Approach 1:

  1. Initialize task requests to their maximum requested resources (e.g. 16 cpus)
  2. After some configurable number of scheduling iterations have passed, if a task is still unassigned, back off or reduce the requested resources (e.g. 8 cpus).
  3. Continue until task is schedule or you reach minimum resource levels (e.g. 1.0 cpu)

Approach 2:

  1. Generate permutations on the TaskRequest ranges
  2. Let Fenzo generate SchedulingResults for the various permutations of TaskRequest resource ranges
  3. Run the SchedulingResults through some fitness evaluation that takes into account % of resources utilized, number of tasks scheduled, etc.
  4. Use the best SchedulingResult

We recognize that approach 2, although more optimal and faster to schedule, is not readily possible using TaskQueues and TaskSchedulingService. There would be a lot of changes to Fenzo needed to allow concurrent calls to scheduleOnce() on the same sets of resourceOffers and taskRequests to generate the permutations... (honestly not even sure if you'd use 1 taskQueue per permutation or try to handle it all in one).

Our proposed solution is to choose approach 2 ONLY IF the Fenzo team is interested in helping to support the feature, or if the Fenzo team would be willing to accept the additional complexity of a pull request with resource ranges and permutations.

If the Fenzo team is not interested in this "greedy" scheduling feature or doesn't believe it adds general-purpose value to the community, then our team is going to select approach 1.

Any insight you have on this matter is greatly appreciated! Whether it be a "yes" or "no" on contributing support...... a "yes" or "no" on accepting a pull request.... or even general guidance on something we've missed, whereby Fenzo is already capable of solving this problem elegantly.

Thank you!

zonybob avatar Mar 27 '17 19:03 zonybob

These are certainly possible with some changes in Fenzo.

I see why approach 2 may be more optimal for you. Although, one aspect you may be giving up from approach 1 is getting a "more optimal" assignment if you were to wait for sometime before lowering the resource requirements. For example, a task requiring 8 CPUs, but, willing to accept just 4 CPUs might get assigned 4 CPUs right away in approach 2. Where as, waiting for, say, a few seconds may have gotten the more optimal assignment of 8 CPUs. Without too many changes, there is, of course, no way to tell if waiting will in fact get you a better assignment.

Also, there could be some challenges from approach 2 in incorporating the multiple task assignments (from permutations) for the same task, where an assignment may have to abort previous assignment for its variant, etc. Your concern on its feasibility is very valid.

There may be another approach here.

Approach 3:

  1. Define a new task request type in Fenzo, something like VariableSizeQueuableTaskRequest. This task request has a list/array of scalar requests for entities like CPUs, memory, etc.
  2. Fenzo would then try to satisfy the request starting with the maximum of the requested quantity and iteratively down to the minimum, stopping with the max value it could allocate. The fitness value would be a function of where in the list it was able to assign
  3. The task assignment result would indicate the actual quantities assigned

This is in some ways a variation of your approach 2 but better supported in Fenzo itself instead of your framework having to create multiple "duplicate" tasks for the same task. It then removes the complexity of having to abort previous assignments, concurrent assignments of same offers, etc.

This would require changes in Fenzo, but, my preliminary thinking is that it can be done in a way such that other Fenzo users not using this feature would be unaffected.

This requires further thought. I haven't considered it in full detail yet. We may have to iterate on the details.

spodila avatar Mar 28 '17 01:03 spodila

I appreciate the quick feedback. We don't need to pick a solution for a few weeks so there's time to iterate. I mostly was interested up front in your level of interest and whether or not our approaches were way off! We look forward to more discussions.

zonybob avatar Mar 28 '17 03:03 zonybob

Just following up to see if you had any more insights on this scheduling approach. My current thought is that my "Approach 2" is too complex, but that I could implement "Approach 1" fairly easily... just to get the capability available in our scheduler. Then, maybe down the road a bit, if "Approach 3" were to become available in Fenzo, I would just migrate to that.

Any thoughts? Am I over-simplifying "Approach 1?" Do you think that "Approach 3" could happen one day in Fenzo? And one final, related question... I have some interest in poking around the Fenzo codebase. If I could get something working (with appropriate unit tests and documentation, of course) would you be willing to work with me on a pull request?

Thanks again!

zonybob avatar May 15 '17 15:05 zonybob

Your "Approach 1" sounds reasonable. This may work well for a lot of cases until the scale and complexity of the use case increases.

It is too early to say when/if Approach 3 will be available in Fenzo. This will need to be driven by folks with the most need for it.

And one final, related question... I have some interest in poking around the Fenzo codebase. If I could get something working (with appropriate unit tests and documentation, of course) would you be willing to work with me on a pull request?

Sure. Are you referring to "Approach 3" or a simpler one?

spodila avatar May 17 '17 19:05 spodila