
Limiting action execution by RAM usage

Open IamXander opened this issue 4 years ago • 6 comments

I am having an issue where we run some truly massive actions (e.g. 15+ GB of RAM) and some extremely small actions (~10 MB of RAM). When we get unlucky, all the massive build actions land on a single server, where buildfarm or the massive actions start getting OOM-killed. We currently work around this by limiting the number of jobs so that even a full load of massive actions fits on one machine. The drawback is that when we want to run lots of small actions, we let a lot of resources go to waste. I am looking for a way to utilize all of our build resources.

Here are the ideas/thoughts I have already had.

Potential solution: create a separate queue for the large actions.
Problem: this still lets a lot of resources go to waste, because we must always save space for the potential of running 15GB actions.

What I would like: allow large actions to participate in the same queue, BUT only run them if we can reserve some symbolic resource in advance, e.g. only run a large action if we can get 3 CPU cores, whereas the small actions only require 1 CPU core.

As far as I understand, the CPU core logic only decides which queue an action enters; it does not allow a large action to prevent other actions from running.

Very open to ideas and suggestions, thanks

Also, in case anyone suggests that we get rid of the massive actions: I'm with you, but there are a lot of non-trivial complexities in doing that.

IamXander avatar Feb 10 '21 16:02 IamXander

Idea 1:

The best way we can support this right now is through your first idea:

  1. Tag actions to distinguish them as low/high RAM usage.
  2. Configure the operation queue to split actions by RAM usage.
  3. Configure different types of workers for taking on low/high RAM actions (possibly giving the higher RAM workers a lower execute_stage_width)

As you've said, this wastes build resources and may not be ideal.

Idea 2:

Regarding a better solution, the following comes to mind: buildfarm/worker/DequeueMatchEvaluator.java. This module decides "yes" / "no" for whether the worker should keep the operation or put it back on the queue for another worker to evaluate. That would be a good place for a worker to evaluate its available resources (its RAM) and decide dynamically whether it should keep the operation or put it back. It would also allow actions with different RAM requirements to continue sharing the same queue.
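For illustration, a RAM-aware keep-or-requeue check of that shape might look roughly like the sketch below. This is not buildfarm's actual API; the pool class and all of its names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical worker-side RAM pool that a DequeueMatchEvaluator-style
// check could consult at dequeue time. Names are illustrative only.
final class WorkerRamPool {
  private final AtomicLong availableBytes;

  WorkerRamPool(long totalBytes) {
    this.availableBytes = new AtomicLong(totalBytes);
  }

  // Non-blocking: true means the reservation fits and the worker keeps
  // the operation; false means "put it back on the queue".
  boolean tryReserve(long bytes) {
    while (true) {
      long available = availableBytes.get();
      if (bytes > available) {
        return false; // requeue for another worker
      }
      if (availableBytes.compareAndSet(available, available - bytes)) {
        return true; // keep the operation
      }
    }
  }

  // Called when the action completes.
  void release(long bytes) {
    availableBytes.addAndGet(bytes);
  }
}
```

A non-blocking check like this keeps the shared queue intact, at the cost of some requeue churn when a worker is saturated.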

Idea 3:

I think an even better solution would be to make the workers themselves a little smarter about what they execute in parallel. Instead of just using the execute_stage_width, they would compare the expected RAM usage of the action to the current saturation and decide whether or not to wait before executing.

That being said, hacking something into DequeueMatchEvaluator might be the easiest way to get what you need. I'll need to explore how the worker populates its execute stage; that may not be too bad either. Buildfarm has an understanding of min-cores / max-cores. It's probably time we do something similar for min-ram / max-ram.
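To sketch what that could look like end to end: the worker might parse a min-ram platform property (by analogy with the existing min-cores / max-cores) and block until that much RAM is free before entering the execute stage. The property name, its units, and the semaphore-based pool below are all assumptions, not existing buildfarm behavior.

```java
import build.bazel.remote.execution.v2.Platform;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: block before executing until the action's
// declared RAM fits. "min-ram" is an assumed property name, by analogy
// with min-cores; buildfarm does not define it today.
final class ExecuteStageAdmission {
  // One permit per MiB of worker RAM; fair, so waiters are served FIFO.
  private final Semaphore ramMib;

  ExecuteStageAdmission(int workerRamMib) {
    this.ramMib = new Semaphore(workerRamMib, /* fair= */ true);
  }

  // Reads the hypothetical "min-ram" property (assumed to be in MiB).
  static int requestedRamMib(Platform platform, int defaultMib) {
    for (Platform.Property property : platform.getPropertiesList()) {
      if (property.getName().equals("min-ram")) {
        return Integer.parseInt(property.getValue());
      }
    }
    return defaultMib;
  }

  void acquireForExecution(int mib) throws InterruptedException {
    ramMib.acquire(mib); // waits until enough RAM is free
  }

  void releaseAfterExecution(int mib) {
    ramMib.release(mib);
  }
}
```

Because the semaphore is fair, a large reservation waiting at the head of the line is not skipped by a stream of small ones, which also speaks to the FCFS concern raised below.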

luxe avatar Feb 10 '21 17:02 luxe

similar request: https://github.com/bazelbuild/bazel-buildfarm/issues/512

luxe avatar Feb 10 '21 17:02 luxe

First off, thank you so much for the quick response!

Another thought just occurred to me, which is to wrap each action in a wrapper script that tries to reserve resources from a global resource pool. We could simply block heavy actions until there is enough RAM to run them. Actions could starve, so the wrapper would need to use a global queue where actions are greedy and hold onto the resources they have already grabbed.

e.g. action2 doesn't even start trying to grab resources until action1 is running.

This feels super hacky, but I think it would get me the result I am looking for.
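For what it's worth, the token semantics described here map fairly directly onto a fair (FIFO) counting semaphore. The sketch below is an in-process illustration only; since each wrapper is its own process, a real version would need a cross-process mechanism (a lock file, a tiny token server, etc.), and every name in it is made up.

```java
import java.util.concurrent.Semaphore;

// In-process illustration of the greedy FCFS token pool. With a fair
// semaphore, a heavy action waiting on many tokens is not skipped by
// later light actions; they queue behind it in FIFO order.
final class TokenPoolDemo {
  static final Semaphore POOL = new Semaphore(/* tokens= */ 100, /* fair= */ true);

  static void runAction(String name, int tokens, Runnable action)
      throws InterruptedException {
    POOL.acquire(tokens); // all-or-nothing; blocks in FIFO order
    try {
      System.out.println(name + " acquired " + tokens + " token(s)");
      action.run();
    } finally {
      POOL.release(tokens);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    runAction("small-compile", 1, () -> System.out.println("light action done"));
    runAction("huge-link", 15, () -> System.out.println("heavy action done"));
  }
}
```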

Regarding idea 2: the large actions would starve, assuming a competing slew of small actions. I would like the actions to run FCFS.

Idea 3: this is a good idea. In my use case it is really hard to know how much RAM a specific action will use, but I can generalize out to a build type (e.g. I know opt actions tend to use a lot of RAM).

IamXander avatar Feb 10 '21 17:02 IamXander

Regarding the action resource pool, there are a couple of considerations: it could not be done in a wrapper, because the wrapper's invocation only occurs once an executor slot has been consumed. The existing slots mechanism is a pool of sorts, with the intention that there is one CPU core dedicated to the execution, and multi-"cpu-min" actions deplete from that pool asymmetrically.

A more sophisticated system would provide resource consumption for the input fetch stage as well (outstanding input reservations), preventing starvation in CFC expiry, and would decouple the Executor slot from core limitations. I can foresee many scenarios where CPU-binding of the executor slots would not make any sense, like in the use of GPUs (probably fewer than the CPU core count), and where the executor might not want to execute a command at all, say in the case of highly specific graphics rendering workloads.

werkt avatar Feb 10 '21 20:02 werkt

We can do it in a wrapper; it is just not elegant.

For example, say buildfarm has an execution width of 100, and each wrapper script tries to get two tokens before it runs. This effectively makes the execution width 50 even though buildfarm "thinks" it is running 100 actions at once; 50 of those 100 actions are blocked by lack of tokens. I think this is pretty gross, but we get better resource utilization.
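To make the arithmetic concrete: with 100 slots and every wrapper taking 2 tokens, at most 50 actions make progress at once. If instead light actions took 1 token and heavy actions took 2, then with k heavy and m light actions running you would need 2k + m <= 100, so the effective width floats between 50 (all heavy) and 100 (all light).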

So to directly respond, the pool is just bigger than it was before

IamXander avatar Feb 10 '21 20:02 IamXander

Thanks for clarifying. I wanted to understand your current work-around a little better. Why not directly set the execution stage width to 50?

Is it because low-RAM actions take 1 token and high-RAM actions take 2 tokens?

Is that what you're doing to effectively keep a high execution width but not oversaturate?

luxe avatar Feb 13 '21 17:02 luxe