
How to limit CPU count for parallel builds

Open sjalloq opened this issue 3 years ago • 8 comments

Hi there,

I noticed that when running fusesoc with Verilator, it uses all available CPUs for parallel builds. This seems to be due to the following lines in verilator.py in edalize:

        # Do parallel builds with <number of cpus> * 2 jobs.
        make_job_count = multiprocessing.cpu_count() * 2
        args = ['-j', str(make_job_count)]

Is there no way to limit the number of jobs? I'm running on a shared server and can't consume all the cores. I think this is a dangerous default to be using.

If I want to add an option to provide a user override, can you suggest the best way of doing this? I could then have a go at putting together a PR.

sjalloq avatar Jan 20 '21 09:01 sjalloq

Oh, I don't think we ever considered the case when someone is afraid of using too many cores. It was always the opposite. Yes, we need an override. I wonder if it's bad enough so that we should default to a single job, or keep it as is for backwards compatibility.

I would propose adding a new --jobs parameter to the Verilator backend. Here's an example of how to add a parameter: https://github.com/olofk/edalize/commit/00d5dd158f56559450838bb440a92d5273ffb47a
Once added, you should be able to run fusesoc run --target=yourtarget yourcore --jobs=<somenumber>, and the option should show up there if the target uses the Verilator backend.

There is some confusion here, though: I'm not sure if the -j parameter is used when creating the Verilator model or when the verilated sources are compiled to an executable. It would be good to figure that out. Maybe we want the same number of jobs for both.
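For illustration, here is a rough sketch of how such a jobs override might be consumed once it has been plumbed through as a tool option. The option name jobs and the helper below are assumptions, not the actual edalize API; the linked commit shows the real registration mechanism.

    import multiprocessing

    # Hypothetical sketch: 'jobs' is an assumed tool option name, not something
    # edalize defines today. It would need to be registered in the Verilator
    # backend's option table, following the pattern in the linked commit.
    def make_job_args(tool_options):
        jobs = tool_options.get("jobs")
        if jobs is None:
            # Fall back to the current default of two jobs per CPU.
            jobs = multiprocessing.cpu_count() * 2
        return ["-j", str(int(jobs))]

    # Example usage:
    print(make_job_args({"jobs": 4}))  # ['-j', '4']
    print(make_job_args({}))           # ['-j', '<2 * number of CPUs>']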

olofk avatar Feb 28 '21 21:02 olofk

@olofk, I've just started running Verilator builds with FuseSoC again and have hit this issue. I was trying to remember where I'd asked the question, and a quick search found this.

Where did we get to? I think the PR where you mentioned this ticket moved from using all threads to using all physical cores, which is still not much use for a job running in a cluster environment. Can we add a new tool_option to limit this, or perhaps use the environment variables set by various job schedulers? For example, I'm currently using SGE, and it sets the NSLOTS variable to the number of cores your job has been assigned. From memory, the same is true when using LSF.

Something like this:

    def build_main(self):
        logger.info("Building simulation model")
        if "mode" not in self.tool_options:
            self.tool_options["mode"] = "cc"

        # If we are running on a cluster via a job scheduler, don't use
        # more CPUs than the number the job has been allocated. SGE sets
        # NSLOTS; other schedulers export similar variables.
        def get_cpu_count():
            env_vars = ["NSLOTS"]
            for var in env_vars:
                if var in os.environ:
                    return int(os.environ[var])
            return multiprocessing.cpu_count()

        # Do parallel builds with <number of cpus> jobs.
        make_job_count = get_cpu_count()
        args = ["-j", str(make_job_count)]

        if self.tool_options["mode"] == "lint-only":
            args.append("V" + self.toplevel + ".mk")
        self._run_tool("make", args, quiet=True)

shareefj avatar Mar 17 '22 16:03 shareefj

@imphil You commented on #247, so any thoughts on this solution?

shareefj avatar Mar 21 '22 10:03 shareefj

Maybe the simplest option is to not have edalize call make with any -j argument and document that the user can use the MAKEFLAGS environment variable to get parallel builds, e.g. export MAKEFLAGS=-j8; fusesoc ....
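For reference, a minimal sketch of what that means in practice, assuming edalize stops passing -j entirely: GNU make picks up MAKEFLAGS from the inherited environment, so the user controls the parallelism from their shell. The job count below is just an example.

    import os
    import subprocess

    # Sketch only: with no explicit -j from edalize, make honours whatever
    # MAKEFLAGS the user exported (e.g. MAKEFLAGS=-j8) before running fusesoc.
    env = dict(os.environ)
    env.setdefault("MAKEFLAGS", "-j8")
    subprocess.run(["make"], env=env, check=True)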

imphil avatar Mar 21 '22 10:03 imphil

Yep, that would work and is simple. @olofk ?

shareefj avatar Mar 21 '22 10:03 shareefj

Sounds perfectly fine to me

olofk avatar Mar 28 '22 11:03 olofk

I was just looking at opening a PR for this change. Where should this be documented? It's a big enough change that it should be flagged clearly to users, right? I see a "my build takes 16X longer" ticket in our future...

shareefj avatar May 06 '22 08:05 shareefj

Good question. I do see the potential for grumpy users, but there's currently no good place to store this kind of information, so in this case I think it's better to get it over with and mention it in the release notes for the next version.

olofk avatar May 31 '22 20:05 olofk