
Should the default processes value be the maximum number of CPUs on a machine?

Open taylo5jm opened this issue 6 years ago • 12 comments

In the current version of cobrapy, the default processes attribute of the Configuration object is set to the output of the multiprocessing.cpu_count function. This behavior results in setting the processes attribute to the maximum number of CPUs that are available on a machine.
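For reference, the machine-wide count that the Configuration object picks up comes from the standard library. A minimal sketch (stdlib only, without importing cobrapy itself):

```python
import multiprocessing

# The value cobrapy reportedly uses as the default for `processes`:
# the total number of CPUs visible on the machine, including hyperthreads.
n_cpus = multiprocessing.cpu_count()
print(n_cpus)
```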

I think this behavior can be convenient for some, but could be confusing to others. For example, I have seen a couple of questions recently that seem to have stemmed from differences in parallel processing between Windows and Unix systems.

It is possible to load and analyze a model without any knowledge of the configuration object, so the parallel behavior can be opaque to the end-user, which can make debugging more difficult for those that are new to Python and/or cobrapy.

From my perspective, using the maximum number of CPUs by default is a bit unconventional. I am thinking of mclapply and foreach in R. mclapply is a function that allows one to apply a function over a vector and the default number of cores is 2. foreach requires one to register a parallel backend with an explicit number of cores. If a user decides to use the %dopar% construct without registering a parallel backend, then only one core is used.

Thoughts? I am happy to make PR if the group is interested in changing the behavior.

taylo5jm avatar May 07 '19 15:05 taylo5jm

I don't have a clear opinion on the matter. In the issue you're referring to (#846), the problem is that you cannot interactively run multiprocessing on Windows. This would not have occurred if the global configuration default were 1.

Then again, almost all modern computers have more than 1 core. So we might be wasting potential if we choose a default of 1 and users do not use the processes argument of functions. Many applications these days use multiple cores; Google Chrome, for example, is quite greedy.

So I see two options:

  1. Be conservative and set a global default of one. Solves all issues but also limits potential.
  2. Set a value greater than one. We can then discuss what that value should be (maybe something like max(2, cpu_count() // 2) is a bit nicer). We can additionally catch the specific error that occurred in #846 and provide a more helpful message that alerts the user to the processes argument and to multiprocessing in interactive environments.

Midnighter avatar May 07 '19 15:05 Midnighter

I understand the perspective that it could be a waste of potential to set the default processes value to 1. I know some people that have moved away from Chrome due to its greedy behavior. On the other hand, I am sure that I would be frustrated if Chrome was constrained to 1 CPU.

Thinking about this some more, I'm realizing that the choice of default will be somewhat arbitrary. Nonetheless, half of the maximum number of cores seems like a more reasonable number than the maximum. In the absence of knowledge about the parallel behavior, the user would (hopefully) not be bogging down the machine nor drastically underutilizing CPU.

I think your suggestion of handling the specific Windows error is elegant and solves the immediate problem that led me to open this issue. I am happy to take a stab at implementing the error handling.

taylo5jm avatar May 09 '19 20:05 taylo5jm

Yes, please feel free to go ahead with a PR :slightly_smiling_face:

Midnighter avatar May 10 '19 07:05 Midnighter

I would even go for cores - 1, since you usually want as much oomph for the computation as possible without freezing your system. Memory requirements are also something to consider. Multiprocessing pickles the entire environment and reloads it into each new process. For larger models this can lead to significant memory usage.
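The per-worker memory cost mentioned above can be estimated by pickling the object that would be shipped to each process. A sketch with a stand-in object (a real metabolic model would be pickled the same way):

```python
import pickle

# Hypothetical stand-in for a large model: many reactions' worth of data.
big_object = {i: [0.0] * 100 for i in range(10_000)}

# Roughly the bytes copied into *each* worker process when the pool starts.
payload = pickle.dumps(big_object)
print(f"pickled size: {len(payload) / 1e6:.1f} MB")
```

Multiply that size by the number of worker processes to get a rough lower bound on the extra memory a parallel run needs.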

cdiener avatar Jun 05 '19 15:06 cdiener

What's your experience here @cdiener? With parallel FVA, for example, I see near-linear gains in performance up to 4 processes (the number of physical cores on my machine); after that I see sub-linear gains. Due to Intel's hyperthreading, the number reported by cpu_count() on my system is 8. So for me, the above //2 is actually the sweet spot without locking down my entire system.

Midnighter avatar Jun 06 '19 12:06 Midnighter

For server CPUs I see a speedup even for hyperthreads. For desktop systems I still do, but it's definitely less pronounced after you reach the real number of cores. That is a good point, though I would not like to bake hyperthreading logic into cobrapy, since it only applies to a particular set of CPUs.

cdiener avatar Jun 06 '19 16:06 cdiener

Apparently, psutil.cpu_count(logical=False) is the physical number of cores. I'm not sure it's worth adding psutil as a dependency just for this.

A separate point is that we should also account for the number of CPUs allotted to the program (len(os.sched_getaffinity(0))). Especially on clusters, this is likely different from the machine-wide count.
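Combining the two points above, a sketch of an affinity-aware count with a fallback, since os.sched_getaffinity is not available on all platforms (notably macOS and Windows):

```python
import multiprocessing
import os

def available_cpus() -> int:
    """Prefer the CPUs actually allotted to this process (relevant on clusters
    and under cgroups); fall back to the machine-wide count where the
    affinity API is unsupported."""
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:
        return multiprocessing.cpu_count()

print(available_cpus())
```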

Midnighter avatar Jun 06 '19 16:06 Midnighter

Hi all, We have to ask what our users are:

  • [1] modelers and biologists with little experience in parallelization (they just want to run the model and algorithms as fast as possible without freezing their system ;)
  • [2] high-performance users who want to get every last bit of performance, most likely on clusters and larger workstation setups.

For [1] we should have a setup that hits a sweet spot without freezing the system. I prefer number_of_cpus - 1 here. For [2] it doesn't matter, because they will set the number of CPUs according to the available system resources themselves. We could emit a warning from within the parallelized algorithms that checks whether the process count is still the default of number_of_cpus - 1, e.g.

Using the default number of processes `number_of_cpus - 1` for running `{parallelizable_algorithm_name}`.
You can change this in `configuration`.
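That warning could be implemented with the standard warnings module; a sketch with hypothetical names (warn_default_processes and its parameters are not part of cobrapy):

```python
import warnings

def warn_default_processes(n_processes: int, default: int, algorithm_name: str) -> None:
    """Hypothetical helper: warn only when the user has left the default untouched."""
    if n_processes == default:
        warnings.warn(
            f"Using the default number of processes ({default}) for running "
            f"{algorithm_name}. You can change this in the global configuration."
        )
```

The check-against-default approach means power users who set processes explicitly never see the message.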

matthiaskoenig avatar Jun 07 '19 07:06 matthiaskoenig

Just to chime in on this, there's a non-negligible overhead in invoking the multiprocessing pool; at least there appears to be on my recent MacBook Pro installation of cobrapy.

Configuration().processes gets set to 8 by default. For a fairly small model, I'm getting the following results:

%timeit flux_variability_analysis(model, reaction_list=model.exchanges)
18.4 s ± 263 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit flux_variability_analysis(model, reaction_list=model.exchanges, processes=1)
5.71 ms ± 242 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

pstjohn avatar Feb 18 '20 17:02 pstjohn

There definitely is a sizeable overhead involved in creating multiple processes. That is to be expected, especially if the number of exchange reactions is rather small. 18 seconds seems extremely long, though. Do you have any suggestions? Should we create a warning? I typically don't like to try to be too smart, i.e., I wouldn't like to automatically decide the pool size based on the number of reactions passed.

Midnighter avatar Feb 18 '20 17:02 Midnighter

P.S.: What is the time required if you set processes=4 which I guess is the number of your physical cores rather than hyperthreads.

Midnighter avatar Feb 18 '20 17:02 Midnighter

In my experience, the multiprocessing overhead is usually due to the initial pickling of your entire environment and sending it to the new processes. So it is important to run those benchmarks in an empty Python environment. For reference, I do see this on my Mac as well, but with a much smaller discrepancy, and it becomes negligible for larger models.

In [1]: from cobra.test import create_test_model

In [2]: from cobra.flux_analysis import flux_variability_analysis

In [3]: model = create_test_model("textbook")

In [4]: %timeit flux_variability_analysis(model, reaction_list=model.exchanges, processes=6)
254 ms ± 2.57 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [5]: %timeit flux_variability_analysis(model, reaction_list=model.exchanges, processes=1)
23.6 ms ± 393 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [6]: model = create_test_model("ecoli")

In [7]: %timeit flux_variability_analysis(model, reaction_list=model.exchanges, processes=6)
311 ms ± 45.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [8]: %timeit flux_variability_analysis(model, reaction_list=model.exchanges, processes=1)
767 ms ± 14.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

cdiener avatar Feb 18 '20 21:02 cdiener