Revise default param registration

Open mtmorgan opened this issue 4 years ago • 5 comments

MulticoreParam() is appealing for interactive use but problematic in package use, as discussed, e.g., in https://github.com/drisso/zinbwave/issues/38#issuecomment-655587766. Update the default registration strategy to use SnowParam().

mtmorgan avatar Jul 09 '20 13:07 mtmorgan

Just wanted to leave my two cents here, as we have already had some issues with this too (see https://github.com/drisso/zinbwave/issues/38#issuecomment-656256405). As far as I know, BiocParallel ignores the number of cores assigned by SLURM, Snakemake, or other job schedulers (respecting it would be a nice feature request). Hence, if you do not set the number of workers explicitly in your own code, you overcommit CPU and memory by default, which usually ends with your job being terminated (at least on SLURM, where memory limits are enforced).

So instead of relying on N-2 as the default, I would suggest min(10, N-2) (or any other reasonable value in place of 10). It would still use nearly all cores on a desktop or laptop, but would not immediately drain a server or cluster with 100 cores or more. This should also be sufficient for most end users.
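
As a rough illustration of the cap suggested above, here is a base-R sketch. The function name `default_workers` and its arguments are hypothetical and not part of BiocParallel; the floor of one worker and the fallback for an unknown core count are added assumptions.

```r
# Hypothetical helper, not part of BiocParallel: cap the default worker
# count at `cap` instead of always using N - 2.
default_workers <- function(n_cores = parallel::detectCores(), cap = 10L) {
    # detectCores() can return NA on some platforms; fall back to one core.
    if (is.na(n_cores)) n_cores <- 1L
    # Leave two cores free, never exceed the cap, never go below one worker.
    as.integer(max(1L, min(cap, n_cores - 2L)))
}

default_workers(n_cores = 8L)    # laptop-sized machine: 6 workers
default_workers(n_cores = 128L)  # large server: 10 workers
```

The result could then be passed as the `workers` argument of `MulticoreParam()` or `SnowParam()`.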

c-mertes avatar Sep 01 '20 02:09 c-mertes

@c-mertes I think BatchtoolsParam() is appropriate for use on SLURM; the vignette 'Introduction to BatchtoolsParam' discusses the registryargs parameter and the use of templates for controlling the number of cores.

mtmorgan avatar Sep 01 '20 04:09 mtmorgan

Sorry for not being clear on this, @mtmorgan. Batchtools is great if you want to parallelize within R across a cluster, but I was referring to the scenario where Snakemake or another workflow manager spawns jobs across a cluster using, e.g., SLURM, and each job then calls an R script that parallelizes within the job using MulticoreParam(). Here one could read the scheduler's environment variables to restrict the number of cores. Maybe this scenario is just too specific.
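
To make the environment-variable idea concrete, a minimal base-R sketch follows. The helper name `scheduler_cores` and its fallback behavior are hypothetical and not part of BiocParallel; `SLURM_CPUS_PER_TASK` is the variable SLURM sets when a job requests `--cpus-per-task`.

```r
# Hypothetical helper, not part of BiocParallel: read the core count a
# scheduler granted to the job, using `fallback` when the variable is
# unset, empty, non-numeric, or smaller than one.
scheduler_cores <- function(var = "SLURM_CPUS_PER_TASK", fallback = 1L) {
    value <- suppressWarnings(as.integer(Sys.getenv(var, unset = NA)))
    if (is.na(value) || value < 1L) fallback else value
}

workers <- scheduler_cores()
```

The resulting `workers` value could be passed explicitly to `MulticoreParam(workers = workers)` inside the R script that the job runs.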

c-mertes avatar Sep 01 '20 05:09 c-mertes

@c-mertes I think you are talking about an unrelated issue. This issue is about switching from parallelization using forking to non-forking parallelization, not changing the default number of cores.

DarwinAwardWinner avatar Sep 01 '20 11:09 DarwinAwardWinner

With regard to parallelization in RStudio, BiocParallel could borrow the logic used in future: https://github.com/HenrikBengtsson/future/blob/develop/R/supportsMulticore.R#L71-L91
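
A minimal sketch of that kind of check, not a copy of the linked code: it assumes that RStudio sets the environment variable `RSTUDIO` to `"1"` and additionally sets `RSTUDIO_TERM` in its terminal pane. Both function names are hypothetical.

```r
# Hypothetical helpers, not part of BiocParallel or future.
# Assumption: RStudio sets RSTUDIO=1, and RSTUDIO_TERM is set only when
# R runs inside the RStudio terminal rather than the RStudio console.
in_rstudio_console <- function() {
    Sys.getenv("RSTUDIO") == "1" && !nzchar(Sys.getenv("RSTUDIO_TERM"))
}

# Forking is unavailable on Windows and is treated here as unsafe in the
# RStudio console; everywhere else it is considered acceptable.
forking_advisable <- function() {
    .Platform$OS.type != "windows" && !in_rstudio_console()
}
```

A default-selection routine could use such a check to choose between a forking and a non-forking back-end.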

DarwinAwardWinner avatar Sep 01 '20 11:09 DarwinAwardWinner