Parallelize DE methods that support it
Description of feature
Some of the DE methods are embarrassingly parallel, e.g. the statsmodels-based tests and the Wilcoxon test.

I suggest using the following snippet from scirpy:

- joblib is more robust than multiprocessing
- joblib natively supports the dask backend, so you get out-of-machine support for free (see the sketches after the snippets below)
https://github.com/scverse/scirpy/blob/443e59e6245b917e87972f87df350ae4f429d011/src/scirpy/util/__init__.py#L567-L579
```python
import logging

from joblib import Parallel
from tqdm.auto import tqdm


def _parallelize_with_joblib(delayed_objects, *, total=None, **kwargs):
    """Wrapper around joblib.Parallel that shows a progressbar if the backend supports it.

    Progressbar solution from https://stackoverflow.com/a/76726101/2340703
    """
    try:
        return tqdm(Parallel(return_as="generator", **kwargs)(delayed_objects), total=total)
    except ValueError:
        logging.info(
            "Backend doesn't support return_as='generator'. No progress bar will be shown. "
            "Consider setting verbosity in joblib.parallel_config"
        )
        return Parallel(return_as="list", **kwargs)(delayed_objects)
```
https://github.com/scverse/scirpy/blob/443e59e6245b917e87972f87df350ae4f429d011/src/scirpy/ir_dist/metrics.py#L231-L233
```python
block_results = _parallelize_with_joblib(
    (joblib.delayed(self._compute_block)(*block) for block in blocks), total=len(blocks), n_jobs=self.n_jobs
)
```
Migrated from https://github.com/scverse/multi-condition-comparisions/issues/16