Parallelize DE methods that support it

Open grst opened this issue 1 year ago • 0 comments

Description of feature

Some of the methods are embarrassingly parallel, e.g. statsmodels and the Wilcoxon test.

I suggest using the following snippet from scirpy, which is built on joblib:

  • joblib is more robust than multiprocessing
  • joblib natively supports the dask backend, so you get out-of-machine support for free.

https://github.com/scverse/scirpy/blob/443e59e6245b917e87972f87df350ae4f429d011/src/scirpy/util/__init__.py#L567-L579

import logging

from joblib import Parallel
from tqdm.auto import tqdm


def _parallelize_with_joblib(delayed_objects, *, total=None, **kwargs):
    """Wrapper around joblib.Parallel that shows a progressbar if the backend supports it.

    Progressbar solution from https://stackoverflow.com/a/76726101/2340703
    """
    try:
        return tqdm(Parallel(return_as="generator", **kwargs)(delayed_objects), total=total)
    except ValueError:
        logging.info(
            "Backend doesn't support return_as='generator'. No progress bar will be shown. "
            "Consider setting verbosity in joblib.parallel_config"
        )
        return Parallel(return_as="list", **kwargs)(delayed_objects)

https://github.com/scverse/scirpy/blob/443e59e6245b917e87972f87df350ae4f429d011/src/scirpy/ir_dist/metrics.py#L231-L233

block_results = _parallelize_with_joblib(
    (joblib.delayed(self._compute_block)(*block) for block in blocks),
    total=len(blocks),
    n_jobs=self.n_jobs,
)
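
For the DE use case, the same pattern might look roughly like the sketch below, which runs one Wilcoxon rank-sum test per gene in parallel. It is not pertpy's implementation: `_test_one_gene` and `wilcoxon_per_gene` are hypothetical names, the inputs are assumed to be dense (cells x genes) NumPy arrays, and plain `joblib.Parallel` is used in place of the progress-bar wrapper above to keep the example self-contained.

```python
import numpy as np
from joblib import Parallel, delayed
from scipy.stats import mannwhitneyu


def _test_one_gene(group_a, group_b):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) p-value for one gene."""
    return mannwhitneyu(group_a, group_b, alternative="two-sided").pvalue


def wilcoxon_per_gene(x_a, x_b, n_jobs=2):
    """Test every gene independently; x_a and x_b are (cells x genes) arrays."""
    n_genes = x_a.shape[1]
    # Each gene is an independent task, so the loop is embarrassingly parallel.
    pvalues = Parallel(n_jobs=n_jobs)(
        delayed(_test_one_gene)(x_a[:, j], x_b[:, j]) for j in range(n_genes)
    )
    return np.asarray(pvalues)


# Toy data: 30 cells per condition, 4 genes, only gene 0 is shifted.
rng = np.random.default_rng(0)
x_a = rng.normal(size=(30, 4))
x_b = rng.normal(size=(30, 4))
x_b[:, 0] += 5.0
pvals = wilcoxon_per_gene(x_a, x_b)
```

In practice it would probably pay off to dispatch blocks of genes rather than single genes, so that scheduling overhead does not dominate the cheap per-gene test.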

Migrated from https://github.com/scverse/multi-condition-comparisions/issues/16

grst avatar May 27 '24 07:05 grst