recentrifuge

Add a threads option: why and how

Open jfouret opened this issue 5 months ago • 2 comments

Hi,

You could easily add an option to control the number of threads.

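Many people run jobs on HPC clusters through job schedulers and workflow systems (Slurm, Nextflow, AWS Batch, etc.), where one must reserve a precise number of threads (e.g. 8), but the job ultimately runs on a machine with a higher CPU count. Because `os.cpu_count()` reports every CPU on that machine rather than the number reserved, sizing the process pool from it alone can start more workers than the job is allowed to use.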
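To illustrate the mismatch: `os.cpu_count()` counts all CPUs in the machine, whereas on Linux `os.sched_getaffinity(0)` returns only the CPUs the current process may run on, which can reflect the CPUs a scheduler bound the job to (whether it does depends on how the cluster is configured). The helper `available_cpus` below is hypothetical and not part of recentrifuge; it is only a sketch of what a smarter default for `--threads` could look like.

```python
import os


def available_cpus() -> int:
    """Return how many CPUs this process may actually run on (a sketch).

    os.cpu_count() counts every CPU in the machine. On Linux,
    os.sched_getaffinity(0) returns only the CPUs the current process
    is allowed to use, which may reflect a scheduler's CPU binding.
    """
    if hasattr(os, "sched_getaffinity"):  # Linux and some other Unixes
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1  # fallback: cpu_count() may return None


print("CPUs in the machine:", os.cpu_count())
print("CPUs usable by this process:", available_cpus())
```

Python 3.13 also provides `os.process_cpu_count()` for the same purpose. Even so, an explicit `--threads` value remains the most reliable option, since not every scheduler restricts CPU affinity.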

It can also be tricky to choose how many CPUs to reserve based on the number of samples, especially when the tool is integrated into an automated workflow.

This option looks straightforward to add, for example:

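```python
import os  # needed for os.cpu_count()

# 'parser' is the script's existing argparse.ArgumentParser
parser.add_argument(
    '--threads',
    type=int,
    default=os.cpu_count(),
    help='Number of threads to use (default: number of CPU cores)'
)
```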

together with this change where the process pool is created:

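```python
# Pool size: min of machine CPUs, --threads and number of input files
with mpctx.Pool(processes=min(os.cpu_count(), args.threads,
                              len(input_files))) as pool:
    ...  # existing code that submits work to the pool
```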
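A self-contained sketch of the same pool-sizing logic. The names `pool_size` and `process_sample` are hypothetical, and `mp.get_context()` (the platform's default start method) stands in for whatever context recentrifuge actually stores in `mpctx`. The `max(1, ...)` guard is an extra assumption: `multiprocessing.Pool` raises `ValueError` when `processes` is 0, which would otherwise happen with an empty input list.

```python
import multiprocessing as mp
import os


def pool_size(threads: int, n_inputs: int) -> int:
    """Number of worker processes: never more than the CPUs on the
    machine, the requested --threads, or the number of input files."""
    return max(1, min(os.cpu_count() or 1, threads, n_inputs))


def process_sample(path: str) -> str:
    # Hypothetical stand-in for the per-file work done in each worker
    return path.upper()


if __name__ == "__main__":
    input_files = ["s1.out", "s2.out", "s3.out"]
    threads = 8  # e.g. the value of args.threads
    mpctx = mp.get_context()  # default start method for this platform
    with mpctx.Pool(processes=pool_size(threads, len(input_files))) as pool:
        print(pool.map(process_sample, input_files))
        # ['S1.OUT', 'S2.OUT', 'S3.OUT'] (map keeps the input order)
```

With `--threads 8` and three input files, only three workers are started, so the reservation is never exceeded and no idle processes are spawned.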

You could also combine the existing `sequential` argument with `threads`, switching to sequential mode when `--threads` is 1.
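One way the two settings could be combined, as a sketch: run in-process when sequential mode is requested or when the effective worker count is 1, and use a pool otherwise. The function `run_all` and the demo task `count_chars` are hypothetical; they only illustrate the control flow, not recentrifuge's actual code.

```python
import multiprocessing as mp
import os
from typing import Callable, List


def run_all(func: Callable[[str], str], files: List[str],
            threads: int, sequential: bool = False) -> List[str]:
    """Apply func to every file, in-process when sequential mode is
    requested or only one worker is allowed; otherwise use a pool."""
    n = min(os.cpu_count() or 1, threads, len(files))
    if sequential or n <= 1:
        # No pool: avoids process start-up overhead and keeps tracebacks simple
        return [func(f) for f in files]
    with mp.get_context().Pool(processes=n) as pool:
        return pool.map(func, files)


def count_chars(path: str) -> str:
    # Hypothetical per-file task used only for the demo
    return f"{path}:{len(path)}"


if __name__ == "__main__":
    files = ["a.txt", "bb.txt"]
    print(run_all(count_chars, files, threads=1))  # sequential path
    print(run_all(count_chars, files, threads=4))  # pool path if >1 CPU
```

Both calls print `['a.txt:5', 'bb.txt:6']`; only the execution strategy differs, which is what makes `--threads 1` a natural synonym for sequential mode.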

Best,

jfouret · Sep 25 '24 08:09