recentrifuge
Add a threads option: why and how
Hi,
You could easily add an option to control the number of threads.
Many people run jobs on HPC clusters through job schedulers or workflow systems (Slurm, Nextflow, AWS Batch, etc.), where one needs to reserve a precise number of threads (e.g. 8), but the jobs ultimately run on machines with a higher CPU count.
This can make it tricky to decide how many CPUs to reserve depending on the number of samples, especially when your tool is integrated into an automated workflow.
It looks like this would only take a few lines, for example:
parser.add_argument(
    '--threads',
    type=int,
    default=os.cpu_count(),
    help='Number of threads to use (default: number of CPU cores)'
)
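A slightly more defensive version of that option could reject zero or negative values and cope with `os.cpu_count()` returning `None` (which the standard library allows). This is only a sketch; `positive_int` and `build_parser` are hypothetical helper names, not existing recentrifuge code.

```python
import argparse
import os


def positive_int(value: str) -> int:
    """argparse type that accepts only integers >= 1 (hypothetical helper)."""
    number = int(value)  # a non-integer raises ValueError, reported by argparse
    if number < 1:
        raise argparse.ArgumentTypeError(f"must be >= 1, got {number}")
    return number


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser containing only the proposed --threads option."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--threads',
        type=positive_int,
        # os.cpu_count() may return None, so fall back to a single thread
        default=os.cpu_count() or 1,
        help='Number of threads to use (default: number of CPU cores)'
    )
    return parser


if __name__ == '__main__':
    args = build_parser().parse_args(['--threads', '8'])
    print(args.threads)  # 8
```

With this, `--threads 0` fails with a clear argparse error instead of reaching the pool.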
and then, when creating the pool:
with mpctx.Pool(processes=min(os.cpu_count(), args.threads,
                              len(input_files))) as pool:
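The pool size could also be computed in a small helper. One caveat on clusters: `os.cpu_count()` reports all CPUs of the machine, whereas `os.sched_getaffinity(0)` (available only on some platforms, e.g. Linux) reports the CPUs the process is actually allowed to run on, which a scheduler may restrict. The helper names below are hypothetical, and the `max(1, ...)` guard is an assumption added so an empty input list never produces `Pool(processes=0)`, which raises `ValueError`.

```python
import os


def available_cpus() -> int:
    """CPUs usable by this process (hypothetical helper)."""
    # sched_getaffinity only exists on some platforms (e.g. Linux); it honours
    # CPU affinity restrictions that a job scheduler may have applied
    if hasattr(os, 'sched_getaffinity'):
        return len(os.sched_getaffinity(0))
    # os.cpu_count() may return None
    return os.cpu_count() or 1


def pool_size(threads: int, n_inputs: int) -> int:
    """Never more than the usable CPUs, the requested threads,
    or the number of input files, and always at least 1."""
    return max(1, min(available_cpus(), threads, n_inputs))
```

For example, `pool_size(8, 3)` returns at most 3, so a job with three samples never starts more than three processes.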
You could also combine `args.sequential` with the threads option, switching to sequential mode when threads equals 1.
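A minimal sketch of that switch, assuming a hypothetical `process_all` dispatcher and a `worker` callable standing in for whatever function processes one input file; `mpctx` is assumed here to be a context obtained from `multiprocessing.get_context()`:

```python
import multiprocessing
import os
from typing import Callable, List, Sequence, TypeVar

T = TypeVar('T')
R = TypeVar('R')


def process_all(input_files: Sequence[T], worker: Callable[[T], R],
                threads: int, sequential: bool = False) -> List[R]:
    """Run worker over input_files, in parallel unless told otherwise."""
    processes = max(1, min(os.cpu_count() or 1, threads, len(input_files)))
    if sequential or processes == 1:
        # Sequential path: no pool and no pickling; results keep input order
        return [worker(path) for path in input_files]
    # Assumed stand-in for the tool's own mpctx context object
    mpctx = multiprocessing.get_context()
    with mpctx.Pool(processes=processes) as pool:
        # worker must be picklable, e.g. a module-level function;
        # pool.map also returns results in input order
        return pool.map(worker, input_files)
```

This way `--threads 1` and `--sequential` behave identically, so a workflow only has to pass the number of reserved CPUs.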
Best,