io: Do not use multiple threads to write compressed files

Open: huddlej opened this issue 4 years ago • 1 comment

Current behavior

The io.open_file function uses xopen as its backend to transparently support compressed inputs and outputs. By default, xopen uses multiple threads in a separate process to write some compressed file formats. When processing large files like the full GISAID SARS-CoV-2 database and writing the output to a gzip file, xopen's subprocesses (igzip) can easily use all available CPUs (e.g., on a laptop).

Expected behavior

Augur should always use a single CPU per command unless otherwise requested by the user through an argument like --nthreads.

Possible solution

Add a threads keyword argument to io.open_file with a default value of 1 and pass this argument to the xopen function call.
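A minimal sketch of what that could look like, not Augur's actual implementation. It assumes io.open_file is a thin context-manager wrapper around xopen and relies on xopen's existing threads keyword. The opener parameter is a hypothetical hook added here only so the wrapper can be exercised without xopen installed.

```python
from contextlib import contextmanager


@contextmanager
def open_file(path, mode="r", threads=1, opener=None, **kwargs):
    """Open a possibly compressed file with a bounded number of threads.

    threads defaults to 1 so that one command does not claim every CPU;
    callers can raise it, for example from a --nthreads argument.
    opener is a hypothetical hook (not part of Augur) that defaults to
    xopen.xopen.
    """
    if opener is None:
        # Imported lazily so the sketch can be tested with a stand-in opener.
        from xopen import xopen as opener

    # Forward the thread limit explicitly instead of relying on xopen's default.
    with opener(path, mode, threads=threads, **kwargs) as handle:
        yield handle
```

A caller that wants more parallelism would opt in explicitly, e.g. `open_file("metadata.tsv.gz", "wt", threads=args.nthreads)`, where args.nthreads is an assumed command-line option. The exact meaning of each threads value depends on the installed xopen version, so the default is worth checking against its documentation.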

huddlej, Jun 01 '21 23:06

Is it not beneficial to use all available CPUs for faster run times? Or are there complications with Snakemake's threads option?

victorlin, Dec 10 '24 01:12