genome-grist

use shadow to handle tempdir for I/O bound downstream processes?

Open bluegenes opened this issue 3 years ago • 0 comments

Prompted by thinking about fastp, which is I/O bound: it would be great to write temp files under base_tempdir for any process like this.

We can use temp() as in the download rules, but since there's no need for post-processing after writing the initial file, we could alternatively use the shadow: directive to let snakemake handle writing and moving the file for us.

We would need to pass in base_tempdir as the --shadow-prefix arg when calling snakemake. To do this, we could move the base_tempdir code chunk (below) into __main__.py, and then pass base_tempdir in as --shadow-prefix and also as an item in the config.

chunk to move:

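import sys
import tempfile

# `config` is the config dict Snakemake provides inside the Snakefile.
base_tempdir = None
try_temp_locations = config.get('tempdir', [])
for temp_loc in try_temp_locations:
    try:
        base_tempdir = tempfile.mkdtemp(dir=temp_loc)
        break  # stop at the first location that works
    except FileNotFoundError:
        pass  # this location does not exist; try the next one

if not base_tempdir:
    print(f"Could not create a temporary directory in any of {try_temp_locations}", file=sys.stderr)
    print("Please set 'tempdir' in the config.", file=sys.stderr)
    sys.exit(-1)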

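A rough sketch of how this might look once moved into __main__.py, assuming __main__.py already has the list of candidate temp locations on hand (how it would read 'tempdir' from the config files is not shown). The helper names make_base_tempdir and build_snakemake_args are hypothetical, not existing genome-grist functions; only -s, --shadow-prefix and --config are real snakemake options.

```python
import tempfile


def make_base_tempdir(try_temp_locations):
    """Create a temp dir under the first location that exists, else return None."""
    for temp_loc in try_temp_locations:
        try:
            return tempfile.mkdtemp(dir=temp_loc)
        except FileNotFoundError:
            continue  # location does not exist; try the next one
    return None


def build_snakemake_args(snakefile, base_tempdir, extra_args=()):
    """Build a snakemake command line that passes base_tempdir both ways:
    as the shadow prefix and as a config item the rules can read."""
    return [
        "snakemake",
        "-s", snakefile,
        "--shadow-prefix", base_tempdir,
        "--config", f"base_tempdir={base_tempdir}",
        *extra_args,
    ]
```

The resulting list could be handed to subprocess.run as-is; rules would then read config['base_tempdir'] instead of computing it in the Snakefile.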
thoughts?

Note:

Shadow directories are stored one per rule execution in .snakemake/shadow/, and are cleared on successful execution. Consider running with the --cleanup-shadow argument every now and then to remove any remaining shadow directories from aborted jobs. The base shadow directory can be changed with the --shadow-prefix command line argument.

bluegenes, May 21 '21 14:05