
[Draft] Left-over `temp()` files

Open DrYak opened this issue 3 years ago • 4 comments

Snakemake version

The bug appeared in the 7.x series. The temp-file bugfix introduced in version 7.3 (and further fixed in 7.3.5) has not resolved all of our issues. The bug is still present as of 7.8.2.

Describe the bug

Note: this is a draft submission. Our team needs to analyze the issue further in order to build a simpler proof-of-concept example. The current report is based on a real-world run on large datasets, which we first need to trim down to a usable minimal example.

Not all files tagged as temp() are cleaned up. What we have observed while rummaging through logs and scratch directories:

In the simple situation where one temp file is passed between a producer rule and a consumer rule (i.e. when exactly 2 rules are involved), everything seems to work fine, and the file is deleted upon completion of the consumer rule:

  • rule A → temp() → rule B

But we observe left-over temp files when:

  • multiple rules (>2) sharing the same file seem to throw it off:
    • rule A, rule B → temp() → rule C (i.e.: two alternative rules could be generating a given file, selected through configuration)
    • rule A → temp() → rule B, rule C (i.e.: multiple down-stream consumers of the temp file)
  • a single rule (<2) is affected as well:
    • rule A → temp() (we use this because snakemake lacks proper handling of node-local temporary space, see #1474)
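The clean-up behaviour we expect in the patterns above can be written down as a small toy model. This is only a sketch of the expected semantics (a temp file goes away once every rule consuming it has finished, or right after its producer if nothing consumes it), not Snakemake's actual implementation. The function name `expected_leftovers` and the file and rule names are hypothetical.

```python
def expected_leftovers(temp_outputs, consumers, finished):
    """Return the temp files we would still expect on disk.

    temp_outputs: dict mapping temp file -> set of rules that may produce it
    consumers:    dict mapping temp file -> set of rules that read it
    finished:     set of rules that have completed
    """
    leftovers = set()
    for path, producers in temp_outputs.items():
        if not (producers & finished):
            continue  # no producer ran, so the file was never created
        # Consumers that have not finished yet still need the file.
        pending = consumers.get(path, set()) - finished
        if pending:
            leftovers.add(path)
    return leftovers


# The three problematic patterns (hypothetical names):
temp_outputs = {
    "x.tmp": {"A", "B"},  # rule A, rule B -> temp() -> rule C
    "y.tmp": {"A"},       # rule A -> temp() -> rule B, rule C
    "z.tmp": {"A"},       # rule A -> temp() with no consumer
}
consumers = {"x.tmp": {"C"}, "y.tmp": {"B", "C"}}

# Once all rules are done, nothing should be left over.
print(expected_leftovers(temp_outputs, consumers, {"A", "B", "C"}))
```

Under this model every temp file is gone once the workflow completes, which is not what we observe for these patterns.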

Less certain whether the following hypothesis describes the actual mechanism:

  • When a rule is affected, its other outputs could be affected too, despite being straightforward.
    • rule A, rule B → temp() → rule C
      • n rules >2, affected
    • rule C → temp() → rule D
      • n rules = 2, but the file comes from the affected rule C, i.e.: the whole "left-over temp file" issue seems to be triggered at the level of an entire rule, as soon as any of its inputs or outputs isn't straightforward.

Also, I have no idea whether the following is related to the current issue or completely independent:

  • Currently snakemake reports a lower total number of jobs in the initial plan (e.g.: total 664) than in the progress counter (e.g.: 664 of 1163 steps (57%) done).
  • This particular run was executed using the --until command-line option. Note, though, that none of the temporary files are used beyond that point, so they would all have been safe to delete, but only the straightforward ones (rule A → temp() → rule B) were.
  • Question: Is snakemake internally creating some "virtual jobs" to handle clean-up of more complex temp() files?
  • Also, our pipeline runs on an HPC cluster (using LSF as the job submission system) and uses groups to bundle multiple rules into a single job.
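To spot the job-count discrepancy in other runs, the planned total and the progress counter can be pulled out of a Snakemake log with two regular expressions. This is a rough sketch that assumes the log contains a `total` row in the job stats table and `N of M steps (P%) done` lines in the format quoted in this report. The function name `job_count_mismatch` is hypothetical.

```python
import re

# "total   664   8   16" row of the job stats table
TOTAL_RE = re.compile(r"^total\s+(\d+)\s", re.MULTILINE)
# "664 of 1163 steps (57%) done" progress lines
PROGRESS_RE = re.compile(r"(\d+) of (\d+) steps \((\d+(?:\.\d+)?)%\) done")


def job_count_mismatch(log_text):
    """Return (planned_total, done, reported_total), or None if either is missing."""
    planned = TOTAL_RE.search(log_text)
    progress = PROGRESS_RE.findall(log_text)
    if planned is None or not progress:
        return None
    done, reported_total, _percent = progress[-1]  # last progress line wins
    return int(planned.group(1)), int(done), int(reported_total)


sample = (
    "total                    664              8             16\n"
    "4 of 1163 steps (0.3%) done\n"
    "664 of 1163 steps (57%) done\n"
)
planned, done, reported = job_count_mismatch(sample)
print(f"planned={planned} done={done} reported_total={reported}")
```

A run is suspicious whenever the planned total differs from the reported total.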

Logs

Note: as mentioned, this comes from a large real-world run that we still need to trim down.

Job stats:
job                    count    min threads    max threads
-------------------  -------  -------------  -------------
dehuman                  166             16             16
dh_filter                166             16             16
dh_hostalign             166             16             16
dh_redo_alignreject      166              8              8
total                    664              8             16

Select jobs to execute...

…

4 of 1163 steps (0.3%) done

…

[Sun Jun 19 13:19:15 2022]
Finished job 2399.
[Sun Jun 19 13:19:15 2022]
Finished job 2398.
[Sun Jun 19 13:19:15 2022]
Finished job 2397.
[Sun Jun 19 13:19:15 2022]
Finished job 2396.
664 of 1163 steps (57%) done
Removing temporary output /cluster/scratch/bs-pangolin/pangolin/temp/samples/B8_10_2020_12_16_NA_NA/20201223_HWKGTDRXX/raw_uploads/filtered_1.fastq.gz.
Removing temporary output /cluster/scratch/bs-pangolin/pangolin/temp/samples/B8_10_2020_12_16_NA_NA/20201223_HWKGTDRXX/raw_uploads/filtered_2.fastq.gz.
Complete log: .snakemake/log/2022-06-19T125746.832133.snakemake.log

…/raw_uploads/filtered_2.fastq.gz is a straightforward case (rule A → temp() → rule B).

After running:

/cluster/scratch/bs-pangolin/pangolin/temp/samples/12_2020_11_5_A/20201215_JGDDY/alignments/reject_R2.fastq.gz
/cluster/scratch/bs-pangolin/pangolin/temp/samples/12_2020_11_5_A/20201215_JGDDY/alignments/host_aln.sam
/cluster/scratch/bs-pangolin/pangolin/temp/samples/12_2020_11_5_A/20201215_JGDDY/alignments/reject_R1.fastq.gz
/cluster/scratch/bs-pangolin/pangolin/temp/samples/12_2020_11_5_A/20201215_JGDDY/alignments/dehuman.filter
/cluster/scratch/bs-pangolin/pangolin/temp/samples/12_2020_11_5_A/20201215_JGDDY/alignments/dh_aln.sam
/cluster/scratch/bs-pangolin/pangolin/temp/samples/12_2020_11_5_A/20201215_JGDDY/raw_uploads/dehuman.sam
…

These remaining files correspond to the other configurations mentioned above.
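A listing like the one above can be produced by cross-checking the log against the scratch directory. The sketch below assumes that removals are logged as `Removing temporary output <path>.` (as in the excerpt above) and that the list of paths expected to be temporary is supplied by hand. The helper names are hypothetical.

```python
import re
from pathlib import Path

# "Removing temporary output <path>." lines; the trailing dot is not part of the path
REMOVED_RE = re.compile(r"^Removing temporary output (.+)\.$", re.MULTILINE)


def removed_in_log(log_text):
    """Paths that the log claims were removed."""
    return set(REMOVED_RE.findall(log_text))


def leftover_temp_files(expected_temp_paths, log_text):
    """Expected-temp paths that still exist on disk and have no removal line."""
    removed = removed_in_log(log_text)
    return sorted(
        p for p in expected_temp_paths
        if p not in removed and Path(p).exists()
    )
```

Calling `leftover_temp_files(paths, Path(".snakemake/log/<run>.snakemake.log").read_text())` with the paths declared as temp() in the workflow would then list the files that were left behind; `<run>` stands for the log file name of the run being checked.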

Minimal example

Will come soon...

Additional context

These issues have been affecting us while running V-pipe, at several points of the pipeline, such as the rules in charge of depleting human-mapping alignment rejects from the raw reads and recompressing the raw reads as CRAM files before uploading them to SRA.

DrYak avatar Jun 21 '22 10:06 DrYak

@uweschmitt could you help with the minimal example?

DrYak avatar Jun 21 '22 10:06 DrYak

@uweschmitt could you help with the minimal example?

Will have a look tomorrow.

uweschmitt avatar Jun 21 '22 15:06 uweschmitt

rule all:
    input:
        "files/done.txt",

rule create_data:
    output:
        used_output = temporary("files/used_output"),      # consumed by consume_data
        unused_output = temporary("files/unused_output"),  # no consumer; this file is left over
    group: "a"
    shell:
        """
        echo rule a > {output.used_output}
        echo rule a > {output.unused_output}
        """

rule consume_data:
    input:
        froma = "files/used_output"
    group: "a"
    output:
        done=("files/done.txt")
    shell:
        """
        touch {output.done}
        """

Snakemake does not clean up the files/unused_output file when run in cluster mode. The group assignments in the script are required to trigger the bug:

$ snakemake --cluster "bsub" -j 1
...
$ ls files
done.txt  unused_output

uweschmitt avatar Jun 27 '22 12:06 uweschmitt

We will delete this issue soon, as it is superseded by https://github.com/snakemake/snakemake/issues/1754

uweschmitt avatar Jun 30 '22 15:06 uweschmitt