
Output files are not being removed when snakemake execution is stopped via ctrl+c

Open cgr71ii opened this issue 3 years ago • 4 comments

Snakemake version

7.14.0

Describe the bug

When I stop a snakemake execution via ctrl+c, the output files are not removed, but they are removed when the execution stops because a rule fails.

Logs

Logs from the provided minimal example below.

When it triggers the described bug:

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job        count    min threads    max threads
-------  -------  -------------  -------------
sleep          6              1              1
targets        1              1              1
total          7              1              1

Select jobs to execute...

[Wed Sep 21 19:44:38 2022]
rule sleep:
    output: testSleepS1
    jobid: 1
    reason: Missing output files: testSleepS1
    wildcards: sample=S1
    resources: tmpdir=/tmp, qq=3

Output created!
^CTerminating processes on user request, this might take some time.
[Wed Sep 21 19:44:40 2022]
Error in rule sleep:
    jobid: 1
    output: testSleepS1
    shell:
        
          touch testSleepS1
          echo "Output created!"
          sleep 10s
          exit 1
          
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Complete log: .snakemake/log/2022-09-21T194438.069552.snakemake.log

What I'm missing is the following line:

Removing output files of failed job sleep since they might be corrupted:

If I run ls testSleep* | wc -l, the output is 1 (of course, it should be 0). Then I run rm testSleep* followed by the version where the rule itself fails, which correctly removes the output file since it might be corrupted:

Building DAG of jobs...                       
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:                
job        count    min threads    max threads
-------  -------  -------------  -------------
sleep          6              1              1
targets        1              1              1
total          7              1              1
                                                   
Select jobs to execute...
                                                   
[Wed Sep 21 19:44:23 2022]                                                                            
rule sleep:               
    output: testSleepS1
    jobid: 1
    reason: Missing output files: testSleepS1
    wildcards: sample=S1
    resources: tmpdir=/tmp, qq=3
                                                   
Output created!                 
[Wed Sep 21 19:44:33 2022]
Error in rule sleep:
    jobid: 1
    output: testSleepS1                                                                                                                                                                                      
    shell:
                                                                                                      
          touch testSleepS1                                                                                                                                                                                  
          echo "Output created!"                                                                                                                                                                             
          sleep 10s
          exit 1                                                                                      
          
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job sleep since they might be corrupted:
testSleepS1
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2022-09-21T194423.476724.snakemake.log

Here I see the line I missed before: "Removing output files of failed job sleep since they might be corrupted". Now the output of ls testSleep* | wc -l is 0.

Minimal example

rule targets:
    input:
        expand("testSleep{sample}", sample=["S1", "S2", "S3", "S4", "S5", "S6"])

rule sleep:
    output:
        "testSleep{sample}"
    resources:
        qq=3
    threads: 2
    shell:
        """
        touch {output}
        echo "Output created!"
        sleep 10s
        exit 1
        """

Execution: snakemake -c1 --snakefile /path/to/provided/snakefile --rerun-incomplete

If you let the example run until it fails because of exit 1, you can check that ls $PWD/testSleep* | wc -l is 0, but if you execute again and press ctrl+c when you see the message "Output created!", then ls $PWD/testSleep* | wc -l is 1, which means that the file is not removed when the execution is stopped via ctrl+c.

Additional context

Since output files are not removed, I have to remove them manually in order to rerun, which is tedious in my case since I have to check a lot of files across a lot of different rules.

cgr71ii avatar Sep 21 '22 17:09 cgr71ii

Since output files are not removed, I have to remove them manually in order to rerun

I think this is not entirely correct. If you hit CTRL-C in the middle of a job, when you re-run the pipeline you get:

Building DAG of jobs...
IncompleteFilesException:
The files below seem to be incomplete. If you are sure that certain files are not incomplete, mark them as complete with

    snakemake --cleanup-metadata <filenames>

To re-generate the files rerun your command with the --rerun-incomplete flag.
Incomplete files:
testSleepS2

so if you add --rerun-incomplete you don't need to remove those files manually. However, I agree that it would be nice if snakemake removed those files before exiting due to CTRL-C.

dariober avatar Sep 23 '22 10:09 dariober

Ok, now I think I understand what happened to me. This is a minimal example, but in my real runs I move .snakemake away whenever I detect that it exists, because I'm running multiple instances of Snakemake (in the past the .snakemake folder caused errors for me, so I decided to remove/move it after every execution). So when I re-run after hitting ctrl+c, the output files still exist and are detected as finished, because the .snakemake folder is created fresh again (the information about the not-finished status lives in the older .snakemake). I understand that in order to detect that these files were not finished correctly, I have to leave the .snakemake directory where it is. But this makes me wonder: is it "problematic" to run multiple instances of Snakemake under the same .snakemake folder (I've run into situations where concurrently executed instances failed)? And can I change the path/name of .snakemake via CLI flags? (I've checked --shadow-prefix, but it only affects the shadow directory, not the whole .snakemake directory.)

Now I understand this is not a bug, but a feature (at least, I think so now). Files are not removed just in case the content somehow finished, or the incomplete files are wanted for some reason; Snakemake raises an error and gives you the chance to bypass it with --rerun-incomplete, which might be useful if you somehow know your files were completed, or if you want to continue with the incomplete files.

Now I think it is right not to remove files when I stop the execution, but this leads me to wonder: why is the behavior different when I stop the execution with ctrl+c versus when it's stopped by an error? In both situations we might want to keep those files.

Am I right about these thoughts?

An example of what I'm talking about, which led me to errors when running multiple instances of Snakemake:

for n in $(seq 1 100); do
  snakemake --config lang1=en lang2=fr shard=$n --snakefile Snakefile &> $n.log &
done

cgr71ii avatar Sep 23 '22 12:09 cgr71ii

In general, I wouldn't tweak the .snakemake directory; I'd let snakemake handle it unless you are sure of what you are doing.

snakemake is right to prevent you from running multiple pipelines on the same output directory at the same time. Different instances will overwrite each other's output, causing a mess. I'd say that one of the reasons to use a workflow manager is to prevent such things from happening.

Off the top of my head, I would resolve your situation either by including the for n in $(seq 1 100) logic inside the Snakefile itself and running multiple jobs (snakemake -j 100 ...), or by assigning each instance a separate output directory, like:

for n in $(seq 1 100); do
  snakemake -d output_$n --config lang1=en ...
done
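Worth noting, as an assumption to verify on your snakemake version: the .snakemake metadata and lock directory is created inside the working directory, so -d should also give every instance its own private .snakemake. A minimal sketch, where output_$n and the commented snakemake invocation are placeholders:

```shell
# One working directory per instance; each run would then keep a private
# .snakemake metadata/lock directory inside it. The snakemake call itself
# is left as a comment, since it depends on your pipeline's config keys.
for n in 1 2 3; do
  mkdir -p "output_$n"
  # snakemake -d "output_$n" --config shard=$n ... &> "$n.log" &
done
ls -d output_1 output_2 output_3
```

This way, no instance ever reads another instance's incompleteness metadata or working-directory lock.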

There may be other/better solutions depending on your case, but I would say this is the correct behaviour of snakemake.

dariober avatar Sep 23 '22 12:09 dariober

I put a basic example, but of course the output files are different for each instance (output files are configured through different options provided via the --config flag, and the output files of the different instances I execute do not overlap). I don't totally agree with the idea of integrating this functionality into the pipeline. At least in my case, I'm using a pipeline where I provide different input files and get parallel text back, and I want to run experiments quantifying the total amount of parallel text I get as I vary some configuration options. So I think the pipeline is doing what it has to do, and since I want to quantify the total amount of text I get while modifying different options, I'd like to run multiple instances with different configuration options in order to parallelize the experiments, where each instance is in turn parallelized by Snakemake.

So, is there no way to change the name of the .snakemake directory? I don't know, something like:

for n in $(seq 1 100); do
  dot_snakemake_directory=".snakemake_$n" # default in snakemake is `.snakemake`

  snakemake --config lang1=en lang2=fr input_files='["/path/to/WARC1", "/path/to/WARC2"]' \
    shard=$n --snakefile Snakefile --dot-snakemake-directory-name "$dot_snakemake_directory" &> $n.log &
done

And regarding the different behaviors when the execution is stopped via ctrl+c versus when a rule fails: is there any known reason for the difference?

Thank you for the previous replies! :)

cgr71ii avatar Sep 23 '22 17:09 cgr71ii