snakemake icon indicating copy to clipboard operation
snakemake copied to clipboard

Incorrect shell command executed when defined within a loop.

Open ethan-shanahan opened this issue 1 year ago • 9 comments

Snakemake version

8.10.7 and 8.1.0

Describe the bug

When rules are defined within a loop, some rules execute the wrong command if the shell keyword is given an f-string containing variables assigned by the loop. The shell appears to interpret the f-string correctly, via inspection using the -p flag, however, a different command is actually executed.

Logs

n/a

Minimal example

rule all:
    input: "outputfile_a", "outputfile_b"

for l in ["a", "b"]:
    rule:
        name:   f"letter_{l}"
        input:  f"existingfile_{l}"
        output: f"outputfile_{l}"
        shell:  f"echo {l} > outputfile_{l}"

The first rule executed is letter_b which completes successfully, creating the file outputfile_b. Then rule letter_a is executed, but outputfile_a is not created and a MissingOutputException is raised.

Using the -p flag from the command line indicates that the correct command is run by both rules. However, if we replace the shell assignment with the following, we can see that in fact a different command is run.

        shell:  f"echo checking_{l} && echo {l} > outputfile_{l}"

It appears that the incorrect command is the same command that was run by rule letter_b. Below is a snippet of the output generated. The last two lines are printed due to the -p flag and the echo checking_{l} command respectively.

localrule letter_a:
    input: inputfile_a
    output: outputfile_a
    jobid: 1
    reason: Missing output files: outputfile_a
    resources: tmpdir=C:\Users\Ethan\AppData\Local\Temp

echo checking_a && echo a > outputfile_a
checking_b

Additional context

I am on Windows and using Python 3.12.3. I installed Snakemake from bioconda. This is my first "Issue", apologies if I've made a mistake!

ethan-shanahan avatar Apr 21 '24 20:04 ethan-shanahan

Never tried to make a for loop to define rules, but it is always nice to see new approaches. 😄 However, I don't think you need it since you can use {l} as a wildcards:

rule all:
    input: "outputfile_a", "outputfile_b"

rule:
    name:   "letter_{l}"
    input:  "existingfile_{l}"
    output: "outputfile_{l}"
    shell:  "echo {l} > outputfile_{l}"

fgvieira avatar Apr 23 '24 08:04 fgvieira

Thanks, that may help some people but not me. I only provided a minimal example of the bug of course. My actual application makes wildcards impractical if even impossible. Using a for loop to define rules is demonstrated in the documentation here.

ethan-shanahan avatar Apr 23 '24 11:04 ethan-shanahan

This is an interesting one. I have never dug into Snakemake's parser before (so take this with a grain of salt) but it seems the issue is there.

The parser takes the Snakefile and makes it into python.

This snakefile:

rule all:
    input: "bcftools/outputfile", "freebayes/outputfile"


for tool in ["bcftools", "freebayes"]:
    rule:
        name: f"call_variants_{tool}"
        output: f"{tool}/outputfile"
        shell: f"{tool} > {{output}}"

Gives this python:

@workflow.rule(name='all', lineno=1, snakefile='/Users/cade/dev/sandbox/snakemake_issues/issue_2824/Snakefile')

@workflow.input( "bcftools/outputfile" , "freebayes/outputfile"

)


@workflow.norun()
@workflow.run
def __rule_all(input, output, params, wildcards, threads, resources, log, rule, conda_env, container_img, singularity_args, use_singularity, env_modules, bench_record, jobid, is_shell, bench_iteration, cleanup_scripts, shadow_dir, edit_notebook, conda_base_path, basedir, sourcecache_path, runtime_sourcecache_path, __is_snakemake_rule_func=True):
        pass
for tool in [ "bcftools" , "freebayes" ] :
                                @workflow.rule(name=None, lineno=13, snakefile='/Users/cade/dev/sandbox/snakemake_issues/issue_2824/Snakefile')

                                @workflow.name( f"call_variants_{tool}"

                                )

                                @workflow.output( f"{tool}/outputfile"

                                )
                                @workflow.shellcmd ( f"{tool} > {{output}}" 
                                )
                                @workflow.run
                                def __rule_2(input, output, params, wildcards, threads, resources, log, rule, conda_env, container_img, singularity_args, use_singularity, env_modules, bench_record, jobid, is_shell, bench_iteration, cleanup_scripts, shadow_dir, edit_notebook, conda_base_path, basedir, sourcecache_path, runtime_sourcecache_path, __is_snakemake_rule_func=True):
                                        shell ( f"{tool} > {{output}}" , bench_record=bench_record, bench_iteration=bench_iteration
                                )

The functions at the bottom of each rule (decorated with @workflow.run) are called when it actually comes time to run the rule. Since the function name is static (__rule_2), then it'll be redefined each iteration of the for loop, leaving only the last definition.

The name keyword (added in #811) seems to be mostly superficial in that it's only used to change the rule name after the Snakefile is parsed. This was pretty confusing since the dryruns for the above workflow show the rules call_variants_bcftools/freebayes. However, both rules use the same run function.

Furthermore, the full shell command is not formatted until https://github.com/snakemake/snakemake/blob/6ec1e938210e0b2b43387bddd41dc16c70cdfb27/snakemake/shell.py#L135 which is why the proper output files were being created, but the shell command was wrong.

All of this to say, I'm not really sure the example in the docs works, and possibly never did? I'm still not super familiar with all of the inner workings of Snakemake (though I just learned alot lol) and it is getting late, so I very well could be wrong!

I'll think about how to go about fixing this, but it may take me awhile to get to this. Hopefully someone else can chime in!

Lastly, out of curiosity @ethan-shanahan what is your use case for this? Might be able to come up with a workaround using wildcards to get you something working until this is fixed.

cademirch avatar Apr 24 '24 06:04 cademirch

That's really fascinating @cademirch. It sounds like you're definitely on the right track! Is there a way to output the python code that the Snakemake parser generates for any Snakefile?

As for my use-case, its rather convoluted. Simply put, I'm building on top of Snakemake to generalise/automate the creation of Snakefiles. Thankfully, I have now found a workaround for myself. I first run a metaSnakefile.py script that contains all the logic and looping to resolve things like formatted strings, etc. This script creates the Snakefile with all the required rules written out explicitly. I'm still keen on this bug being fixed though! :)

ethan-shanahan avatar Apr 24 '24 07:04 ethan-shanahan

Of the top of my head, no. I just put print(join_compilation) after this line: https://github.com/snakemake/snakemake/blob/de12463604bc5f24f81907e33057ff42c31c7fc2/snakemake/parser.py#L1307

Glad you were able to figure something out!

cademirch avatar Apr 24 '24 20:04 cademirch

You can also just run snakemake with the option --print-compilation.

fgvieira avatar May 30 '24 12:05 fgvieira

I'm running into what I think is the same issue. Despite what is written in Snakemake's official documentation, this snakefile does not work:


from src.core.extract import get_extractors

extractors = get_extractors("src/config/extract")


rule all:
    input:
        [extractor["output"] for extractor in extractors],


for extractor in extractors:

    rule:
        name:
            f"extract_{extractor['module']}"
        input:
            extractor["input"],
            extractor["file"],
        output:
            extractor["output"],
        run:
            extractor["func"](extractor["input"], output)

No matter what file I try to build with this snakefile, it always runs the last extractor in the list.

@cademirch did you get anywhere with a fix for this? If not, I'm interested in picking this up where you left off.

cmhac avatar Jul 31 '24 14:07 cmhac

@cmhac I did not make any progress here unfortunately. Perhaps @mbhall88 can chime in as they wrote the docs for this in #2534.

cademirch avatar Sep 30 '24 17:09 cademirch

I have also run into this problem with f-strings in creating rules in a for loop. From memory, I think what happens is that the last (or first? Can't remember) element in the for loop gets used for all of the rules generated by the loop. This is something I would be happy to investigate a fix for by would definitely need @johanneskoester to point me to the correct part of the code where such an issue might arise.

mbhall88 avatar Sep 30 '24 22:09 mbhall88