snakemake-novice-bioinformatics
Reconsider putting output before input
From @jdblischak
You have the learners write the output field before the input field. Your motivation is that it is natural to work backwards when writing a Snakefile, e.g.:
Rather than listing steps in order of execution, you are always working backwards from the final desired result. The order of operations is determined by applying the pattern matching rules to the filenames, not by the order of the rules in the Snakefile.
This logic of working backwards from the desired output is why we’re putting the output lines first in all our rules - to remind us that these are what Snakemake looks at first!
I am not a fan of this approach for two main reasons:
1. Pretty much any other Snakefile they encounter or tutorial they read will list input before output. As a concrete example, see the official Snakemake tutorial. Having them write their Snakefiles differently from everyone else adds unnecessary cognitive load.
2. While it's true that Snakemake works backwards just like Make does, and it's important for learners to understand this mental model, I don't think it is necessary for a Snakemake user to design their pipeline backwards. I always develop my Snakemake pipelines one rule at a time, in the forward order. While I have a vague sense of my final result, there are too many unknowns along the way. Inevitably I'll run into something frustrating like mismatched chromosomes between my sequencing files and the reference files, and have to add a rule to fix this. In other words, I've never been able to follow your first step to "Define rules for all the processing steps". And even your lesson goes in the forward order, starting with trimming and counting before then adding rules for indexing and mapping.

So, like I said above, I don't think you need to change your lesson. But I would recommend adding some boxes, e.g.:
box: We recommend listing output before input to remind yourself how Snakemake processes the rules, but note that this is our personal preference. Most other Snakefiles you see will list input first.

box: You can also build your pipeline one step at a time in the forward direction. Just make sure to always keep in mind that Snakemake processes the rules backwards.
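If it helps for the second box, the "processes the rules backwards" idea can be shown with a toy sketch in plain Python. This is not how Snakemake is actually implemented. The rule names, filename patterns, and the `plan` and `match` helpers are all made up for illustration. The rules are deliberately listed last step first, to show that their order in the file does not decide the order of execution:

```python
import re

# Hypothetical rules as (name, output pattern, input pattern).
# Deliberately listed last step first; {sample} is the only wildcard.
RULES = [
    ("count", "{sample}.count", "{sample}.trimmed.fq"),
    ("trim", "{sample}.trimmed.fq", "{sample}.fq"),
]


def match(pattern, filename):
    """Return the {sample} value if filename fits pattern, else None."""
    regex = re.escape(pattern).replace(re.escape("{sample}"), "(?P<sample>.+)")
    m = re.fullmatch(regex, filename)
    return m.group("sample") if m else None


def plan(target, existing):
    """Work backwards from target and return the steps in execution order."""
    if target in existing:
        return []  # file is already present, so nothing needs to run
    for name, out_pattern, in_pattern in RULES:
        sample = match(out_pattern, target)
        if sample is not None:
            needed = in_pattern.replace("{sample}", sample)
            # Resolve the input first, then append the rule that uses it.
            return plan(needed, existing) + [f"{name}:{sample}"]
    raise ValueError(f"no rule produces {target}")


steps = plan("a.count", existing={"a.fq"})
print(steps)
```

Asking for `a.count` with only `a.fq` present yields the trimming step followed by the counting step, even though `count` comes first in `RULES`. A learner can therefore add rules in the forward direction while still keeping the backwards lookup in mind.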