varCA
only install the necessary variant caller dependencies
Our pipeline requires that users install every variant caller at runtime, even if they don't actually use some of them. For example, DELLY is not used by the pipeline by default, but it is still installed by Snakemake when it is executed for the first time.
Is there a way to improve this behavior so that only the required dependencies are installed at runtime? Currently, the answer is no.
Why? Well, there are only two steps in the Snakemake pipeline that execute the variant callers in the ensemble: the prepare_caller rule and the run_caller rule. Both steps must be general enough to work for any variant caller. The inputs and outputs of those rules dynamically adapt to every caller based on a single wildcard. If we wanted the dependencies of the rule to change too, we would need to change the rule's env based on the caller wildcard. But snakemake currently offers no way of doing this; you can't provide a lambda function to env like you can for input and params.
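As a rough illustration of the gap described above, here is a plain-Python sketch of what a wildcard-driven function looks like. The `callers/` directory and both function names are hypothetical, and `SimpleNamespace` only stands in for the wildcards object that Snakemake would pass in. The first function is the kind of thing input accepts; the second is what we would like to hand to env but cannot.

```python
from types import SimpleNamespace


def caller_script(wildcards):
    # Hypothetical input function: resolved per job from the caller wildcard.
    return f"callers/{wildcards.caller}"


def caller_env(wildcards):
    # Hypothetical: the per-caller env we would like to select the same way.
    return f"envs/{wildcards.caller}.yml"


# Stand-in for the wildcards object of a single job.
wc = SimpleNamespace(caller="delly")
print(caller_script(wc))
print(caller_env(wc))
```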
I really only see one solution to this issue, then: I submit a pull request (or feature request) for snakemake that adds the functionality we desire. I can't really think of anything else short of some sort of major refactor?
Ok, well apparently this is now possible in Snakemake v6, allowing us to do option 2 in #30! See the Rule Inheritance section of the Snakemake documentation, specifically this part:
    use rule a as b with:
        output:
            "test2.out"
Presumably, we could wrap this in a for-loop and use it to just change the conda directive. So we could do something like this:
    for caller in callers:
        use rule run_caller as "run_"+caller with:
            conda: f"envs/{caller}.yml"
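To make the intent of that loop concrete, here is a minimal plain-Python sketch of the mapping it implies: one derived rule name per caller in use, each pointing at its own env file. The helper name `derived_rules` and the example caller list are assumptions for illustration; whether Snakemake accepts a computed rule name in a use rule statement is untested here.

```python
def derived_rules(callers):
    # Map each derived rule name to the conda env file it would use.
    # Only callers passed in get an entry, so unused callers never
    # contribute an env that needs installing.
    return {f"run_{caller}": f"envs/{caller}.yml" for caller in callers}


# Hypothetical selection of callers actually used in a run.
for rule_name, env in derived_rules(["delly", "pindel"]).items():
    print(rule_name, env)
```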
Barring any unforeseen challenges, I should be able to resolve this issue in a few weeks. It might still require quite a bit of code restructuring and testing.