snakemake-wrappers
snakemake-wrappers copied to clipboard
Problem with the VEP ANNOTATE snakemake wrapper
Snakemake version
Snakemake
5.23.0
Wrapper
Issue
I'm trying to adapt my regular VEP code to use the snakemake wrapper instead but am running into an issue.
I want to make sure that a) the wrapper works for me and b) it produces the same results as the following:
vep \
-i {input.sample} \
--species "caenorhabditis_elegans" \
--format "vcf" \
--everything \
--offline \
--force_overwrite \
--fasta {input.ref} \
--gff {input.annot} \
--tab \
--variant_class \
--regulatory \
--show_ref_allele \
--numbers \
--symbol \
--protein \
-o {params.sample}
In order to use VEP
with wrappers there are 3 different rules.
I have got the following two working:
# VEP Download Plugins
rule download_vep_plugins:
output:
directory("resources/vep/plugins")
params:
release=100
wrapper:
"0.64.0/bio/vep/plugins"
# VEP Cache
rule get_vep_cache:
output:
directory("resources/vep/cache")
params:
species="caenorhabditis_elegans",
build="WBcel235",
release="100"
log:
"logs/vep/cache.log"
wrapper:
"0.64.0/bio/vep/cache"
The third rule is to actually run VEP for which I've written the following rule:
rule variant_annotation:
input:
calls= lambda wildcards: getVCFs(wildcards.sample),
cache="resources/vep/cache",
plugins="resources/vep/plugins",
output:
calls=os.path.join(dirs_dict["ANNOT_DIR"],config["ANNOT_TOOL"],"{sample}.annotated.vcf"),
stats=os.path.join(dirs_dict["ANNOT_DIR"],config["ANNOT_TOOL"],"{sample}.html")
params:
plugins=["LoFtool"],
extra="--everything"
message: """--- Annotating Variants."""
resources:
mem = 30000,
time = 120
threads: 4
wrapper:
"0.64.0/bio/vep/annotate"
When submitting the job this is the error I receive:
Building DAG of jobs...
Using shell: /cvmfs/soft.computecanada.ca/nix/var/nix/profiles/16.09/bin/bash
Provided cores: 4
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 variant_annotation
1
[Tue Aug 25 11:23:04 2020]
Job 0: --- Annotating Variants.
Activating conda environment: /scratch/moldach/MADDOG/VCF-FILES/biostars439754/.snakemake/conda/f16fdb5f
Failed to open VARIANT_CALLING/varscan/470_sorted_dedupped_snp_varscan.vcf: unknown file type
Possible precedence issue with control flow operator at /scratch/moldach/MADDOG/VCF-FILES/biostars439754/.snakemake/conda/f16fdb5f/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.
Traceback (most recent call last):
File "/scratch/moldach/MADDOG/VCF-FILES/biostars439754/.snakemake/scripts/tmpm4v6gdij.wrapper.py", line 44, in <module>
"(bcftools view {snakemake.input.calls} | "
File "/home/moldach/bin/snakemake/lib/python3.8/site-packages/snakemake/shell.py", line 156, in __new__
raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -euo pipefail; (bcftools view VARIANT_CALLING/varscan/470_sorted_dedupped_snp_varscan.vcf | vep --everything --fork 4 --format vcf --vcf --cache --cache_version 100 --species caenorhabditis_elegans --assembly WBcel235 --dir_cache resources/vep/cache --dir_plugins resources/vep/plugins --offline --plugin LoFtool --output_file STDOUT --stats_file ANNOTATION/VEP/470.html | bcftools view -Ov > ANNOTATION/VEP/470.annotated.vcf)' returned non-zero exit status 1.
[Tue Aug 25 11:25:02 2020]
Error in rule variant_annotation:
jobid: 0
output: ANNOTATION/VEP/470.annotated.vcf, ANNOTATION/VEP/470.html
conda-env: /scratch/moldach/MADDOG/VCF-FILES/biostars439754/.snakemake/conda/f16fdb5f
RuleException:
CalledProcessError in line 393 of /scratch/moldach/MADDOG/VCF-FILES/biostars439754/Snakefile:
Command 'source /home/moldach/miniconda3/bin/activate '/scratch/moldach/MADDOG/VCF-FILES/biostars439754/.snakemake/conda/f16fdb5f'; set -euo pipefail; python /scratch/moldach/MADDOG/VCF-FILES/biostars439754/.snakemake/scripts/tmpm4v6gdij.wrapper.py' returned non-zero exit status 1.
File "/scratch/moldach/MADDOG/VCF-FILES/biostars439754/Snakefile", line 393, in __rule_variant_annotation
File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.8.0/lib/python3.8/concurrent/futures/thread.py", line 57, in run
Removing output files of failed job variant_annotation since they might be corrupted:
ANNOTATION/VEP/470.annotated.vcf, ANNOTATION/VEP/470.html
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
This issue was originally brought up on StackOverflow, then on the ensembl-vep
repo but seems it should be posted here for a definitive answer.
bcftools view
is giving the warning from the output of samtools mpileup
/varscan pileup2snp
:
def getDeduppedBamsIndex(sample):
return(list(os.path.join(aligns_dict[sample],"{0}.sorted.dedupped.bam.bai".format(sample,pair)) for pair in ['']))
rule mpilup:
input:
bam=lambda wildcards: getDeduppedBams(wildcards.sample),
reference_genome=os.path.join(dirs_dict["REF_DIR"],config["REF_GENOME"])
output:
os.path.join(dirs_dict["CALLING_DIR"],config["CALLING_TOOL"],"{sample}_{contig}.mpileup.gz"),
log:
os.path.join(dirs_dict["LOG_DIR"],config["CALLING_TOOL"],"{sample}_{contig}_samtools_mpileup.log")
params:
extra=lambda wc: "-r {}".format(wc.contig)
resources:
mem = 1000,
time = 30
wrapper:
"0.65.0/bio/samtools/mpileup"
rule mpileup_to_vcf:
input:
os.path.join(dirs_dict["CALLING_DIR"],config["CALLING_TOOL"],"{sample}_{contig}.mpileup.gz"),
output:
os.path.join(dirs_dict["CALLING_DIR"],config["CALLING_TOOL"],"{sample}_{contig}.vcf")
message:
"Calling SNP with Varscan2"
threads:
2 # Keep threading value to one for unzipped mpileup input
# Set it to two for zipped mipileup files
log:
os.path.join(dirs_dict["LOG_DIR"],config["CALLING_TOOL"],"varscan_{sample}_{contig}.log")
resources:
mem = 1000,
time = 30
wrapper:
"0.65.0/bio/varscan/mpileup2snp"
rule vcf_merge:
input:
os.path.join(dirs_dict["CALLING_DIR"],config["CALLING_TOOL"],"{sample}_I.vcf"),
os.path.join(dirs_dict["CALLING_DIR"],config["CALLING_TOOL"],"{sample}_II.vcf"),
os.path.join(dirs_dict["CALLING_DIR"],config["CALLING_TOOL"],"{sample}_III.vcf"),
os.path.join(dirs_dict["CALLING_DIR"],config["CALLING_TOOL"],"{sample}_IV.vcf"),
os.path.join(dirs_dict["CALLING_DIR"],config["CALLING_TOOL"],"{sample}_V.vcf"),
os.path.join(dirs_dict["CALLING_DIR"],config["CALLING_TOOL"],"{sample}_X.vcf"),
os.path.join(dirs_dict["CALLING_DIR"],config["CALLING_TOOL"],"{sample}_MtDNA.vcf")
output:
os.path.join(dirs_dict["CALLING_DIR"],config["CALLING_TOOL"],"{sample}.vcf")
log: os.path.join(dirs_dict["LOG_DIR"],config["CALLING_TOOL"],"{sample}_vcf-merge.log")
resources:
mem = 1000,
time = 10
threads: 1
message: """--- Merge VarScan by Chromosome."""
shell: """
awk 'FNR==1 && NR!=1 {{ while (/^<header>/) getline; }} 1 {{print}} ' {input} > {output}
"""
calling_dir = os.path.join(dirs_dict["CALLING_DIR"],config["CALLING_TOOL"])
callings_locations = [calling_dir] * len_samples
callings_dict = dict(zip(sample_names, callings_locations))
def getVCFs(sample):
return(list(os.path.join(callings_dict[sample],"{0}.vcf".format(sample,pair)) for pair in ['']))
rule annotate_variants:
input:
calls=lambda wildcards: getVCFs(wildcards.sample),
cache="resources/vep/cache",
plugins="resources/vep/plugins",
output:
calls="{sample}.annotated.vcf",
stats="{sample}.html"
params:
# Pass a list of plugins to use, see https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html
# Plugin args can be added as well, e.g. via an entry "MyPlugin,1,FOO", see docs.
plugins=["LoFtool"],
extra="--everything" # optional: extra arguments
log:
"logs/vep/{sample}.log"
threads: 4
resources:
time=30,
mem=5000
wrapper:
"0.65.0/bio/vep/annotate"
If I run bcftools view
on the output I get the error (same for any of the contigs as well so its not the rule merge_vcf
) causing the issue:
$ bcftools view variant_calling/varscan/MTG324.vcf
Failed to read from variant_calling/varscan/MTG324.vcf: unknown file type