vep download cache: Add cache url
Is your feature request related to a problem? Please describe. The VEP documentation does not make it very obvious how to use genomes other than Homo sapiens, and it was hard to figure out why my attempts to run VEP on a plant species failed. Eventually, I figured out that one needs to pass a specific (not easy to find) FTP URL to the script, from which the VEP data is downloaded, so that the data can be found.
Hence, I suggest adding this capability to the vep download cache wrapper, and perhaps documenting a bit better how one can select different genomes. The same goes for the fasta URL, if the user decides to download that data as well; that would then trigger issue 365, but my suggested code below solves that too.
Describe the solution you'd like
Something like:
```python
from pathlib import Path
from snakemake.shell import shell

# Get params. By default, we run only cache (--AUTO c), unlike the original wrapper,
# which also requested fasta (--AUTO cf), which would then mess up the check
# in the vep annotation wrapper that the subdirectory of the cache contains a single directory.
# See https://github.com/snakemake/snakemake-wrappers/issues/365
automode = snakemake.params.get("automode", "c")
extra = snakemake.params.get("extra", "")

# Extra optional cache and fasta URLs.
cacheurl = snakemake.params.get("cacheurl", "")
if cacheurl:
    cacheurl = '--CACHEURL "{}"'.format(cacheurl)
fastaurl = snakemake.params.get("fastaurl", "")
if fastaurl:
    fastaurl = '--FASTAURL "{}"'.format(fastaurl)

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Compared to the original wrapper, we add the two URLs, and also use a newer version
# of vep_install, which uses --CACHE_VERSION instead of --VERSION.
# This requires changing the environment to use vep 104.
shell(
    "vep_install --AUTO {automode} "
    "--SPECIES {snakemake.params.species} "
    "--ASSEMBLY {snakemake.params.build} "
    "--CACHE_VERSION {snakemake.params.release} "
    "--CACHEDIR {snakemake.output} "
    "--CONVERT "
    "--NO_UPDATE "
    "{cacheurl} {fastaurl} "
    "{extra} {log}"
)
```
I am currently using this replacement for the wrapper myself, and it gets the job done. Note that it also solves issue 365, and that I updated vep to version 104, which would need to be changed in the environment.yaml as well. Currently, the cache and annotate wrappers use different versions of vep (101 and 102, respectively), which is probably not ideal.
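As an aside, the optional-URL handling in the snippet boils down to a small pattern that could be factored into a helper. This is just a sketch of that pattern; the function name `optional_flag` is mine and not part of any wrapper:

```python
def optional_flag(flag: str, value: str) -> str:
    """Return a quoted CLI argument if a value was provided, else an empty string."""
    return f'{flag} "{value}"' if value else ""

# Mirrors the cacheurl/fastaurl handling above: empty params simply drop the flag.
print(optional_flag("--CACHEURL", "ftp://ftp.ensemblgenomes.org/pub/plants"))
print(optional_flag("--FASTAURL", ""))  # prints an empty line
```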
For anyone in the future trying to find the FTP URLs for these cache datasets: try http://uswest.ensembl.org/info/docs/tools/vep/script/vep_download.html#installer and http://uswest.ensembl.org/info/docs/tools/vep/script/vep_cache.html#cache :-) The links are hidden under "Manually downloading caches" :-)
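To illustrate how the proposed params could be wired up from a workflow, here is a hypothetical Snakemake rule. The species, build, release, cache URL, and wrapper path are all illustrative placeholders, not verified values:

```python
# Hypothetical rule invoking the modified vep cache wrapper for a plant species.
# All values below are illustrative; look up the correct URL for your species
# under "Manually downloading caches" in the VEP documentation.
rule download_vep_cache:
    output:
        directory("resources/vep/cache")
    params:
        species="arabidopsis_thaliana",
        build="TAIR10",
        release="104",
        cacheurl="ftp://ftp.ensemblgenomes.org/pub/plants/release-51/variation/indexed_vep_cache",
    log:
        "logs/vep/cache.log"
    wrapper:
        "bio/vep/cache"
```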
Not quite sure I understand the issue here. I've used all three VEP snakemake wrappers and never encountered the issues you mention. Can you send a minimal example with the latest wrapper versions?
This issue was marked as stale because it has been open for 6 months with no activity.
This issue was closed because it has been inactive for 1 month since being marked as stale. Feel free to re-open it if you have any further comments.
@fgvieira, just catching up with things, and saw that this issue was already closed... Anyway, to finally answer your question: as far as I recall, my issue was with species that are not in the default VEP/Ensembl database paths, such as Arabidopsis thaliana (at the time of writing the issue; not sure if it has been added since). My problem was hence that I needed to specify a custom path for the download. Hope that clarifies it.
I'm not sure that this issue is actually resolved. But my tool describes the workaround that I mention above in order to specify custom URLs, so it's fine on my end :-)
If that solution works, would you mind making a PR?
@lczech can you check if PR #2928 fixes this issue?
@fgvieira, thanks, I think that should work. The wrapper script has changed a bit since, and the curl download has been added, but if vep_install then takes these curl-downloaded files, that should work. Thank you very much!