sarek
Can't download VEP cache from behind proxy [BUG]
Check Documentation
I have checked the following places for your error:
Description of the bug
The HPC cluster I work on sits behind a proxy, which makes it hard to connect to external sources, for example when trying to download the caches for SnpEff and VEP using the ./download_cache.nf script.
I got SnpEff to work by whitelisting and exporting an environment variable through the .conf file:
# Add the variable to nextflow.conf in my local execution directory (i.e., not in the Sarek root)
echo "singularity.envWhitelist = 'SINGULARITYENV_JAVA_TOOL_OPTIONS'" > nextflow.conf
# Then export the variable
export SINGULARITYENV_JAVA_TOOL_OPTIONS="-Dhttp.proxyHost=xxxx -Dhttp.proxyPort=xxxx -Dhttps.proxyHost=xxxx -Dhttps.proxyPort=xxxx"
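To avoid retyping the four JVM properties, the option string can be assembled by a small helper. This is only a sketch: the function name `java_proxy_opts` is hypothetical, the host and port shown are placeholders rather than real values, and it assumes the same proxy serves both HTTP and HTTPS.

```shell
# Hypothetical helper: build the JVM proxy flags from one host and one port.
# Assumes the same proxy handles both http and https traffic.
java_proxy_opts() {
    host="$1"
    port="$2"
    printf '%s\n' "-Dhttp.proxyHost=$host -Dhttp.proxyPort=$port -Dhttps.proxyHost=$host -Dhttps.proxyPort=$port"
}

# Placeholder host and port; substitute the values for your own site.
export SINGULARITYENV_JAVA_TOOL_OPTIONS="$(java_proxy_opts proxy.example.org 3128)"
```

The exported variable then has the same shape as the one above, with the placeholders filled in.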
For VEP, however, it does not appear possible to forward the right flags: https://github.com/bcbio/bcbio-nextgen/issues/818
The solution
I would like to suggest a change, both so that I do not have to fork and modify the ./download_cache.nf process, and to potentially make life easier for others with proxy problems.
Why not add an option to the script that points to a local directory to search instead of the remote cache at Ensembl? The process would then look something like this:
vep_install \
-a cf \
-c . \
-s ${species} \
-y ${genome} \
-u ${local_vep_cache_dir} \
--CACHE_VERSION ${vep_cache_version} \
--CONVERT \
--NO_HTSLIB --NO_TEST --NO_BIOPERL --NO_UPDATE
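To keep the current behaviour when no local directory is given, the -u flag could be added only when the new option is set. The following is a rough sketch of that logic in plain shell, not the actual Sarek process; the function name `build_vep_install_args` and its positional parameters are hypothetical, and it assumes none of the values contain spaces.

```shell
# Hypothetical helper: print the vep_install arguments, appending
# "-u <dir>" only when a local cache directory is supplied as the
# fourth argument. Assumes the values contain no whitespace.
build_vep_install_args() {
    species="$1"
    genome="$2"
    cache_version="$3"
    local_cache_dir="${4:-}"

    args="-a cf -c . -s $species -y $genome --CACHE_VERSION $cache_version"
    args="$args --CONVERT --NO_HTSLIB --NO_TEST --NO_BIOPERL --NO_UPDATE"

    # Without a local directory, vep_install falls back to its default cache URL.
    if [ -n "$local_cache_dir" ]; then
        args="$args -u $local_cache_dir"
    fi

    printf '%s\n' "$args"
}
```

It could then be used as `vep_install $(build_vep_install_args homo_sapiens GRCh38 104 local_vep_cache_dir_tmp)`, relying on word splitting of the unquoted substitution.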
Then running ./download_cache.nf would look like this:
# Download the VEP cache for GRCh38 (VEP version 104)
mkdir -p local_vep_cache_dir_tmp
wget -P local_vep_cache_dir_tmp ftp://ftp.ensembl.org/pub/release-104/variation/indexed_vep_cache/homo_sapiens_vep_104_GRCh38.tar.gz
nextflow \
run repos/sarek/download_cache.nf \
-with-singularity simgs/sarek.2.7.1.sif \
--vep_cache annotation_cache/VEPeff_cache \
--species homo_sapiens \
--vep_cache_version 104 \
--genome GRCh38 \
--local_vep_cache_dir local_vep_cache_dir_tmp
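For other species, versions, or assemblies, the tarball URL could be derived rather than typed by hand. The sketch below extrapolates the naming pattern from the single URL used above, so it is an assumption that other releases follow the same layout; the function name `vep_cache_url` is hypothetical.

```shell
# Hypothetical helper: build the indexed VEP cache URL from species,
# cache version, and assembly. The pattern is inferred from the
# GRCh38 / release-104 URL above and may not hold for every release.
vep_cache_url() {
    species="$1"
    version="$2"
    genome="$3"
    printf 'ftp://ftp.ensembl.org/pub/release-%s/variation/indexed_vep_cache/%s_vep_%s_%s.tar.gz\n' \
        "$version" "$species" "$version" "$genome"
}

# Reproduces the URL used in the wget call above.
vep_cache_url homo_sapiens 104 GRCh38
```

The download step would then become `wget -P local_vep_cache_dir_tmp "$(vep_cache_url homo_sapiens 104 GRCh38)"`.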
Let me know if you think this is a good solution. In the meantime, I will prepare a PR.
Nextflow Installation
- Version: 20.10.0
Container engine
- Image tag: nfcore/sarek:2.7 sha256:09da1f431aebe8b61da6b989ed2adf17edd03492408d403f87d26b543bd0a365
From vep_install -h
Usage:
perl INSTALL.pl [arguments]

Options
=======
-h | --help           Display this message and quit
-d | --DESTDIR        Set destination directory for API install (default = './')
     --CACHE_VERSION  Set data (cache, FASTA) version to install if different from --VERSION (default = 99)
-c | --CACHEDIR       Set destination directory for cache files (default = '/home/jesgaaopen/.vep/')
-a | --AUTO           Run installer without user prompts. Use "a" (API + Faidx/htslib),
                      "l" (Faidx/htslib only), "c" (cache), "f" (FASTA), "p" (plugins) to specify
                      parts to install e.g. -a ac for API and cache
-n | --NO_UPDATE      Do not check for updates to ensembl-vep
-s | --SPECIES        Comma-separated list of species to install when using --AUTO
-y | --ASSEMBLY       Assembly name to use if more than one during --AUTO
-g | --PLUGINS        Comma-separated list of plugins to install when using --AUTO
-r | --PLUGINSDIR     Set destination directory for VEP plugins files (default = '/home/jesgaaopen/.vep/Plugins/')
-q | --QUIET          Don't write any status output when using --AUTO
-p | --PREFER_BIN     Use this if the installer fails with out of memory errors
-l | --NO_HTSLIB      Don't attempt to install Faidx/htslib
     --NO_BIOPERL     Don't install BioPerl
-t | --CONVERT        Convert downloaded caches to use tabix for retrieving
                      co-located variants (requires tabix)
-u | --CACHEURL       Override default cache URL; this may be a local directory or
                      a remote (e.g. FTP) address.
-f | --FASTAURL       Override default FASTA URL; this may be a local directory or
                      a remote (e.g. FTP) address. The FASTA URL/directory must have
                      gzipped FASTA files under the following structure:
                      [species]/[dna]/
Currently not providing a download script for cache and other files for annotation