funannotate
Installation: Conda, Docker, etc
I know there are a lot of dependencies with funannotate (it isn't ideal). I spent quite some time getting funannotate working on conda. However, as most of you who have worked with conda know, it is also far from perfect. I can't possibly test everybody's system; my two test environments are centOS and macOS. In my experience, the key to keeping conda working is to never install anything in the base environment. This is especially true on a shared system, as inevitably you end up with permissions issues with multiple users. If you are trying to install funannotate via conda, i.e. conda create -n funannotate funannotate, and you are getting errors that it is unable to solve, then you most likely have packages installed in your base environment that are clashing with the dependencies.
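As a small illustration of the "never install into base" rule, here is a minimal shell sketch. It relies on the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated; the function names in_base_env and safe_conda_install are hypothetical helpers, not part of conda or funannotate.

```shell
# Return success (0) when the currently active conda environment is "base".
# CONDA_DEFAULT_ENV is set by `conda activate`; it is empty when no env is active.
in_base_env() {
  [ "${CONDA_DEFAULT_ENV:-}" = "base" ]
}

# Hypothetical wrapper: refuse to run `conda install` while base is active,
# otherwise pass all arguments through to conda unchanged.
safe_conda_install() {
  if in_base_env; then
    echo "refusing to install into base; activate a dedicated env first" >&2
    return 1
  fi
  conda install "$@"
}
```

With a guard like this in place, the funannotate environment itself would still be created with the conda create -n funannotate funannotate command quoted in the post.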
I also know there are several people interested in Docker/Singularity containers #183 #366 #257 #379. I've been told that conda doesn't work in these environments and fails to find a solution. I finally had some time to look around and see what worked/didn't work for me.
On my macOS system, I'm able to build a Docker image for funannotate v1.7.4 using python 2.7 as follows. Note that apparently on Debian systems the forge program in the Bioconda snap recipe isn't working (#387, thanks to @nhartwic for the fix); below you'll find a fix for that in this recipe. For me, bioconda snap works on both centOS and macOS.
FROM continuumio/miniconda3:latest
RUN conda config --add channels defaults && conda config --add channels bioconda && \
conda config --add channels conda-forge && conda update -n base -c defaults --yes conda && \
apt-get update && apt-get -y install gcc
RUN conda create -n funannotate --yes "funannotate=1.7.4" && conda clean -afy && \
mkdir -p /home/funannotate_db && echo "source activate funannotate" > ~/.bashrc
ENV FUNANNOTATE_DB=/home/funannotate_db
SHELL ["conda", "run", "-n", "funannotate", "/bin/bash", "-c"]
#bioconda snap is partially broken on some systems (debian apparently), forge is problem
WORKDIR /opt
RUN git clone https://github.com/KorfLab/SNAP.git && cd SNAP && make && \
cp /opt/SNAP/forge /opt/conda/envs/funannotate/bin/forge
RUN funannotate setup -i all
#for some reason USER needs to be set for seqclean to work in docker
ENV USER='me'
WORKDIR /home
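A rough sketch of how such a Dockerfile might be built and smoke-tested from the shell. The tag name funannotate:1.7.4 and the helper names run and build_and_check are arbitrary choices for illustration, and it is an assumption that funannotate check --show-versions (which appears later in this thread for 1.8.x) behaves the same way in this version. Setting DRY_RUN=1 only prints the commands.

```shell
# Print each command; execute it only when DRY_RUN is not set to 1.
run() {
  echo "+ $*"
  [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# Build the image from the Dockerfile in the current directory, then run the
# dependency check inside the "funannotate" conda environment it created.
# $1 = image tag (optional; the default is an arbitrary name).
build_and_check() {
  tag="${1:-funannotate:1.7.4}"
  run docker build -t "$tag" . &&
  run docker run --rm "$tag" conda run -n funannotate funannotate check --show-versions
}
```

For example, DRY_RUN=1 build_and_check prints the two docker commands without running them, which is a cheap way to review them before starting a long build.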
I'm also working on a python3 port of the code which is in the python3 branch. I'm still doing some testing on this branch to make sure everything is working. It would be helpful for others to test the code as well, as I know people use the scripts in slightly different ways. This can be tested in a docker environment like this (note I found a unicode error that needs to be fixed in this py3 port, but you get the idea):
FROM continuumio/miniconda3:latest
RUN conda config --add channels defaults && conda config --add channels bioconda &&\
conda config --add channels conda-forge && conda update -n base -c defaults --yes \
conda && apt-get update && apt-get -y install gcc make
RUN conda create -n funannotate --yes python=3.7 biopython goatools matplotlib natsort \
psutil requests scikit-learn scipy seaborn "blast=2.2.31" tantan bedtools hmmer \
exonerate "diamond>0.9,<=0.9.24" tbl2asn ucsc-pslcdnafilter "pasa>=2.4.1" \
trimmomatic raxml trimal "mafft>=7" iqtree "kallisto>=0.46,<0.46.2" evidencemodeler \
codingquarry stringtie snap glimmerhmm trnascan-se hisat2 "proteinortho>=6.0.9" \
"salmon>=0.9" perl "perl-bioperl>1.7" perl-dbd-mysql perl-clone perl-hash-merge \
perl-soap-lite perl-json perl-logger-simple perl-scalar-util-numeric minimap2 \
perl-text-soundex perl-parallel-forkmanager "r-base>=3.4.1" bamtools numpy pandas \
"augustus>3.3" "trinity>=2.8.5=h8b12597_5" pip && conda clean -afy && \
mkdir -p /home/funannotate_db && echo "source activate funannotate" > ~/.bashrc
ENV FUNANNOTATE_DB=/home/funannotate_db
SHELL ["conda", "run", "-n", "funannotate", "/bin/bash", "-c"]
#bioconda snap is partially broken on some systems (debian apparently), forge is problem
WORKDIR /opt
RUN git clone https://github.com/KorfLab/SNAP.git && cd SNAP && make && \
cp /opt/SNAP/forge /opt/conda/envs/funannotate/bin/forge
RUN python -m pip install git+https://github.com/nextgenusfs/funannotate.git@python3
RUN funannotate setup -i all -w
#for some reason USER needs to be set for seqclean to work in docker
ENV USER='me'
WORKDIR /home
Hi there,
Just double checking: Is there something wrong with using the Docker container automatically pushed by Bioconda to Quay.io? https://bioconda.github.io/recipes/funannotate/README.html
For newer versions of Singularity, you can directly pull from quay.io as well using the docker:// URL format, e.g.:
singularity pull docker://quay.io/biocontainers/funannotate:1.7.4--py27_0
I just tested it, and the singularity image seemed to pull / be executable with singularity shell without any obvious errors.
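For scripted use, the pull can be wrapped roughly like this. This is a sketch only: the helper names biocontainer_uri and pull_and_check and the output file name funannotate_1.7.4.sif are made up for the example, and the tag has to be copied from the Bioconda page linked above.

```shell
# Compose the docker:// URI that Singularity expects for a Quay.io
# BioContainers image. $1 = tool name, $2 = tag (version--build).
biocontainer_uri() {
  printf 'docker://quay.io/biocontainers/%s:%s\n' "$1" "$2"
}

# Pull to an explicit file name, then run the dependency check
# non-interactively with `singularity exec` instead of `singularity shell`.
pull_and_check() {
  uri=$(biocontainer_uri funannotate 1.7.4--py27_0)
  singularity pull funannotate_1.7.4.sif "$uri" &&
  singularity exec funannotate_1.7.4.sif funannotate check --show-versions
}
```

Using singularity exec rather than an interactive shell makes the same check usable from a job script.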
Hi everyone,
I also spent some time on creating a Docker/Singularity container for funannotate. It is available here: https://github.com/reslp/funannotate-docker. I have been using it quite a bit lately and it works pretty well (also in a cluster environment using Snakemake and job scheduling), although I have not tested all features (e.g. training with RNA-Seq evidence).
It does not use the preferred way to install funannotate though (which is conda), because when I started working on it there were some dependency issues with ete3 (which are now fixed). Everything in the container is thus installed manually. However I may transition to the conda installation at some point, probably when funannotate 1.8 is released.
I am posting this here in the hope that this could be useful to some of you.
all the best, Philipp
Hello @nextgenusfs,
Thank you for these comments on the installation! What makes it a bit harder and confusing for new users (like myself) is the fact that the instructions on readthedocs and in the readme of this GitHub differ.
Also, I see the discussion above about moving to python3. Does that mean that we should create the conda env with python=2.7?
Thank you in advance.
@apredeus this is >1 year old. You should not use python2.7 at this point. Sorry that the docs are not always up to date -- this entire project is all outside of work and I volunteer my time.
Thank you for commenting - didn't notice that at all, I was just searching for comments that can bring some clarity to the instructions. I certainly didn't mean the comment in any negative way - big thanks for keeping the project alive in your spare time!
PS So just to clarify: all the readthedocs instructions can be ignored, and github readme is the way to go, right?
As of 3 minutes ago installation instructions are the same in both places.
"IndexError: list index out of range" in funannotate update, at the step "Parsing GenBank files...comparing annotation".
I am trying to use funannotate to annotate the two-spotted mite (genome size 90 Mb). I use the docker image. I have Illumina RNA-seq and PacBio Iso-Seq data for the evidence. The clean, sort, mask, train, and predict steps ran smoothly. When I ran update, I got the error message. I tried using --pasa_db mysql, and the error message is the same. I would like to have some help.
Here is the command I use:
funannotate-docker update -i fun --pasa_db mysql --cpus 32
The error message:
(base) yizhouc@yct7920:~/raid_mnt/p_ausgem_urticae$ bin/2021-09-26_funannote_Susceptible_update.sh
[Sep 26 09:28 AM]: OS: Debian GNU/Linux 10, 64 cores, ~ 791 GB RAM. Python: 3.8.10
[Sep 26 09:28 AM]: Running 1.8.9
[Sep 26 09:28 AM]: No NCBI SBT file given, will use default, for NCBI submissions pass one here '--sbt'
[Sep 26 09:28 AM]: Found relevant files in fun/training, will re-use them:
Forward reads: fun/training/left.fq.gz
Reverse reads: fun/training/right.fq.gz
Forward Q-trimmed reads: fun/training/trimmomatic/trimmed_left.fastq.gz
Reverse Q-trimmed reads: fun/training/trimmomatic/trimmed_right.fastq.gz
Forward normalized reads: fun/training/normalize/left.norm.fq
Reverse normalized reads: fun/training/normalize/right.norm.fq
Trinity results: fun/training/funannotate_train.trinity-GG.fasta
Long-read results: fun/training/funannotate_long-reads.fasta
PASA config file: fun/training/pasa/alignAssembly.txt
BAM alignments: fun/training/funannotate_train.coordSorted.bam
StringTie GTF: fun/training/funannotate_train.stringtie.gtf
[Sep 26 09:28 AM]: Reannotating Tetranychus urticae, NCBI accession: None
[Sep 26 09:28 AM]: Previous annotation consists of: 17,462 protein coding gene models and 108 non-coding gene models
[Sep 26 09:28 AM]: Existing BAM alignments found: fun/update_misc/trinity.alignments.bam, fun/update_misc/transcript.alignments.bam
[Sep 26 09:28 AM]: Skipping PASA, found existing output: fun/update_misc/pasa_final.gff3
[Sep 26 09:28 AM]: Existing Kallisto output found: fun/update_misc/kallisto.tsv
[Sep 26 09:28 AM]: Parsing Kallisto results. Keeping alt-splicing transcripts if expressed at least 10.0% of highest transcript per locus.
[Sep 26 09:28 AM]: Wrote 18,358 transcripts derived from 17,390 protein coding loci.
[Sep 26 09:28 AM]: Validating gene models (renaming, checking translations, filtering, etc)
[Sep 26 09:28 AM]: Writing 17,486 loci to TBL format: dropped 0 overlapping, 0 too short, and 0 frameshift gene models
[Sep 26 09:28 AM]: Converting to Genbank format
[Sep 26 09:31 AM]: Collecting final annotation files
[Sep 26 09:31 AM]: Parsing GenBank files...comparing annotation
Traceback (most recent call last):
File "/venv/bin/funannotate", line 8, in
Please open a new issue as this is unrelated to docker. Make sure to post log files and commands you ran.
Hi! I am trying to install funannotate with docker using the instructions in the README.md file. I am getting it to run (nice!) but am not able to get some of the dependencies working, such as signalp and genemark. I found some instructions related to docker here but it is for an older version of funannotate and as far as I can tell, that dockerfile doesn't exist anymore. Can I do something similar with the more recent nextgenusfs/funannotate/Dockerfile? I am using a linux server.
Thanks!
(base) ccallen@debary-vm1:/data/ccallen$ docker run --rm -it nextgenusfs/funannotate funannotate check --show-versions
Checking dependencies for 1.8.10
You are running Python v 3.8.12. Now checking python packages...
biopython: 1.77 goatools: 1.1.12 matplotlib: 3.5.1 natsort: 8.0.2 numpy: 1.22.0 pandas: 1.3.5 psutil: 5.9.0 requests: 2.27.1 scikit-learn: 1.0.2 scipy: 1.5.3 seaborn: 0.11.2
All 11 python packages installed
You are running Perl v b'5.026002'. Now checking perl modules...
Carp: 1.38 Clone: 0.42 DBD::SQLite: 1.64 DBD::mysql: 4.046 DBI: 1.642 DB_File: 1.855 Data::Dumper: 2.173 File::Basename: 2.85 File::Which: 1.23 Getopt::Long: 2.5 Hash::Merge: 0.300 JSON: 4.02 LWP::UserAgent: 6.39 Logger::Simple: 2.0 POSIX: 1.76 Parallel::ForkManager: 2.02 Pod::Usage: 1.69 Scalar::Util::Numeric: 0.40 Storable: 3.15 Text::Soundex: 3.05 Thread::Queue: 3.12 Tie::File: 1.02 URI::Escape: 3.31 YAML: 1.29 local::lib: 2.000024 threads: 2.15 threads::shared: 1.56
ERROR: Bio::Perl not installed, install with cpanm Bio::Perl
Checking Environmental Variables...
$FUNANNOTATE_DB=/opt/databases $PASAHOME=/venv/opt/pasa-2.4.1 $TRINITYHOME=/venv/opt/trinity-2.8.5 $EVM_HOME=/venv/opt/evidencemodeler-1.1.1 $AUGUSTUS_CONFIG_PATH=/venv/config
ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir
Checking external dependencies...
PASA: 2.4.1 CodingQuarry: 2.0 Trinity: 2.8.5 augustus: 3.3.3 bamtools: bamtools 2.5.1 bedtools: bedtools v2.30.0 blat: BLAT v36 diamond: 2.0.13 ete3: 3.1.2 exonerate: exonerate 2.4.0 fasta: no way to determine glimmerhmm: 3.0.4 gmap: 2017-11-15 hisat2: 2.2.1 hmmscan: HMMER 3.3.2 (Nov 2020) hmmsearch: HMMER 3.3.2 (Nov 2020) java: 11.0.9.1-internal kallisto: 0.46.1 mafft: v7.490 (2021/Oct/30) makeblastdb: makeblastdb 2.2.31+ minimap2: 2.24-r1122 pigz: pigz 2.6 proteinortho: 6.0.16 pslCDnaFilter: no way to determine salmon: salmon 0.14.1 samtools: samtools 1.12 snap: 2006-07-28 stringtie: 2.2.0 tRNAscan-SE: 2.0.9 (July 2021) tantan: tantan 26 tbl2asn: no way to determine, likely 25.X tblastn: tblastn 2.2.31+ trimal: trimAl v1.4.rev15 build[2013-12-17] trimmomatic: 0.39
ERROR: emapper.py not installed
ERROR: gmes_petap.pl not installed
ERROR: signalp not installed
genemark, eggnog, and signalp need to be installed outside of conda; they have separate licenses that do not let them be packaged within conda. https://github.com/biocore/conda-recipes/issues/17 The conda instructions here guide you to info on installing genemark: https://funannotate.readthedocs.io/en/latest/conda.html
It looks like eggnog mapper needs to be installed too. https://anaconda.org/bioconda/eggnog-mapper
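To see at a glance which tools still have to be installed by hand, the "not installed" errors can be pulled out of the funannotate check output. A minimal sketch, assuming the error wording matches the log pasted in this thread ("ERROR: <name> not installed"); the function name missing_tools is made up for the example. Other kinds of errors, such as an unset GENEMARK_PATH, are deliberately not matched.

```shell
# Read `funannotate check` output on stdin and print the name of every
# dependency reported as "ERROR: <name> not installed", one per line.
# grep -o keeps this working even if several messages share one line.
missing_tools() {
  grep -o 'ERROR: [^ ]* not installed' |
    sed 's/^ERROR: \(.*\) not installed$/\1/'
}

# Example usage (needs docker and the image from this thread):
#   docker run --rm nextgenusfs/funannotate funannotate check --show-versions 2>&1 | missing_tools
```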
Thanks @hyphaltip! I have all of these working using conda, or can at least run them externally and pass the files to predict or annotate. I am trying to get the latest build installed with docker and haven’t worked out yet how to incorporate these tools into the container. I am new to docker and will keep working at it thanks!
@ccgallen you'll need to build your own docker image to do this. Note that I wouldn't actually recommend using docker if you already have something working with conda, as conda gives you a lot more flexibility. Docker is great but has some limitations: in the context of funannotate, the database sizes and the need to incorporate other tools (eggnog/interproscan/etc.) make it difficult to use.
The docs you linked to above are several years old. The docker image is now built by GitHub Actions when new versions are tagged as well as on every commit, so to incorporate the tools that have separate licenses you'll need to set up a new Dockerfile that has the build instructions; you can use the funannotate one as a base image. Here is a quick example (note this is untested code): the idea is that you need to copy the install packages into the docker container and then install them in the container during the build, and the tools then also need to be in the PATH. You can look at the Dockerfiles in this repo to see how the base image is made. It's made in a two-step process to save space, but effectively /venv/bin/ is in the PATH, so you can softlink tools to that location and funannotate should be able to find them.
So the sed statements below are part of a "normal" signalp 4.1 installation; anything you would need to do to install a particular tool on your existing system needs to be in the Dockerfile.
FROM nextgenusfs/funannotate
WORKDIR /opt
COPY signalp-4.1f.Linux.tar.gz /opt
RUN tar -zxvf signalp-4.1f.Linux.tar.gz && \
sed -i 's,/usr/cbs/bio/src/signalp-4.1,/opt/signalp-4.1,g' signalp-4.1/signalp && \
sed -i 's,#!/usr/bin/perl,#!/usr/bin/env perl,g' signalp-4.1/signalp && \
ln -s /opt/signalp-4.1/signalp /venv/bin/signalp
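Because COPY can only see files inside the build context, the licensed tarball has to sit next to the Dockerfile before building. A small pre-flight check might look like the following sketch; the function name check_build_context and the image tag in the usage comment are hypothetical.

```shell
# Succeed only if every named file exists in the given build-context
# directory; report each missing file on stderr.
# $1 = context directory, remaining arguments = required file names.
check_build_context() {
  context_dir="$1"
  shift
  missing=0
  for required in "$@"; do
    if [ ! -f "$context_dir/$required" ]; then
      echo "missing from build context: $required" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (the tag name is an arbitrary choice):
#   check_build_context . Dockerfile signalp-4.1f.Linux.tar.gz &&
#     docker build -t funannotate-signalp .
```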
thanks so much @nextgenusfs!
Hi. I am trying to add procps ("ps command") to the latest Docker image. When trying to build the latest Docker file (funannotate-1.8.11) without changing anything, I got the following error:
docker build -t funannotate-1.8.11 .
Step 6/13 : RUN conda-pack -n funannotate -o /tmp/env.tar && mkdir /venv && cd /venv && tar xf /tmp/env.tar && rm /tmp/env.tar
---> Running in 21c43c8747ce
CondaPackError:
Files managed by conda were found to have been deleted/overwritten in the
following packages:
- xlrd 2.0.1:
lib/python3.8/site-packages/xlrd-2.0.1.dist-info/INSTALLER
lib/python3.8/site-packages/xlrd-2.0.1.dist-info/LICENSE
lib/python3.8/site-packages/xlrd-2.0.1.dist-info/METADATA
+ 5 others
This is usually due to `pip` uninstalling or clobbering conda managed files,
resulting in an inconsistent environment. Please check your environment for
conda/pip conflicts using `conda list`, and fix the environment by ensuring
only one version of each package is installed (conda preferred).
Do you have any suggestions on how to fix this?
@aramos-solena it was due to a Conda-pack issue -- the current docker image should have procps installed. You can see the Dockerfile and what I changed to make it work.
Thanks so much @nextgenusfs , this fixed the issue.
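For anyone who hits the same CondaPackError in their own image, the error text suggests inspecting the environment with conda list. A minimal sketch of that inspection, assuming the usual four-column conda list layout in which pip-installed packages show "pypi" in the Channel column; the function name pip_installed is made up, and this is not the fix that was applied to the official Dockerfile.

```shell
# Read `conda list` output on stdin and print "name version" for every
# package whose Channel column is "pypi", i.e. packages installed with pip.
# Header lines starting with "#" are skipped.
pip_installed() {
  awk '!/^#/ && $4 == "pypi" { print $1, $2 }'
}

# Example usage:
#   conda list -n funannotate | pip_installed
```

Any package listed here that conda also manages (xlrd in the error above) is a candidate for the conflict conda-pack complains about.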
Hi,
I am fairly new to this forum, so pardon me if I'm not supposed to comment here. I encountered an issue similar to the one described previously in #702 when checking the SignalP dependency. This is based on the latest docker image.
Logfile
Checking dependencies for 1.8.14
You are running Python v 3.8.12. Now checking python packages...
biopython: 1.80 goatools: 1.2.3 matplotlib: 3.7.0 natsort: 8.2.0 numpy: 1.22.4 pandas: 1.5.3 psutil: 5.9.4 requests: 2.28.2 scikit-learn: 1.1.1 scipy: 1.5.3 seaborn: 0.12.2
All 11 python packages installed
You are running Perl v b'5.026002'. Now checking perl modules...
Carp: 1.38 Clone: 0.42 DBD::SQLite: 1.64 DBD::mysql: 4.046 DBI: 1.642 DB_File: 1.855 Data::Dumper: 2.173 File::Basename: 2.85 File::Which: 1.23 Getopt::Long: 2.5 Hash::Merge: 0.300 JSON: 4.02 LWP::UserAgent: 6.39 Logger::Simple: 2.0 POSIX: 1.76 Parallel::ForkManager: 2.02 Pod::Usage: 1.69 Scalar::Util::Numeric: 0.40 Storable: 3.15 Text::Soundex: 3.05 Thread::Queue: 3.12 Tie::File: 1.02 URI::Escape: 3.31 YAML: 1.29 local::lib: 2.000029 threads: 2.15 threads::shared: 1.56
All 27 Perl modules installed
Checking Environmental Variables...
$FUNANNOTATE_DB=/opt/databases $PASAHOME=/venv/opt/pasa-2.4.1 $TRINITYHOME=/venv/opt/trinity-2.8.5 $EVM_HOME=/venv/opt/evidencemodeler-1.1.1 $AUGUSTUS_CONFIG_PATH=/usr/share/augustus/config $GENEMARK_PATH=/home/u4485090/funannotate/gmes_linux_64_4
All 6 environmental variables are set
Checking external dependencies...
ERROR: pslDnaFiler found but error running: pslCDnaFilter: error while loading shared libraries: libssl.so.1.0.0: cannot open shared object file: No such file or directory
ERROR: signalp found but error running signalp
PASA: 2.4.1 CodingQuarry: 2.0 Trinity: 2.8.5 augustus: 3.3.2 bamtools: bamtools 2.5.2 bedtools: bedtools v2.30.0 blat: BLAT v35 diamond: 2.0.15 ete3: 3.1.2 exonerate: exonerate 2.4.0 fasta: 36.3.8g glimmerhmm: 3.0.4 gmap: 2017-11-15 gmes_petap.pl: 4.71_lic hisat2: 2.2.1 hmmscan: HMMER 3.3.2 (Nov 2020) hmmsearch: HMMER 3.3.2 (Nov 2020) java: 11.0.8-internal kallisto: 0.46.1 mafft: v7.515 (2023/Jan/15) makeblastdb: makeblastdb 2.2.31+ minimap2: 2.24-r1122 pigz: 2.6 proteinortho: 6.0.16 salmon: salmon 0.14.1 samtools: samtools 1.12 snap: 2006-07-28 stringtie: 2.2.1 tRNAscan-SE: 2.0.9 (July 2021) tantan: tantan 40 tbl2asn: 25.8 tblastn: tblastn 2.2.31+ trimal: trimAl v1.4.rev15 build[2013-12-17] trimmomatic: 0.39
ERROR: emapper.py not installed
ERROR: pslCDnaFilter not installed
ERROR: signalp not installed
Singularity> signalp
Can't locate Getopt/Std.pm in @INC (you may need to install the Getopt::Std module) (@INC contains: /home/u4485090/perl5/lib/perl5/x86_64-linux-gnu-thread-multi /home/u4485090/perl5/lib/perl5 /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.28.1 /usr/local/share/perl/5.28.1 /usr/lib/x86_64-linux-gnu/perl5/5.28 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.28 /usr/share/perl/5.28 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at /home/u4485090/funannotate/signalp-4.1/signalp line 76.
BEGIN failed--compilation aborted at /home/u4485090/funannotate/signalp-4.1/signalp line 76.
Any help is appreciated! Thanks.
Cheers, Erick
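One thing that may be worth checking here: funannotate check reports Perl 5.26, while the library paths in the signalp error are Perl 5.28 system paths, which suggests (but does not prove) that the signalp script is being run by /usr/bin/perl rather than the Perl under /venv. The sketch below only inspects the script's first line; the function names shebang_of and uses_env_perl are made up for the example.

```shell
# Print the interpreter line (shebang) of a script without the leading "#!".
# Prints nothing if the first line is not a shebang.
shebang_of() {
  head -n 1 "$1" | sed -n 's/^#![[:space:]]*//p'
}

# Succeed when the script looks up perl through PATH via /usr/bin/env,
# which is what the sed line in the signalp Dockerfile example sets up.
uses_env_perl() {
  [ "$(shebang_of "$1")" = "/usr/bin/env perl" ]
}

# Example usage, with the path taken from the error message:
#   uses_env_perl /home/u4485090/funannotate/signalp-4.1/signalp ||
#     echo "signalp is pinned to: $(shebang_of /home/u4485090/funannotate/signalp-4.1/signalp)"
```

If the script turns out to be pinned to /usr/bin/perl, applying the shebang sed statement from the signalp Dockerfile example earlier in the thread would be the obvious thing to try.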