funannotate
Unable to annotate genomes using funannotate?
Hi,
I'm trying to annotate a genome with funannotate predict, but the run ends in a Python error. I need help.
Command:
funannotate predict -i Llips.R.fasta -s zebrafish --protein_evidence /home/sunn/data/softwares/gnom/test/spades/backup/Abininew/refgenome/tanaka.gff -o Llips --cpus 16 --force > Llips.log
(base) sunn@col:~/data/softwares/gnom/test/spades/backup/gnomecds/masked$ ./llips.sh
[Jun 28 02:21 AM]: OS: Ubuntu 20.04, 96 cores, ~ 1584 GB RAM. Python: 3.9.7
[Jun 28 02:21 AM]: Running funannotate v1.8.12
[Jun 28 02:21 AM]: Skipping CodingQuarry as $QUARRY_PATH not found as ENV
[Jun 28 02:21 AM]: Parsed training data, run ab-initio gene predictors as follows:
Program Training-Method
augustus pretrained
genemark selftraining
glimmerhmm busco
snap busco
[Jun 28 02:23 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Jun 28 02:39 AM]: Genome loaded: 528,188 scaffolds; 517,884,457 bp; 8.69% repeats masked
[Jun 28 02:40 AM]: Mapping 0 proteins to genome using diamond and exonerate
[Jun 28 02:40 AM]: CMD ERROR: diamond blastx --threads 16 -q /home/sunn/data/softwares/gnom/test/spades/backup/gnomecds/masked/Llips/predict_misc/genome.softmasked.fa --db diamond -o diamond.matches.tab -e 1e-10 -k 0 --more-sensitive --unal 0 -c 1 -F 15 -f 6 sseqid slen sstart send qseqid qlen qstart qend pident length evalue score qcovhsp qframe
[Jun 28 02:40 AM]: diamond v2.0.5.143 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org
#CPU threads: 16
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory:
Opening the database... [0s]
Error: Incomplete database file. Database building did not complete successfully.
Traceback (most recent call last):
File "/home/sunn/anaconda3/bin/funannotate", line 33, in <module>
sys.exit(load_entry_point('funannotate==1.8.12', 'console_scripts', 'funannotate')())
File "/home/sunn/anaconda3/lib/python3.9/site-packages/funannotate-1.8.12-py3.9.egg/funannotate/funannotate.py", line 716, in main
mod.main(arguments)
File "/home/sunn/anaconda3/lib/python3.9/site-packages/funannotate-1.8.12-py3.9.egg/funannotate/predict.py", line 1053, in main
lib.exonerate2hints(Exonerate, hintsP)
File "/home/sunn/anaconda3/lib/python3.9/site-packages/funannotate-1.8.12-py3.9.egg/funannotate/library.py", line 3979, in exonerate2hints
with open(file, 'r') as input:
FileNotFoundError: [Errno 2] No such file or directory: '/home/sunn/data/softwares/gnom/test/spades/backup/gnomecds/masked/Llips/predict_misc/protein_alignments.gff3'
It looks like the database setup may have failed. Did you get any errors while running funannotate setup?
Not really. I installed it correctly.
Any suggestions, please?
So this line
Mapping 0 proteins to genome using diamond and exonerate
suggests that the default SwissProt/UniProt database is not installed correctly. Try rerunning funannotate setup -i uniprot with the -f flag to overwrite it.
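A quick way to see how many protein sequences a given evidence or database file actually contains is to count its FASTA headers. The sketch below is only an illustration: count_fasta_records is a hypothetical helper, not part of funannotate, and it assumes the file is meant to be plain protein FASTA. By this count, a file in another format (for example GFF) reports zero records.

```python
import sys


def count_fasta_records(lines):
    """Count FASTA records by counting header lines that start with '>'.

    `lines` is any iterable of text lines, e.g. an open file handle.
    (Hypothetical helper for illustration; not a funannotate function.)
    """
    return sum(1 for line in lines if line.startswith(">"))


if __name__ == "__main__":
    # Usage: python count_fasta.py proteins.fasta [more files ...]
    for path in sys.argv[1:]:
        with open(path, "r") as handle:
            print(f"{path}\t{count_fasta_records(handle)} records")
```

If the count is zero for the file passed to --protein_evidence, or for the installed UniProt FASTA, that would be consistent with the "Mapping 0 proteins" message.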
I'm also keen to know: how do you rank funannotate against other pipelines such as BRAKER and MAKER?
Once again, I ended up with the same error.
[Jun 28 04:08 AM]: OS: Ubuntu 20.04, 96 cores, ~ 1584 GB RAM. Python: 3.9.7
[Jun 28 04:08 AM]: Running funannotate v1.8.12
[Jun 28 04:08 AM]: Skipping CodingQuarry as $QUARRY_PATH not found as ENV
[Jun 28 04:08 AM]: Parsed training data, run ab-initio gene predictors as follows:
Program Training-Method
augustus pretrained
genemark selftraining
glimmerhmm busco
snap busco
[Jun 28 04:09 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Jun 28 04:26 AM]: Genome loaded: 528,188 scaffolds; 517,884,457 bp; 8.69% repeats masked
[Jun 28 04:26 AM]: Mapping 0 proteins to genome using diamond and exonerate
[Jun 28 04:26 AM]: CMD ERROR: diamond blastx --threads 16 -q /home/sunn/data/softwares/gnom/test/spades/backup/gnomecds/masked/Llips/predict_misc/genome.softmasked.fa --db diamond -o diamond.matches.tab -e 1e-10 -k 0 --more-sensitive --unal 0 -c 1 -F 15 -f 6 sseqid slen sstart send qseqid qlen qstart qend pident length evalue score qcovhsp qframe
[Jun 28 04:26 AM]: diamond v2.0.5.143 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org
#CPU threads: 16
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory:
Opening the database... [0s]
Error: Incomplete database file. Database building did not complete successfully.
Traceback (most recent call last):
File "/home/sunn/anaconda3/bin/funannotate", line 33, in <module>
It's the same error.
Error: Incomplete database file. Database building did not complete successfully.
This means the database is not installed properly. The EBI download links were down over the weekend, so it is possible that the download was incomplete. What is the output of funannotate database?
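As a rough first check for an incomplete download, you can list the files in the database directory and flag any that are empty or suspiciously small. This is a minimal sketch, assuming the database lives in the directory named by the FUNANNOTATE_DB environment variable; file_sizes and suspect_files are hypothetical helpers, and a truncated download is not necessarily zero bytes, so a clean result here does not prove the database is complete.

```python
import os
from pathlib import Path


def file_sizes(db_dir):
    """Map each regular file name in db_dir to its size in bytes."""
    return {p.name: p.stat().st_size for p in Path(db_dir).iterdir() if p.is_file()}


def suspect_files(sizes, min_bytes=1):
    """Return the sorted names of files smaller than min_bytes."""
    return sorted(name for name, size in sizes.items() if size < min_bytes)


if __name__ == "__main__":
    # Assumption: FUNANNOTATE_DB points at the funannotate database folder.
    db_dir = os.environ.get("FUNANNOTATE_DB")
    if db_dir and Path(db_dir).is_dir():
        for name in suspect_files(file_sizes(db_dir)):
            print(f"empty or too small: {name}")
    else:
        print("FUNANNOTATE_DB is not set or is not a directory")
```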
Please run the tests to verify your install.
(base) sunn@col:~/data/softwares/funannotate$ funannotate test -t all --cpus 10
#########################################################
Running funannotate clean
unit testing: minimap2 mediated assembly duplications
Downloading: https://osf.io/8pjbe/download?version=1 Bytes: 252076
CMD: funannotate clean -i test.clean.fa -o test.exhaustive.fa --exhaustive
#########################################################
6 input contigs, 6 larger than 500 bp, N50 is 427,039 bp
Checking duplication of 6 contigs
scaffold_73 appears duplicated: 100% identity over 100% of the contig. contig length: 15153
scaffold_91 appears duplicated: 100% identity over 100% of the contig. contig length: 8858
scaffold_27 appears duplicated: 100% identity over 100% of the contig. contig length: 427039
6 input contigs; 6 larger than 500 bp; 3 duplicated; 3 written to file
#########################################################
SUCCESS: funannotate clean
test complete.
#########################################################
#########################################################
Running funannotate mask
unit testing: RepeatModeler --> RepeatMasker
Downloading: https://osf.io/hbryz/download?version=1 Bytes: 375687
CMD: funannotate mask -i test.fa -o test.masked.fa --cpus 10
#########################################################
[Jun 28 07:43 AM]: OS: Ubuntu 20.04, 96 cores, ~ 1584 GB RAM. Python: 3.9.7
[Jun 28 07:43 AM]: Running funannotate v1.8.12
[Jun 28 07:43 AM]: Missing Dependencies: tantan. Please install missing dependencies and re-run script
#########################################################
ERROR: funannotate mask
test failed.
#########################################################
#########################################################
Running funannotate predict
unit testing
Downloading: https://osf.io/te2pf/download?version=1 Bytes: 1489808
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 10 --species Awesome testicus
#########################################################
[Jun 28 07:43 AM]: OS: Ubuntu 20.04, 96 cores, ~ 1584 GB RAM. Python: 3.9.7
[Jun 28 07:43 AM]: Running funannotate v1.8.12
[Jun 28 07:43 AM]: Skipping CodingQuarry as $QUARRY_PATH not found as ENV
[Jun 28 07:43 AM]: Parsed training data, run ab-initio gene predictors as follows:
Program Training-Method
augustus pretrained
genemark selftraining
glimmerhmm busco
snap busco
[Jun 28 07:43 AM]: ERROR: augustus --proteinprofile test failed, likely a compilation error. This is required to run BUSCO, exiting.
#########################################################
Traceback (most recent call last):
File "/home/sunn/anaconda3/bin/funannotate", line 33, in <module>
I already have Augustus v3.4 installed.
You need Augustus < 3.4, and it needs to be compiled correctly so that proteinprofile is functional. The easiest way is probably to install it with apt-get on Debian-based systems. Almost all of the conda builds of Augustus have had a broken proteinprofile mode lately.
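To check which Augustus version is on your PATH against the "< 3.4" requirement, something like the following could work. It is a sketch under assumptions: the helper names are hypothetical, and the banner format in the comment is assumed rather than taken from this thread, so the parser simply picks the first dotted number it finds. Note that it only compares version numbers; it does not test whether proteinprofile mode actually works.

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Extract the first dotted version number from text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


def is_older_than(version, limit=(3, 4)):
    """True if version is known and strictly older than limit."""
    return version is not None and version < limit


def installed_augustus_version():
    """Run `augustus --version` if augustus is on PATH and parse its banner.

    Assumed banner shape: "AUGUSTUS (3.3.3) is a gene prediction tool."
    """
    exe = shutil.which("augustus")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return parse_version(result.stdout + result.stderr)


if __name__ == "__main__":
    version = installed_augustus_version()
    print("augustus version:", version, "| older than 3.4:", is_older_than(version))
```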
I'll do that.
Can the pipeline calculate dN/dS for the given orthologs? I'm very interested in these ratios.
When I used codeml, I had a problem with my input CDS sequences: some orthologs had internal stop codons.
Any suggestions, please?
Is this a funannotate question? Where did the CDS files come from that you used for the selection analysis? Are you doing pairwise or multi-species analysis? You can check your data by running a translation of the CDS files first and checking which ones have the internal stops.
Jason Stajich
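The translation check suggested above can be sketched in a few lines of plain Python. This is an illustration only: it assumes the standard genetic code (NCBI translation table 1), in-frame CDS sequences, and FASTA input, and the function names are hypothetical. Codons containing ambiguous bases are translated as X.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(cds):
    """Translate an in-frame CDS; unknown codons become 'X', stops become '*'."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def internal_stops(cds):
    """Return 0-based codon positions of stops that are not the final codon."""
    protein = translate(cds)
    return [i for i, aa in enumerate(protein[:-1]) if aa == "*"]


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def cds_with_internal_stops(lines):
    """Map each sequence name to its internal stop positions, if it has any."""
    flagged = {}
    for name, seq in read_fasta(lines):
        stops = internal_stops(seq)
        if stops:
            flagged[name] = stops
    return flagged
```

Passing an open CDS FASTA file handle to cds_with_internal_stops returns only the sequences that need a closer look, along with the codon positions of the premature stops.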
I did not see a follow-up. dN/dS is part of the compare pipeline, but only for pairwise comparisons. You might need to check and clean up orthologs with stop codons, and check that you are using the right translation table.
You can also try a tool I wrote to wrap PAML's yn00 and generate pairwise tables: https://github.com/hyphaltip/subopt-kaks. Use the yn00_cds_prealigned tool on already aligned CDS files (with the terminal stop codon removed).
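Removing the terminal stop codon depends on which codons count as stops, which is where the translation table matters. The sketch below is a hedged illustration with a hypothetical helper, meant to be applied to unaligned, in-frame CDS before alignment. It covers only two NCBI tables: table 1 (standard) and table 4, in which TGA encodes tryptophan rather than a stop.

```python
# Stop codons per NCBI translation table (only two tables shown as examples).
STOP_CODONS = {
    1: {"TAA", "TAG", "TGA"},  # standard code
    4: {"TAA", "TAG"},         # table 4: TGA encodes Trp, not a stop
}


def strip_terminal_stop(cds, table_id=1):
    """Drop the last codon of an in-frame CDS if it is a stop in the given table.

    Sequences whose length is not a multiple of 3 are returned unchanged,
    since their final codon cannot be identified reliably.
    """
    stops = STOP_CODONS[table_id]
    if len(cds) >= 3 and len(cds) % 3 == 0 and cds[-3:].upper() in stops:
        return cds[:-3]
    return cds
```

With table 1, strip_terminal_stop("ATGGGGTGA") drops the final TGA; with table_id=4 the same sequence is left intact because TGA is read as a codon for tryptophan.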