
Unable to annotate genomes using funannotate?

Open sunnyEV opened this issue 2 years ago • 12 comments

Hi,

I'm trying to annotate my genomes but end up with a Python error. I need help.

Command:

funannotate predict -i Llips.R.fasta -s zebrafish --protein_evidence /home/sunn/data/softwares/gnom/test/spades/backup/Abininew/refgenome/tanaka.gff -o Llips --cpus 16 --force > Llips.log

(base) sunn@col:~/data/softwares/gnom/test/spades/backup/gnomecds/masked$ ./llips.sh

[Jun 28 02:21 AM]: OS: Ubuntu 20.04, 96 cores, ~ 1584 GB RAM. Python: 3.9.7
[Jun 28 02:21 AM]: Running funannotate v1.8.12
[Jun 28 02:21 AM]: Skipping CodingQuarry as $QUARRY_PATH not found as ENV
[Jun 28 02:21 AM]: Parsed training data, run ab-initio gene predictors as follows:
Program Training-Method
augustus pretrained
genemark selftraining
glimmerhmm busco
snap busco
[Jun 28 02:23 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Jun 28 02:39 AM]: Genome loaded: 528,188 scaffolds; 517,884,457 bp; 8.69% repeats masked
[Jun 28 02:40 AM]: Mapping 0 proteins to genome using diamond and exonerate
[Jun 28 02:40 AM]: CMD ERROR: diamond blastx --threads 16 -q /home/sunn/data/softwares/gnom/test/spades/backup/gnomecds/masked/Llips/predict_misc/genome.softmasked.fa --db diamond -o diamond.matches.tab -e 1e-10 -k 0 --more-sensitive --unal 0 -c 1 -F 15 -f 6 sseqid slen sstart send qseqid qlen qstart qend pident length evalue score qcovhsp qframe
[Jun 28 02:40 AM]: diamond v2.0.5.143 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org

#CPU threads: 16
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory:
Opening the database... [0s]
Error: Incomplete database file. Database building did not complete successfully.

Traceback (most recent call last):
File "/home/sunn/anaconda3/bin/funannotate", line 33, in <module>
sys.exit(load_entry_point('funannotate==1.8.12', 'concole_scripts', 'funannotate')())
File "/home/sunn/anaconda3/lib/python3.9/site-packages/funannotate-1.8.12-py3.9.egg/funannotate/funannotate.py", line 716, in main
mod.main(arguments)
File "/home/sunn/anaconda3/lib/python3.9/site-packages/funannotate-1.8.12-py3.9.egg/funannotate/predict.py", line 1053, in main
lib.exonerate2hints(Exonerate, hintsP)
File "/home/sunn/anaconda3/lib/python3.9/site-packages/funannotate-1.8.12-py3.9.egg/funannotate/library.py", line 3979, in exonerate2hints
with open(file, 'r') as input:
FileNotFoundError: [Errno 2] No such file or directory: '/home/sunn/data/softwares/gnom/test/spades/backup/gnomecds/masked/Llips/predict_misc/protein_alignments.gff3'

sunnyEV • Jun 28 '22 01:06

It looks like the database setup may have failed. Did you see any errors when you ran funannotate setup?

nextgenusfs • Jun 28 '22 01:06

Not really. I installed it correctly.

sunnyEV • Jun 28 '22 01:06

Suggestions, please.

sunnyEV • Jun 28 '22 01:06

So this line

Mapping 0 proteins to genome using diamond and exonerate

suggests that the default SwissProt/UniProt database is not installed correctly. Try rerunning funannotate setup -i uniprot with the -f flag to overwrite the existing files.
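Because "Mapping 0 proteins" points at an empty or missing protein evidence set, one quick check is to count the records in the installed UniProt FASTA. The minimal Python sketch below does only that. It assumes funannotate setup stored the file as uniprot_sprot.fasta in the directory named by $FUNANNOTATE_DB; verify the name and location on your own install, or pass the path as the first argument.

```python
import os
import sys


def count_fasta_records(path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


if __name__ == "__main__":
    # Assumption: uniprot_sprot.fasta lives in $FUNANNOTATE_DB; a CLI argument overrides this.
    default = os.path.join(os.environ.get("FUNANNOTATE_DB", "."), "uniprot_sprot.fasta")
    path = sys.argv[1] if len(sys.argv) > 1 else default
    if not os.path.isfile(path):
        print(f"{path}: file not found")
    else:
        n = count_fasta_records(path)
        print(f"{path}: {n} FASTA records")
        if n == 0:
            # An empty evidence file is consistent with the 'Mapping 0 proteins' log line.
            print("Empty protein FASTA; consider rerunning setup with -f.")
```

A missing file or a count of 0 is consistent with the advice to rerun setup with -f; a large count suggests looking elsewhere.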

nextgenusfs • Jun 28 '22 02:06

I'm keen to know: how do you rank funannotate against other pipelines such as BRAKER and MAKER?

sunnyEV • Jun 28 '22 02:06

I'm ending up with the same error again.


[Jun 28 04:08 AM]: OS: Ubuntu 20.04, 96 cores, ~ 1584 GB RAM. Python: 3.9.7
[Jun 28 04:08 AM]: Running funannotate v1.8.12
[Jun 28 04:08 AM]: Skipping CodingQuarry as $QUARRY_PATH not found as ENV
[Jun 28 04:08 AM]: Parsed training data, run ab-initio gene predictors as follows:
Program Training-Method
augustus pretrained
genemark selftraining
glimmerhmm busco
snap busco
[Jun 28 04:09 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Jun 28 04:26 AM]: Genome loaded: 528,188 scaffolds; 517,884,457 bp; 8.69% repeats masked
[Jun 28 04:26 AM]: Mapping 0 proteins to genome using diamond and exonerate
[Jun 28 04:26 AM]: CMD ERROR: diamond blastx --threads 16 -q /home/sunn/data/softwares/gnom/test/spades/backup/gnomecds/masked/Llips/predict_misc/genome.softmasked.fa --db diamond -o diamond.matches.tab -e 1e-10 -k 0 --more-sensitive --unal 0 -c 1 -F 15 -f 6 sseqid slen sstart send qseqid qlen qstart qend pident length evalue score qcovhsp qframe
[Jun 28 04:26 AM]: diamond v2.0.5.143 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org

#CPU threads: 16
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory:
Opening the database... [0s]
Error: Incomplete database file. Database building did not complete successfully.

Traceback (most recent call last):
File "/home/sunn/anaconda3/bin/funannotate", line 33, in <module>
sys.exit(load_entry_point('funannotate==1.8.12', 'concole_scripts', 'funannotate')())
File "/home/sunn/anaconda3/lib/python3.9/site-packages/funannotate-1.8.12-py3.9.egg/funannotate/funannotate.py", line 716, in main
mod.main(arguments)
File "/home/sunn/anaconda3/lib/python3.9/site-packages/funannotate-1.8.12-py3.9.egg/funannotate/predict.py", line 1053, in main
lib.exonerate2hints(Exonerate, hintsP)
File "/home/sunn/anaconda3/lib/python3.9/site-packages/funannotate-1.8.12-py3.9.egg/funannotate/library.py", line 3979, in exonerate2hints
with open(file, 'r') as input:
FileNotFoundError: [Errno 2] No such file or directory: '/home/sunn/data/softwares/gnom/test/spades/backup/gnomecds/masked/Llips/predict_misc/protein_alignments.gff3'
(base) sunn@col:~/data/softwares/gnom/test/spades/backup/gnomecds/masked$

sunnyEV • Jun 28 '22 02:06

It's the same error.

Error: Incomplete database file. Database building did not complete successfully.

This means the database is not installed properly. The EBI download links were down over the weekend, so it is possible that the download was incomplete. What is the output of funannotate database?

Please run the tests to verify your install.
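funannotate test exercises the whole install; as a lighter first pass, a small Python sketch can report which external programs are visible on $PATH. The tool list is an assumption drawn from the programs mentioned in this thread (diamond, exonerate, augustus, tantan), not funannotate's full dependency list, and finding a binary says nothing about whether it was compiled correctly.

```python
import shutil


def find_tools(names):
    """Map each program name to its resolved path on $PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Illustrative list taken from this thread; extend it for your own setup.
    tools = ["diamond", "exonerate", "augustus", "tantan"]
    for name, path in find_tools(tools).items():
        print(f"{name:10s} {path or 'MISSING'}")
```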

nextgenusfs • Jun 28 '22 04:06

(base) sunn@col:~/data/softwares/funannotate$ funannotate test -t all --cpus 10
#########################################################
Running funannotate clean unit testing: minimap2 mediated assembly duplications
Downloading: https://osf.io/8pjbe/download?version=1 Bytes: 252076
CMD: funannotate clean -i test.clean.fa -o test.exhaustive.fa --exhaustive
#########################################################

6 input contigs, 6 larger than 500 bp, N50 is 427,039 bp
Checking duplication of 6 contigs

scaffold_73 appears duplicated: 100% identity over 100% of the contig. contig length: 15153
scaffold_91 appears duplicated: 100% identity over 100% of the contig. contig length: 8858
scaffold_27 appears duplicated: 100% identity over 100% of the contig. contig length: 427039

6 input contigs; 6 larger than 500 bp; 3 duplicated; 3 written to file
#########################################################
SUCCESS: funannotate clean test complete.
#########################################################

#########################################################
Running funannotate mask unit testing: RepeatModeler --> RepeatMasker
Downloading: https://osf.io/hbryz/download?version=1 Bytes: 375687
CMD: funannotate mask -i test.fa -o test.masked.fa --cpus 10
#########################################################

[Jun 28 07:43 AM]: OS: Ubuntu 20.04, 96 cores, ~ 1584 GB RAM. Python: 3.9.7
[Jun 28 07:43 AM]: Running funanotate v1.8.12
[Jun 28 07:43 AM]: Missing Dependencies: tantan. Please install missing dependencies and re-run script
#########################################################
ERROR: funannotate mask test failed.
#########################################################

#########################################################
Running funannotate predict unit testing
Downloading: https://osf.io/te2pf/download?version=1 Bytes: 1489808
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 10 --species Awesome testicus
#########################################################

[Jun 28 07:43 AM]: OS: Ubuntu 20.04, 96 cores, ~ 1584 GB RAM. Python: 3.9.7
[Jun 28 07:43 AM]: Running funannotate v1.8.12
[Jun 28 07:43 AM]: Skipping CodingQuarry as $QUARRY_PATH not found as ENV
[Jun 28 07:43 AM]: Parsed training data, run ab-initio gene predictors as follows:
Program Training-Method
augustus pretrained
genemark selftraining
glimmerhmm busco
snap busco
[Jun 28 07:43 AM]: ERROR: augustus --proteinprofile test failed, likely a compilation error. This is required to run BUSCO, exiting.
#########################################################
Traceback (most recent call last):
File "/home/sunn/anaconda3/bin/funannotate", line 33, in <module>
sys.exit(load_entry_point('funannotate==1.8.12', 'concole_scripts', 'funannotate')())
File "/home/sunn/anaconda3/lib/python3.9/site-packages/funannotate-1.8.12-py3.9.egg/funannotate/funannotate.py", line 716, in main
mod.main(arguments)
File "/home/sunn/anaconda3/lib/python3.9/site-packages/funannotate-1.8.12-py3.9.egg/funannotate/test.py", line 405, in main
runPredictTest(args)
File "/home/sunn/anaconda3/lib/python3.9/site-packages/funannotate-1.8.12-py3.9.egg/funannotate/test.py", line 160, in runPredictTest
assert 1500 <= countGFFgenes(os.path.join(
File "/home/sunn/anaconda3/lib/python3.9/site-packages/funannotate-1.8.12-py3.9.egg/funannotate/test.py", line 45, in countGFFgenes
with open(input, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'test-predict_edb9b175-4d46-4502-ab52-26546741e236/annotate/predict_results/Awesome_testicus.gff3'
(base) sunn@col:~/data/softwares/funannotate$

sunnyEV • Jun 28 '22 05:06

I already have Augustus v3.4 installed.

sunnyEV • Jun 28 '22 05:06

You need Augustus < 3.4, and it needs to be compiled correctly so that --proteinprofile is functional. The easiest way is probably to install it with apt-get on Debian-based systems. Lately, almost all of the conda builds of Augustus are broken for proteinprofile mode.
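To confirm which Augustus is picked up before rerunning, its version can be parsed and compared against 3.4. This is a hedged sketch: it assumes augustus --version prints a banner containing a dotted version number (for example "AUGUSTUS (3.3.3) is a gene prediction tool"), takes the first dotted number it finds, and the helper names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_augustus_version(text):
    """Return (major, minor, patch) from the first dotted version number in the text."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in: {text!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def is_older_than_3_4(version):
    """True for versions below 3.4, the requirement stated in this thread."""
    return version < (3, 4)


if __name__ == "__main__":
    exe = shutil.which("augustus")
    if exe is None:
        print("augustus not found on $PATH")
    else:
        # Assumption: the version banner may go to stdout or stderr, so check both.
        out = subprocess.run([exe, "--version"], capture_output=True, text=True)
        try:
            version = parse_augustus_version(out.stdout + out.stderr)
        except ValueError as err:
            print(err)
        else:
            status = "OK (< 3.4)" if is_older_than_3_4(version) else "3.4 or newer"
            print(f"augustus {'.'.join(map(str, version))}: {status}")
```

A version below 3.4 is necessary but not sufficient: only an actual --proteinprofile run, like the check that failed in the predict test above, shows whether the build works.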

nextgenusfs • Jun 28 '22 06:06

I'll do that.

Can the pipeline calculate dN/dS for the given orthologs? I'm very interested in these ratios.

When I used codeml, I had a problem with my input protein-coding (CDS) sequences: some orthologs had internal stop codons.

Suggestions, please.

sunnyEV • Jun 28 '22 06:06

Is this a funannotate question? Where did the CDS files you used for the selection analysis come from? Are you doing a pairwise or a multi-species analysis? You can check your data by translating the CDS files first and seeing which ones have internal stops.
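The translation check suggested here can be done without extra dependencies. This minimal Python sketch assumes the standard genetic code (NCBI table 1) and CDS records in FASTA format; for each sequence it reports the 1-based positions of stop codons that occur before the final codon. The parse_fasta, translate and internal_stops names are illustrative, not part of funannotate.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codon index = 16*first + 4*second + third, bases ordered T, C, A, G.
STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def parse_fasta(text):
    """Parse FASTA text into {first word of header: sequence}."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def translate(cds):
    """Translate codon by codon; ambiguous codons (e.g. containing N) become 'X', a trailing partial codon is ignored."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if all(base in BASES for base in codon):
            idx = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
            protein.append(STANDARD_CODE[idx])
        else:
            protein.append("X")
    return "".join(protein)


def internal_stops(cds):
    """Return 1-based codon positions of stops before the final codon (a terminal stop is allowed)."""
    protein = translate(cds)
    return [pos + 1 for pos, aa in enumerate(protein[:-1]) if aa == "*"]


if __name__ == "__main__":
    example = ">geneA\nATGGCTTAA\n>geneB\nATGTAAGCTTAA\n"
    for name, seq in parse_fasta(example).items():
        stops = internal_stops(seq)
        message = f"internal stops at codons {stops}" if stops else "no internal stops"
        print(f"{name}: {message}")
```

For the example input, geneA has no internal stops and geneB has a stop at codon 2.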

Jason Stajich

hyphaltip • Jun 28 '22 11:06

I did not see a follow-up. dN/dS is part of the compare pipeline (funannotate compare), but it is pairwise only. You might need to check for and clean up orthologs with stop codons, and check that you are using the right translation table.
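A cleanup along these lines can drop any ortholog pair in which either CDS has an internal stop, with the genetic code as a parameter so the effect of the translation table is visible. In this Python sketch the pair format (name, cds_a, cds_b) is an assumption, and table 2 (vertebrate mitochondrial) is included only to show how a different table changes the result: TGA is a stop in table 1 but tryptophan in table 2.

```python
BASES = "TCAG"
# NCBI genetic codes as 64-letter strings, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
GENETIC_CODES = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",  # standard
    2: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",  # vertebrate mitochondrial
}


def has_internal_stop(cds, table=1):
    """True if any codon before the last one is a stop under the given table."""
    code = GENETIC_CODES[table]
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    for codon in codons[:-1]:
        if all(base in BASES for base in codon):
            idx = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
            if code[idx] == "*":
                return True
    return False


def clean_pairs(pairs, table=1):
    """Keep (name, cds_a, cds_b) pairs in which neither CDS has an internal stop."""
    return [(name, a, b) for name, a, b in pairs
            if not has_internal_stop(a, table) and not has_internal_stop(b, table)]


if __name__ == "__main__":
    # Hypothetical pairs: og2 contains TGA, a stop in table 1 but Trp in table 2.
    pairs = [("og1", "ATGGCTTAA", "ATGGCCTAA"), ("og2", "ATGTGAGCTTAA", "ATGTGGGCTTAA")]
    for table in (1, 2):
        kept = [name for name, _, _ in clean_pairs(pairs, table)]
        print(f"table {table}: kept {kept}")
```

Run as a script, it prints table 1: kept ['og1'] and then table 2: kept ['og1', 'og2'], which is the kind of discrepancy a wrong translation table produces.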

You can also try a tool I wrote that wraps PAML's yn00 and generates general pairwise tables: https://github.com/hyphaltip/subopt-kaks. Use the yn00_cds_prealigned tool on already-aligned CDS files (with the terminal stop codon removed).
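Removing the terminal stop codon from an alignment has to keep every sequence the same length. One way, sketched below in Python, assumes a gapped, codon-aligned CDS alignment held in a dict and the standard stop codons TAA, TAG and TGA: replace each sequence's last non-gap codon with '---' when it is a stop, then drop trailing columns that are gaps in every sequence. The exact input format yn00_cds_prealigned expects is not covered here, and strip_terminal_stops is a hypothetical helper.

```python
STOPS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code (NCBI table 1)


def strip_terminal_stops(alignment):
    """Gap out each sequence's last non-gap codon if it is a stop, then trim all-gap trailing columns."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1 or next(iter(lengths)) % 3 != 0:
        raise ValueError("sequences must be aligned, of equal length, and a multiple of 3")
    codon_lists = {}
    for name, seq in alignment.items():
        codons = [seq[i:i + 3].upper() for i in range(0, len(seq), 3)]
        for idx in range(len(codons) - 1, -1, -1):
            if codons[idx] != "---":
                if codons[idx] in STOPS:
                    codons[idx] = "---"  # keep the column so lengths stay equal
                break
        codon_lists[name] = codons
    # Drop trailing codon columns that are now gaps in every sequence.
    n_cols = len(next(iter(codon_lists.values())))
    while n_cols and all(c[n_cols - 1] == "---" for c in codon_lists.values()):
        n_cols -= 1
    return {name: "".join(c[:n_cols]) for name, c in codon_lists.items()}


if __name__ == "__main__":
    aln = {"seq1": "ATGGCTTAA---", "seq2": "ATGGCCAAATAG"}
    for name, seq in strip_terminal_stops(aln).items():
        print(name, seq)
```

Gapping rather than deleting the stop avoids shifting columns when sequences end at different alignment positions, as seq1 and seq2 do in the example.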

hyphaltip • Oct 18 '22 21:10