list index out of range after Augustus

Open wushyer opened this issue 1 year ago • 21 comments

Are you using the latest release? If you are not using the latest release of funannotate, please upgrade, if bug persists then report here.

Yes

Describe the bug A clear and concise description of what the bug is.

IndexError: list index out of range after Augustus

What command did you issue? Copy/paste the command used.

funannotate predict -i SG-genome.sm.fa -o funannotate -s "Scleroderma guani" --organism other --strain "SG" --cpus 10 --name Sclgu --busco_db insecta --repeats2evm --transcript_evidence assembly.transcripts.fasta --protein_alignments SG.gth.gff3 --augustus_gff augustus.hints.gff3 --max_intronlen 50000 > fun_test.1017.log

Logfiles Please provide relevant log files of the error.

[Oct 17 10:22 PM]: OS: CentOS Linux 7, 152 cores, ~ 1519 GB RAM. Python: 3.7.12
[Oct 17 10:22 PM]: Running funannotate v1.8.13
[Oct 17 10:22 PM]: GeneMark not found and $GENEMARK_PATH environmental variable missing. Will skip GeneMark ab-initio prediction.
[Oct 17 10:22 PM]: Skipping CodingQuarry as no --rna_bam passed
[Oct 17 10:22 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     busco
  glimmerhmm   busco
  snap         busco
[Oct 17 10:23 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Oct 17 10:23 PM]: Genome loaded: 231 scaffolds; 191,968,538 bp; 25.41% repeats masked
[Oct 17 10:23 PM]: Aligning transcript evidence to genome with minimap2
[Oct 17 10:24 PM]: Found 40,037 alignments, wrote GFF3 and Augustus hints to file
[Oct 17 10:24 PM]: Loading protein alignments SG.gth.gff3
[Oct 17 10:24 PM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Oct 17 10:47 PM]: 1,519 valid BUSCO predictions found, validating protein sequences
[Oct 17 10:50 PM]: 1,517 BUSCO predictions validated
[Oct 17 10:50 PM]: Pulling out high quality Augustus predictions
Traceback (most recent call last):
  File "/users/shuangyang.wu/micromamba/envs/funannotate/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/users/shuangyang.wu/micromamba/envs/funannotate/lib/python3.7/site-packages/funannotate/funannotate.py", line 716, in main
    mod.main(arguments)
  File "/users/shuangyang.wu/micromamba/envs/funannotate/lib/python3.7/site-packages/funannotate/predict.py", line 1486, in main
    if float(values[1]) > 89:
IndexError: list index out of range
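The traceback stops at `if float(values[1]) > 89:`, which raises `IndexError` whenever `values` holds fewer than two items. The sketch below is not funannotate's code. It is a hypothetical illustration of that failure mode and of a guarded version of the check; the function name `is_high_quality` and the sample records are invented for illustration, and how funannotate fills `values` is not shown in this thread.

```python
def is_high_quality(values, threshold=89.0):
    """Return True when the second field is a number above the threshold.

    Hypothetical helper: `values` stands in for the list indexed in the
    traceback; how funannotate populates it is an open question here.
    """
    if len(values) < 2:
        # Indexing values[1] at this point would raise IndexError,
        # which is the error reported in the traceback.
        return False
    try:
        return float(values[1]) > threshold
    except ValueError:
        # The second field exists but is not numeric.
        return False


print(is_high_quality(["g1", "100"]))  # record with a support value
print(is_high_quality(["g1"]))         # record with the support value missing
```

A guard like this only hides the symptom; the more useful question is why the second field was never collected for some record.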

OS/Install Information

  • output of funannotate check --show-versions

Checking dependencies for 1.8.13

You are running Python v 3.7.12. Now checking python packages... biopython: 1.79 goatools: 1.2.3 matplotlib: 3.4.3 natsort: 8.2.0 numpy: 1.21.6 pandas: 1.3.5 psutil: 5.9.2 requests: 2.28.1 scikit-learn: 1.0.2 scipy: 1.7.3 seaborn: 0.12.0 All 11 python packages installed

You are running Perl v b'5.032001'. Now checking perl modules... Carp: 1.50 Clone: 0.42 DBD::SQLite: 1.70 DBD::mysql: 4.050 DBI: 1.643 DB_File: 1.855 Data::Dumper: 2.183 File::Basename: 2.85 File::Which: 1.24 Getopt::Long: 2.52 Hash::Merge: 0.302 JSON: 4.10 LWP::UserAgent: 6.67 Logger::Simple: 2.0 POSIX: 1.94 Parallel::ForkManager: 2.02 Pod::Usage: 1.69 Scalar::Util::Numeric: 0.40 Storable: 3.15 Text::Soundex: 3.05 Thread::Queue: 3.14 Tie::File: 1.06 URI::Escape: 5.12 YAML: 1.30 local::lib: 2.000029 threads: 2.25 threads::shared: 1.61 All 27 Perl modules installed

Checking Environmental Variables... $FUNANNOTATE_DB=/groups/dolan/user/shuangyang.wu/software/funannotate_db $PASAHOME=/users/shuangyang.wu/micromamba/envs/funannotate/opt/pasa-2.5.2 $TRINITY_HOME=/users/shuangyang.wu/micromamba/envs/funannotate/opt/trinity-2.8.5 $EVM_HOME=/users/shuangyang.wu/micromamba/envs/funannotate/opt/evidencemodeler-1.1.1 $AUGUSTUS_CONFIG_PATH=/users/shuangyang.wu/micromamba/envs/funannotate/config/ ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir

Checking external dependencies... PASA: 2.5.2 CodingQuarry: 2.0 Trinity: 2.8.5 augustus: 3.5.0 bamtools: bamtools 2.5.1 bedtools: bedtools v2.30.0 blat: BLAT v35 diamond: 2.0.15 ete3: 3.1.2 exonerate: exonerate 2.4.0 fasta: no way to determine glimmerhmm: 3.0.4 gmap: 2021-08-25 hisat2: 2.2.1 hmmscan: HMMER 3.3.2 (Nov 2020) hmmsearch: HMMER 3.3.2 (Nov 2020) java: 17.0.3-internal kallisto: 0.46.1 mafft: v7.508 (2022/Sep/07) makeblastdb: makeblastdb 2.2.31+ minimap2: 2.24-r1122 proteinortho: 6.1.1 pslCDnaFilter: no way to determine salmon: salmon 0.14.1 samtools: samtools 1.16.1 snap: 2006-07-28 stringtie: 2.2.1 tRNAscan-SE: 2.0.11 (Oct 2022) tantan: tantan 39 tbl2asn: no way to determine, likely 25.X tblastn: tblastn 2.2.31+ trimmomatic: 0.39 ERROR: emapper.py not installed ERROR: gmes_petap.pl not installed ERROR: pigz not installed ERROR: signalp not installed ERROR: trimal not installed

wushyer avatar Oct 18 '22 07:10 wushyer

Can you check if there were any augustus predictions? It seems like there might be no high quality predictions?

50 kb is also a very large maximum intron size for fungi, so I am not sure whether that is causing bad predictions.
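One quick way to answer the question about whether any Augustus predictions exist is to count the feature types in the GFF3 file that was passed with `--augustus_gff`. This is a minimal sketch, assuming a standard tab-separated GFF3 with the feature type in column 3; the function name and the sample lines are illustrative only.

```python
from collections import Counter


def count_feature_types(gff3_lines):
    """Count GFF3 feature types (column 3), skipping comments and blank lines."""
    counts = Counter()
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3:
            counts[cols[2]] += 1
    return counts


# Illustrative sample; a real check would pass an open file handle instead.
sample = [
    "##gff-version 3\n",
    "scaffold_1\tAUGUSTUS\tgene\t1\t900\t.\t+\t.\tID=g1\n",
    "scaffold_1\tAUGUSTUS\tmRNA\t1\t900\t.\t+\t.\tID=g1.t1;Parent=g1\n",
]
for feature, n in count_feature_types(sample).items():
    print(f"{feature}\t{n}")
```

For the file in this issue, the same function could be called as `count_feature_types(open("augustus.hints.gff3"))`; a count of zero `gene` features would suggest the file holds no usable predictions.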

hyphaltip avatar Oct 18 '22 15:10 hyphaltip

My guess is that this is related to Augustus 3.5.0: they might have changed the output format, or it didn't run properly. There were updates in 3.4 that I think broke some compatibility; I will need to find some time to test this locally.

nextgenusfs avatar Oct 18 '22 15:10 nextgenusfs

Dear Jason,

I don't think I got any Augustus results. Please see below. [image]

wushyer avatar Oct 18 '22 19:10 wushyer

Thanks Jon. My species is an insect.

Best, Shuangyang

wushyer avatar Oct 18 '22 19:10 wushyer

Okay, so the issue here is that in augustus v3.5.0 cmd line options in some of the auxiliary scripts have changed, thus funannotate freezes when trying to validate the install. I think c0fab96 should fix it. Note there will likely be more issues with augustus v3.5.0 that I haven't gotten to yet......

Hi Jon,

I quickly tested another version, funannotate 1.8.11 with Augustus 3.3.3, but got the same error.

[Oct 19 03:00 AM]: OS: CentOS Linux 7, 256 cores, ~ 528 GB RAM. Python: 3.7.10
[Oct 19 03:00 AM]: Running funannotate v1.8.11
[Oct 19 03:00 AM]: Skipping CodingQuarry as no --rna_bam passed
[Oct 19 03:00 AM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     busco
  genemark     selftraining
  glimmerhmm   busco
  snap         busco
[Oct 19 03:01 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Oct 19 03:01 AM]: Genome loaded: 231 scaffolds; 191,968,538 bp; 25.41% repeats masked
[Oct 19 03:01 AM]: Aligning transcript evidence to genome with minimap2
[Oct 19 03:02 AM]: Found 40,039 alignments, wrote GFF3 and Augustus hints to file
[Oct 19 03:02 AM]: Loading protein alignments SG.gth.gff3
[Oct 19 03:03 AM]: Running GeneMark-ES on assembly
[Oct 19 06:01 AM]: 18,604 predictions from GeneMark
[Oct 19 06:01 AM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Oct 19 06:12 AM]: 218 valid BUSCO predictions found, validating protein sequences
[Oct 19 06:15 AM]: 216 BUSCO predictions validated
[Oct 19 06:15 AM]: Pulling out high quality Augustus predictions
Traceback (most recent call last):
  File "/home/miniconda3/envs/funannotate/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/home/miniconda3/envs/funannotate/lib/python3.7/site-packages/funannotate/funannotate.py", line 716, in main
    mod.main(arguments)
  File "/home/miniconda3/envs/funannotate/lib/python3.7/site-packages/funannotate/predict.py", line 1485, in main
    if float(values[1]) > 89:
IndexError: list index out of range

Checking dependencies for 1.8.11

You are running Python v 3.7.10. Now checking python packages... biopython: 1.77 goatools: 1.1.6 matplotlib: 3.4.2 natsort: 7.1.1 numpy: 1.20.3 pandas: 1.2.4 psutil: 5.8.0 requests: 2.25.1 scikit-learn: 0.24.2 scipy: 1.6.3 seaborn: 0.11.1 All 11 python packages installed

You are running Perl v b'5.026002'. Now checking perl modules... Carp: 1.38 Clone: 0.42 DBD::SQLite: 1.64 DBD::mysql: 4.046 DBI: 1.642 DB_File: 1.855 Data::Dumper: 2.173 File::Basename: 2.85 File::Which: 1.23 Getopt::Long: 2.5 Hash::Merge: 0.300 JSON: 4.02 LWP::UserAgent: 6.39 Logger::Simple: 2.0 POSIX: 1.76 Parallel::ForkManager: 2.02 Pod::Usage: 1.69 Scalar::Util::Numeric: 0.40 Storable: 3.15 Text::Soundex: 3.05 Thread::Queue: 3.12 Tie::File: 1.02 URI::Escape: 3.31 YAML: 1.29 threads: 2.15 threads::shared: 1.56 ERROR: local::lib not installed, install with cpanm local::lib

Checking Environmental Variables... $FUNANNOTATE_DB=/share/pasteur/database/funannotate/ $PASAHOME=/home/miniconda3/envs/funannotate/opt/pasa-2.4.1 $TRINITY_HOME=/home/miniconda3/envs/funannotate/opt/trinity-2.8.5 $EVM_HOME=/home/miniconda3/envs/funannotate/opt/evidencemodeler-1.1.1 $AUGUSTUS_CONFIG_PATH=/home/miniconda3/envs/funannotate/config/ $GENEMARK_PATH=/home/software/install/genemark/gmes_linux_64/ All 6 environmental variables are set

Checking external dependencies... PASA: 2.4.1 CodingQuarry: 2.0 Trinity: 2.8.5 augustus: 3.3.3 bamtools: bamtools 2.5.1 bedtools: bedtools v2.30.0 blat: BLAT v36 diamond: 2.0.15 emapper.py: 2.1.3 ete3: 3.1.2 exonerate: exonerate 2.4.0 fasta: no way to determine glimmerhmm: 3.0.4 gmap: 2017-11-15 gmes_petap.pl: 4.65_lic hisat2: 2.2.1 hmmscan: HMMER 3.3.2 (Nov 2020) hmmsearch: HMMER 3.3.2 (Nov 2020) java: 11.0.8-internal kallisto: 0.46.1 mafft: v7.480 (2021/May/21) makeblastdb: makeblastdb 2.2.31+ minimap2: 2.20-r1061 proteinortho: 6.0.30 pslCDnaFilter: no way to determine salmon: salmon 0.14.1 samtools: samtools 1.12 signalp: 5.0b snap: 2006-07-28 stringtie: 2.1.6 tRNAscan-SE: 2.0.7 (Oct 2020) tantan: tantan 26 tbl2asn: no way to determine, likely 25.X tblastn: tblastn 2.2.31+ trimal: trimAl v1.4.rev15 build[2013-12-17] trimmomatic: 0.39 ERROR: pigz not installed

wushyer avatar Oct 19 '22 06:10 wushyer

Sorry, I think I mixed up issues. Can you try to run without the --augustus_gff augustus.hints.gff3 flag? Also, please run funannotate test -t predict to check whether this is a problem with your install or with your data.

nextgenusfs avatar Oct 19 '22 06:10 nextgenusfs

Hi Jon, I just finished a run without the --augustus_gff flag, but it ended with another error.

[Oct 19 02:55 PM]: OS: CentOS Linux 7, 256 cores, ~ 528 GB RAM. Python: 3.7.10
[Oct 19 02:55 PM]: Running funannotate v1.8.11
[Oct 19 02:55 PM]: Skipping CodingQuarry as no --rna_bam passed
[Oct 19 02:55 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     busco
  genemark     selftraining
  glimmerhmm   busco
  snap         busco
[Oct 19 02:56 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Oct 19 02:56 PM]: Genome loaded: 231 scaffolds; 191,968,538 bp; 25.41% repeats masked
[Oct 19 02:56 PM]: Aligning transcript evidence to genome with minimap2
[Oct 19 02:57 PM]: Found 40,039 alignments, wrote GFF3 and Augustus hints to file
[Oct 19 02:57 PM]: Loading protein alignments SG.gth.gff3
[Oct 19 02:57 PM]: Running GeneMark-ES on assembly
[Oct 19 05:46 PM]: 18,604 predictions from GeneMark
[Oct 19 05:46 PM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Oct 19 05:56 PM]: 218 valid BUSCO predictions found, validating protein sequences
[Oct 19 05:58 PM]: 216 BUSCO predictions validated
[Oct 19 05:58 PM]: Training Augustus using BUSCO gene models
[Oct 19 05:58 PM]: Augustus initial training results:
  Feature       Specificity   Sensitivity
  nucleotides   91.2%         65.4%
  exons         47.2%         48.6%
  genes         27.9%         22.6%
[Oct 19 05:58 PM]: Accuracy seems low, you can try to improve by passing the --optimize_augustus option.
[Oct 19 05:58 PM]: Running Augustus gene prediction using scleroderma_guani_sg parameters
[Oct 19 06:24 PM]: 23,746 predictions from Augustus
[Oct 19 06:24 PM]: Pulling out high quality Augustus predictions
[Oct 19 06:24 PM]: Found 7,045 high quality predictions from Augustus (>90% exon evidence)
[Oct 19 06:24 PM]: Running SNAP gene prediction, using training data: funannotate/predict_misc/busco.final.gff3
[Oct 19 06:35 PM]: 131 predictions from SNAP
[Oct 19 06:35 PM]: Running GlimmerHMM gene prediction, using training data: funannotate/predict_misc/busco.final.gff3
[Oct 19 06:58 PM]: 33,266 predictions from GlimmerHMM
[Oct 19 06:58 PM]: Summary of gene models passed to EVM (weights):
[Oct 19 06:58 PM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
Traceback (most recent call last):
  File "/home/miniconda3/envs/funannotate/lib/python3.7/site-packages/funannotate/aux_scripts/funannotate-runEVM.py", line 486, in <module>
    partitions=args.no_partitions)
  File "/home/miniconda3/envs/funannotate/lib/python3.7/site-packages/funannotate/aux_scripts/funannotate-runEVM.py", line 138, in create_partitions
    interProteins = exonerate_blocks_to_interlap(proteins)
  File "/home/miniconda3/envs/funannotate/lib/python3.7/site-packages/funannotate/aux_scripts/funannotate-runEVM.py", line 46, in exonerate_blocks_to_interlap
    coords.append(int(cols[3]))
IndexError: list index out of range
  Source         Weight   Count
  Augustus       1        16701
  Augustus HiQ   2        7045
  GeneMark       1        18604
  GlimmerHMM     1        33266
  snap           1        131
  Total          -        75747
[Oct 19 06:58 PM]: Evidence modeler has failed, exiting
Traceback (most recent call last):
  File "/home/miniconda3/envs/funannotate/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/home/miniconda3/envs/funannotate/lib/python3.7/site-packages/funannotate/funannotate.py", line 716, in main
    mod.main(arguments)
  File "/home/miniconda3/envs/funannotate/lib/python3.7/site-packages/funannotate/predict.py", line 1794, in main
    os.remove(EVM_out)
FileNotFoundError: [Errno 2] No such file or directory: '/home/wusy/funannotate/funannotate/predict_misc/evm.round1.gff3'
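The first traceback in this log fails at `coords.append(int(cols[3]))` inside `exonerate_blocks_to_interlap(proteins)`, which means some split line had fewer than four fields. A minimal sketch for locating such lines follows, assuming (this is not confirmed in the thread) that the input is a tab-separated GFF3 file such as the one passed with `--protein_alignments`; the function name and sample lines are hypothetical.

```python
def find_malformed_gff3_lines(lines, expected_columns=9):
    """Yield (line_number, column_count) for data lines with too few columns."""
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\n")
        if not stripped.strip() or stripped.startswith("#"):
            continue  # directives, comments and blank lines carry no columns
        n_cols = len(stripped.split("\t"))
        if n_cols < expected_columns:
            yield number, n_cols


# Illustrative sample: the second data line uses spaces instead of tabs.
sample = [
    "##gff-version 3\n",
    "scaffold_1\tgth\tmatch\t10\t90\t.\t+\t.\tID=m1\n",
    "scaffold_1 gth match 10 90 . + . ID=m2\n",
]
for number, n_cols in find_malformed_gff3_lines(sample):
    print(f"line {number}: {n_cols} column(s)")
```

Running the same function over `SG.gth.gff3` would show whether that file contains space-delimited or truncated records.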

wushyer avatar Oct 19 '22 15:10 wushyer

Please run the tests to test your installation. It looks like snap is not installed properly.

nextgenusfs avatar Oct 19 '22 15:10 nextgenusfs

Hi Jon,

The test pipeline finished without any error. Best, Shuangyang

#########################################################
Running funannotate predict unit testing
Downloading: https://osf.io/te2pf/download?version=1
Bytes: 1489808
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 2 --species Awesome testicus
#########################################################

[Oct 20 12:04 AM]: OS: CentOS Linux 7, 256 cores, ~ 528 GB RAM. Python: 3.7.10
[Oct 20 12:04 AM]: Running funannotate v1.8.11
[Oct 20 12:04 AM]: Skipping CodingQuarry as no --rna_bam passed
[Oct 20 12:04 AM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     pretrained
  genemark     selftraining
  glimmerhmm   busco
  snap         busco
[Oct 20 12:04 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Oct 20 12:04 AM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Oct 20 12:04 AM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Oct 20 12:04 AM]: Found 1,505 preliminary alignments with diamond in 0:00:05 --> generated FASTA files for exonerate in 0:00:00
[Oct 20 12:05 AM]: Exonerate finished in 0:00:52: found 1,270 alignments
[Oct 20 12:05 AM]: Running GeneMark-ES on assembly
[Oct 20 12:08 AM]: 1,593 predictions from GeneMark
[Oct 20 12:08 AM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Oct 20 12:25 AM]: 373 valid BUSCO predictions found, validating protein sequences
[Oct 20 12:27 AM]: 370 BUSCO predictions validated
[Oct 20 12:27 AM]: Running Augustus gene prediction using saccharomyces parameters
[Oct 20 12:30 AM]: 1,485 predictions from Augustus
[Oct 20 12:30 AM]: Pulling out high quality Augustus predictions
[Oct 20 12:30 AM]: Found 371 high quality predictions from Augustus (>90% exon evidence)
[Oct 20 12:30 AM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Oct 20 12:30 AM]: 0 predictions from SNAP
[Oct 20 12:30 AM]: SNAP prediction failed, moving on without result
[Oct 20 12:30 AM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Oct 20 12:31 AM]: 1,773 predictions from GlimmerHMM
[Oct 20 12:31 AM]: Summary of gene models passed to EVM (weights):
  Source         Weight   Count
  Augustus       1        1325
  Augustus HiQ   2        372
  GeneMark       1        1593
  GlimmerHMM     1        1773
  Total          -        5063
[Oct 20 12:31 AM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Oct 20 12:36 AM]: Converting to GFF3 and collecting all EVM results
[Oct 20 12:36 AM]: 1,706 total gene models from EVM
[Oct 20 12:36 AM]: Generating protein fasta files from 1,706 EVM models
[Oct 20 12:36 AM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Oct 20 12:37 AM]: Found 100 gene models to remove: 0 too short; 0 span gaps; 100 transposable elements
[Oct 20 12:37 AM]: 1,606 gene models remaining
[Oct 20 12:37 AM]: Predicting tRNAs
[Oct 20 12:37 AM]: 112 tRNAscan models are valid (non-overlapping)
[Oct 20 12:37 AM]: Generating GenBank tbl annotation file
[Oct 20 12:37 AM]: Collecting final annotation files for 1,718 total gene models
[Oct 20 12:37 AM]: Converting to final Genbank format
[Oct 20 12:37 AM]: Funannotate predict is finished, output files are in the annotate/predict_results folder
[Oct 20 12:37 AM]: Your next step might be functional annotation, suggested commands:

Run InterProScan (manual install): funannotate iprscan -i annotate -c 2

Run antiSMASH (optional): funannotate remote -i annotate -m antismash -e [email protected]

Annotate Genome: funannotate annotate -i annotate --cpus 2 --sbt yourSBTfile.txt

[Oct 20 12:37 AM]: Training parameters file saved: annotate/predict_results/saccharomyces.parameters.json [Oct 20 12:37 AM]: Add species parameters to database:

funannotate species -s saccharomyces -a annotate/predict_results/saccharomyces.parameters.json

######################################################### SUCCESS: funannotate predict test complete. #########################################################

(funannotate) [wusy@myosin ~]$

wushyer avatar Oct 19 '22 16:10 wushyer

snap install via conda must be corrupt:

[Oct 20 12:30 AM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Oct 20 12:30 AM]: 0 predictions from SNAP
[Oct 20 12:30 AM]: SNAP prediction failed, moving on without result

You will have to compile snap manually and remove this version from conda in order to fix.
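Before swapping the conda build for a manually compiled one, it can help to confirm which `snap` executable the environment actually resolves. The sketch below only reports the path found on `PATH` and does not verify that the binary works; the function name is hypothetical, and the tool names are taken from the `funannotate check` output above.

```python
import shutil


def locate_tools(names):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


for tool, path in locate_tools(["snap", "augustus", "glimmerhmm"]).items():
    print(f"{tool}: {path or 'not found on PATH'}")
```

If the reported path still points into the conda environment after a manual build, the old copy is shadowing the new one.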

The other issue is that BUSCO results seem to be incomplete. I don't know the reason behind this.

But that doesn't necessarily explain the error you got above.

I do not have time to troubleshoot any old versions. Can you upgrade to the current master and re-run? I also need to see the command that you are running. And please run in a fresh/new output directory so existing files are not re-used after upgrading to the latest in master (v1.8.14).

nextgenusfs avatar Oct 19 '22 17:10 nextgenusfs

Thanks Jon, I will do this and let you know. Best, Shuangyang

wushyer avatar Oct 19 '22 17:10 wushyer

funannotate test -t predict

Hi, test is fine. I will launch my data then.

#########################################################
Running funannotate predict unit testing
Downloading: https://osf.io/te2pf/download?version=1
Bytes: 1489808
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 2 --species Awesome testicus
#########################################################

[Oct 19 07:57 PM]: OS: CentOS Linux 7, 76 cores, ~ 177 GB RAM. Python: 3.7.12
[Oct 19 07:57 PM]: Running funannotate v1.8.14
[Oct 19 07:57 PM]: GeneMark not found and $GENEMARK_PATH environmental variable missing. Will skip GeneMark ab-initio prediction.
[Oct 19 07:57 PM]: Skipping CodingQuarry as no --rna_bam passed
[Oct 19 07:57 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     pretrained
  glimmerhmm   busco
  snap         busco
[Oct 19 07:57 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Oct 19 07:57 PM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Oct 19 07:57 PM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Oct 19 07:57 PM]: Found 1,505 preliminary alignments with diamond in 0:00:04 --> generated FASTA files for exonerate in 0:00:00
[Oct 19 07:57 PM]: Exonerate finished in 0:00:42: found 1,270 alignments
[Oct 19 07:57 PM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Oct 19 08:18 PM]: 370 valid BUSCO predictions found, validating protein sequences
[Oct 19 08:19 PM]: 367 BUSCO predictions validated
[Oct 19 08:19 PM]: Running Augustus gene prediction using saccharomyces parameters
[Oct 19 08:23 PM]: 1,485 predictions from Augustus
[Oct 19 08:23 PM]: Pulling out high quality Augustus predictions
[Oct 19 08:23 PM]: Found 371 high quality predictions from Augustus (>90% exon evidence)
[Oct 19 08:23 PM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Oct 19 08:23 PM]: 1,543 predictions from SNAP
[Oct 19 08:23 PM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Oct 19 08:25 PM]: 1,768 predictions from GlimmerHMM
[Oct 19 08:25 PM]: Summary of gene models passed to EVM (weights):
  Source         Weight   Count
  Augustus       1        1325
  Augustus HiQ   2        372
  GlimmerHMM     1        1768
  snap           1        1543
  Total          -        5008
[Oct 19 08:25 PM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Oct 19 08:37 PM]: Converting to GFF3 and collecting all EVM results
[Oct 19 08:37 PM]: 1,696 total gene models from EVM
[Oct 19 08:37 PM]: Generating protein fasta files from 1,696 EVM models
[Oct 19 08:37 PM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Oct 19 08:38 PM]: Found 133 gene models to remove: 0 too short; 0 span gaps; 133 transposable elements
[Oct 19 08:38 PM]: 1,563 gene models remaining
[Oct 19 08:38 PM]: Predicting tRNAs
[Oct 19 08:38 PM]: 112 tRNAscan models are valid (non-overlapping)
[Oct 19 08:38 PM]: Generating GenBank tbl annotation file
[Oct 19 08:38 PM]: Collecting final annotation files for 1,675 total gene models
[Oct 19 08:38 PM]: Converting to final Genbank format
[Oct 19 08:38 PM]: Funannotate predict is finished, output files are in the annotate/predict_results folder
[Oct 19 08:38 PM]: Your next step might be functional annotation, suggested commands:

Run InterProScan (manual install): funannotate iprscan -i annotate -c 2

Run antiSMASH (optional): funannotate remote -i annotate -m antismash -e [email protected]

Annotate Genome: funannotate annotate -i annotate --cpus 2 --sbt yourSBTfile.txt

[Oct 19 08:38 PM]: Training parameters file saved: annotate/predict_results/saccharomyces.parameters.json [Oct 19 08:38 PM]: Add species parameters to database:

funannotate species -s saccharomyces -a annotate/predict_results/saccharomyces.parameters.json

######################################################### SUCCESS: funannotate predict test complete. #########################################################

wushyer avatar Oct 19 '22 18:10 wushyer

Hi Jon,

I have tested three methods; only the run with transcript evidence alone works. Please see the logs below.

command line: funannotate predict -i SG-genome.sm.fa -o funannotate -s "Scleroderma" --organism other --strain "Sg" --cpus 10 --name Sclgu --busco_db insecta --repeats2evm --transcript_evidence assembly.transcripts.fasta --max_intronlen 20000 > fun_test.1017.log

log:

[Oct 19 08:50 PM]: OS: CentOS Linux 7, 76 cores, ~ 177 GB RAM. Python: 3.7.12
[Oct 19 08:50 PM]: Running funannotate v1.8.14
[Oct 19 08:50 PM]: GeneMark not found and $GENEMARK_PATH environmental variable missing. Will skip GeneMark ab-initio prediction.
[Oct 19 08:50 PM]: Skipping CodingQuarry as no --rna_bam passed
[Oct 19 08:50 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     busco
  glimmerhmm   busco
  snap         busco
[Oct 19 08:51 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Oct 19 08:51 PM]: Genome loaded: 231 scaffolds; 191,968,538 bp; 25.41% repeats masked
[Oct 19 08:51 PM]: Aligning transcript evidence to genome with minimap2
[Oct 19 08:52 PM]: Found 40,039 alignments, wrote GFF3 and Augustus hints to file
[Oct 19 08:52 PM]: Mapping 555,555 proteins to genome using diamond and exonerate
[Oct 19 09:02 PM]: Found 403,144 preliminary alignments with diamond in 0:04:51 --> generated FASTA files for exonerate in 0:05:47
[Oct 19 10:04 PM]: Exonerate finished in 0:57:30: found 5,869 alignments
[Oct 19 10:07 PM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Oct 19 10:29 PM]: 1,519 valid BUSCO predictions found, validating protein sequences
[Oct 19 10:31 PM]: 1,517 BUSCO predictions validated
[Oct 19 10:31 PM]: Training Augustus using BUSCO gene models
[Oct 19 10:32 PM]: Augustus initial training results:
  Feature       Specificity   Sensitivity
  nucleotides   97.6%         84.9%
  exons         68.7%         68.7%
  genes         37.5%         32.0%
[Oct 19 10:32 PM]: Accuracy seems low, you can try to improve by passing the --optimize_augustus option.
[Oct 19 10:32 PM]: Running Augustus gene prediction using scleroderma_sg parameters
[Oct 19 11:08 PM]: 19,649 predictions from Augustus
[Oct 19 11:08 PM]: Pulling out high quality Augustus predictions
[Oct 19 11:08 PM]: Found 7,885 high quality predictions from Augustus (>90% exon evidence)
[Oct 19 11:08 PM]: Running SNAP gene prediction, using training data: funannotate/predict_misc/busco.final.gff3
[Oct 19 11:23 PM]: 29,074 predictions from SNAP
[Oct 19 11:23 PM]: Running GlimmerHMM gene prediction, using training data: funannotate/predict_misc/busco.final.gff3
[Oct 19 11:55 PM]: 37,437 predictions from GlimmerHMM
[Oct 19 11:55 PM]: Summary of gene models passed to EVM (weights):
[Oct 19 11:56 PM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Oct 20 12:33 AM]: Converting to GFF3 and collecting all EVM results
  Source         Weight   Count
  Augustus       1        11764
  Augustus HiQ   2        7885
  GlimmerHMM     1        37437
  snap           1        29074
  Total          -        86160
[Oct 20 12:33 AM]: 18,219 total gene models from EVM
[Oct 20 12:33 AM]: Generating protein fasta files from 18,219 EVM models
[Oct 20 12:33 AM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Oct 20 12:34 AM]: Found 1,747 gene models to remove: 10 too short; 0 span gaps; 1,737 transposable elements
[Oct 20 12:34 AM]: 16,472 gene models remaining
[Oct 20 12:34 AM]: Predicting tRNAs
[Oct 20 12:36 AM]: 161 tRNAscan models are valid (non-overlapping)
[Oct 20 12:36 AM]: Generating GenBank tbl annotation file
[Oct 20 12:37 AM]: Collecting final annotation files for 16,633 total gene models
[Oct 20 12:37 AM]: Converting to final Genbank format
[Oct 20 12:39 AM]: Funannotate predict is finished, output files are in the funannotate/predict_results folder
[Oct 20 12:39 AM]: Your next step might be functional annotation, suggested commands:

Run InterProScan (manual install): funannotate iprscan -i funannotate -c 10

Run antiSMASH (optional): funannotate remote -i funannotate -m antismash -e [email protected]

Annotate Genome: funannotate annotate -i funannotate --cpus 10 --sbt yourSBTfile.txt

[Oct 20 12:39 AM]: Training parameters file saved: funannotate/predict_results/scleroderma_sg.parameters.json
[Oct 20 12:39 AM]: Add species parameters to database:

funannotate species -s scleroderma_sg -a funannotate/predict_results/scleroderma_sg.parameters.json

command line : funannotate predict -i SG-genome.sm.fa -o funannotate -s "Scleroderm_aguani" --organism other --strain "SG3" --cpus 10 --name Sclgu --busco_db insecta --repeats2evm --transcript_evidence assembly.transcripts.fasta --protein_alignments SG.gth.gff3 --augustus_gff augustus.hints.gff3 --max_intronlen 20000 > fun_test.1017.log

[Oct 20 09:23 PM]: OS: CentOS Linux 7, 76 cores, ~ 177 GB RAM. Python: 3.7.12
[Oct 20 09:23 PM]: Running funannotate v1.8.14
[Oct 20 09:23 PM]: GeneMark not found and $GENEMARK_PATH environmental variable missing. Will skip GeneMark ab-initio prediction.
[Oct 20 09:24 PM]: Skipping CodingQuarry as no --rna_bam passed
[Oct 20 09:24 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     busco
  glimmerhmm   busco
  snap         busco
[Oct 20 09:24 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Oct 20 09:24 PM]: Genome loaded: 231 scaffolds; 191,968,538 bp; 25.41% repeats masked
[Oct 20 09:24 PM]: Aligning transcript evidence to genome with minimap2
[Oct 20 09:25 PM]: Found 40,039 alignments, wrote GFF3 and Augustus hints to file
[Oct 20 09:25 PM]: Loading protein alignments SG.gth.gff3
[Oct 20 09:26 PM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Oct 20 09:50 PM]: 1,519 valid BUSCO predictions found, validating protein sequences
[Oct 20 09:52 PM]: 1,517 BUSCO predictions validated
[Oct 20 09:52 PM]: Pulling out high quality Augustus predictions
Traceback (most recent call last):
  File "/users/shuangyang.wu/micromamba/envs/funannotate/bin/funannotate", line 8, in <module>
    sys.exit(main())
  File "/users/shuangyang.wu/micromamba/envs/funannotate/lib/python3.7/site-packages/funannotate/funannotate.py", line 716, in main
    mod.main(arguments)
  File "/users/shuangyang.wu/micromamba/envs/funannotate/lib/python3.7/site-packages/funannotate/predict.py", line 1489, in main
    if float(values[1]) > 89:
IndexError: list index out of range

command line :funannotate predict -i SG-genome.sm.fa -o funannotate -s "Sclerodermaguani" --organism other --strain "SG4" --cpus 10 --name Sclgu --busco_db insecta --repeats2evm --transcript_evidence assembly.transcripts.fasta --protein_alignments SG.gth.gff3 --max_intronlen 20000 > fun_test.1017.log

[Oct 20 10:00 PM]: OS: CentOS Linux 7, 76 cores, ~ 177 GB RAM. Python: 3.7.12
[Oct 20 10:00 PM]: Running funannotate v1.8.14
[Oct 20 10:00 PM]: GeneMark not found and $GENEMARK_PATH environmental variable missing. Will skip GeneMark ab-initio prediction.
[Oct 20 10:00 PM]: Skipping CodingQuarry as no --rna_bam passed
[Oct 20 10:00 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     busco
  glimmerhmm   busco
  snap         busco
[Oct 20 10:00 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Oct 20 10:01 PM]: Genome loaded: 231 scaffolds; 191,968,538 bp; 25.41% repeats masked
[Oct 20 10:01 PM]: Aligning transcript evidence to genome with minimap2
[Oct 20 10:01 PM]: Found 40,039 alignments, wrote GFF3 and Augustus hints to file
[Oct 20 10:01 PM]: Loading protein alignments SG.gth.gff3
[Oct 20 10:02 PM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Oct 20 10:29 PM]: 1,519 valid BUSCO predictions found, validating protein sequences
[Oct 20 10:32 PM]: 1,517 BUSCO predictions validated
[Oct 20 10:32 PM]: Training Augustus using BUSCO gene models
[Oct 20 10:32 PM]: Augustus initial training results:
  Feature       Specificity   Sensitivity
  nucleotides   97.6%         84.9%
  exons         68.7%         68.7%
  genes         37.5%         32.0%
[Oct 20 10:32 PM]: Accuracy seems low, you can try to improve by passing the --optimize_augustus option.
[Oct 20 10:32 PM]: Running Augustus gene prediction using sclerodermaguani_sg4 parameters
[Oct 20 11:12 PM]: 21,981 predictions from Augustus
[Oct 20 11:12 PM]: Pulling out high quality Augustus predictions
[Oct 20 11:12 PM]: Found 7,823 high quality predictions from Augustus (>90% exon evidence)
[Oct 20 11:12 PM]: Running SNAP gene prediction, using training data: funannotate/predict_misc/busco.final.gff3
[Oct 20 11:25 PM]: 29,073 predictions from SNAP
[Oct 20 11:25 PM]: Running GlimmerHMM gene prediction, using training data: funannotate/predict_misc/busco.final.gff3
[Oct 20 11:54 PM]: 37,344 predictions from GlimmerHMM
[Oct 20 11:54 PM]: Summary of gene models passed to EVM (weights):
[Oct 20 11:54 PM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
Traceback (most recent call last):
  File "/users/shuangyang.wu/micromamba/envs/funannotate/lib/python3.7/site-packages/funannotate/aux_scripts/funannotate-runEVM.py", line 486, in <module>
    partitions=args.no_partitions)
  File "/users/shuangyang.wu/micromamba/envs/funannotate/lib/python3.7/site-packages/funannotate/aux_scripts/funannotate-runEVM.py", line 138, in create_partitions
    interProteins = exonerate_blocks_to_interlap(proteins)
  File "/users/shuangyang.wu/micromamba/envs/funannotate/lib/python3.7/site-packages/funannotate/aux_scripts/funannotate-runEVM.py", line 46, in exonerate_blocks_to_interlap
    coords.append(int(cols[3]))
IndexError: list index out of range
  Source         Weight   Count
  Augustus       1        14158
  Augustus HiQ   2        7823
  GlimmerHMM     1        37344
  snap           1        29073
  Total          -        88398
[Oct 20 11:54 PM]: Evidence modeler has failed, exiting
Traceback (most recent call last):
  File "/users/shuangyang.wu/micromamba/envs/funannotate/bin/funannotate", line 8, in <module>
    sys.exit(main())
  File "/users/shuangyang.wu/micromamba/envs/funannotate/lib/python3.7/site-packages/funannotate/funannotate.py", line 716, in main
    mod.main(arguments)
  File "/users/shuangyang.wu/micromamba/envs/funannotate/lib/python3.7/site-packages/funannotate/predict.py", line 1798, in main
    os.remove(EVM_out)
FileNotFoundError: [Errno 2] No such file or directory: '/groups/dolan/user/shuangyang.wu/xf/curated-all/1015/funannotate/predict_misc/evm.round1.gff3'

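The EVM traceback above fails at int(cols[3]) inside exonerate_blocks_to_interlap, which suggests that at least one row of the protein alignment input split into fewer than four columns. A minimal Python sketch for locating such rows follows. It assumes the file is meant to be standard 9-column, tab-separated GFF3; the function name find_short_gff3_rows and the sample rows are hypothetical and are not part of funannotate.

```python
import io


def find_short_gff3_rows(handle, min_cols=9):
    """Yield (line_number, column_count, text) for feature rows that have
    fewer than min_cols tab-separated columns."""
    for number, raw in enumerate(handle, start=1):
        line = raw.rstrip("\n")
        # Blank lines and comment/directive lines are not feature rows.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < min_cols:
            yield number, len(cols), line


# Hypothetical sample: the second feature row uses spaces instead of tabs,
# so it splits into a single column and cols[3] would not exist.
sample = io.StringIO(
    "##gff-version 3\n"
    "scaffold_1\tgth\tmatch\t100\t200\t.\t+\t.\tID=m1\n"
    "scaffold_1 gth match 300 400 . + . ID=m2\n"
)
bad_rows = list(find_short_gff3_rows(sample))
for number, count, text in bad_rows:
    print(f"line {number}: {count} column(s): {text}")
```

To check a real file, one could pass an open file handle instead of the StringIO sample, for example the SG.gth.gff3 file given to --protein_alignments. Whether a short row is really what triggered the failure in this run is not confirmed by the log.
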
wushyer avatar Oct 21 '22 05:10 wushyer

Hi, I am getting the same error message,

[Oct 13 12:29 AM]: Parsing GFF pass-through: /home/lifesci/lfrwtp/gene_annotaiton/braker3_out/braker.gff3 --> setting source to other_pred1
[Oct 13 08:19 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Oct 13 08:20 AM]: Genome loaded: 268 scaffolds; 524,302,261 bp; 57.61% repeats masked
[Oct 13 08:20 AM]: Parsed 548,780 transcript alignments from: /scratch/lifesci/lfrwtp/funannotate_wildb/funannotate_sample.out/training/pasa/sample_pasa.assemblies.fasta.transdecoder.gff3
[Oct 13 08:20 AM]: Aligning 140,371 unique transcripts [not found in exising alignments] with minimap2
[Oct 13 08:21 AM]: Mapped 135,093 of these transcripts to the genome, adding to alignments
[Oct 13 08:21 AM]: Creating transcript EVM alignments and Augustus transcripts hintsfile
[Oct 13 08:21 AM]: Existing RNA-seq BAM hints found: funannotate_sample.out/predict_misc/hints.BAM.gff
[Oct 13 08:21 AM]: Existing protein alignments found: funannotate_sample.out/predict_misc/protein_alignments.gff3
[Oct 13 08:22 AM]: Filtering PASA data for suitable training set
[Oct 13 08:23 AM]: 5,217 of 35,772 models pass training parameters
[Oct 13 08:23 AM]: Pulling out high quality Augustus predictions
Traceback (most recent call last):
  File "/home/lifesci/lfrwtp/miniconda3/envs/funannotate/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/home/lifesci/lfrwtp/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/funannotate.py", line 716, in main
    mod.main(arguments)
  File "/home/lifesci/lfrwtp/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/predict.py", line 2185, in main
    if float(values[1]) > 89:
IndexError: list index out of range

Any suggestions, please?

sadikmz avatar Oct 13 '23 11:10 sadikmz

Hi, I solved the problem by letting funannotate run everything itself, starting from the FASTQ files. I no longer run the independent tools separately. You could try rerunning from the beginning.

Best, Shuangyang

wushyer avatar Oct 13 '23 11:10 wushyer

Thanks a lot.

I am using BRAKER, Augustus, and GeneMark output as external input, and I am not sure what is causing the issue.

sadikmz avatar Oct 13 '23 11:10 sadikmz

@nextgenusfs any suggestion please?

I tested this with multiple genotypes (with Augustus and BRAKER hints from BRAKER3), and each run terminates with "IndexError: list index out of range" at the stage where funannotate extracts the Augustus predictions with more than ~90% exon support.

https://github.com/nextgenusfs/funannotate/blob/7d9496f97b4e642260b077508ab906588145c398/funannotate/predict.py#L2184-L2186

Appreciate any suggestions.

sadikmu avatar Oct 16 '23 14:10 sadikmu

I reran it without --augustus_gff, but EVM failed to combine the hints.

[Oct 16 11:42 PM]: 138,994 predictions from GlimmerHMM
[Oct 16 11:42 PM]: Summary of gene models passed to EVM (weights):
[Oct 16 11:42 PM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Oct 16 11:43 PM]: Converting to GFF3 and collecting all EVM results
  Source         Weight   Count
  Augustus       4        21705 
  Augustus HiQ   2        18144 
  GeneMark       3        46174 
  GlimmerHMM     1        138994
  other_pred1    7        34016 
  pasa           10       37319 
  snap           2        135378
  Total          -        431730
[Oct 16 11:43 PM]: Evidence modeler has failed, exiting

sadikmu avatar Oct 16 '23 23:10 sadikmu

The error is caused by an Augustus output format that funannotate is unable to parse when pulling out the "hi-q" genes. It expects raw Augustus output from a run with these parameters: --stopCodonExcludedFromCDS=False --gff3=on --UTR=off --hintsfile=/path/to/hints. It has never worked well with BRAKER input; I think it worked with BRAKER2 output, but I have not kept up with how the formats have changed in more recent versions (nor will I).
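A quick way to see whether a GFF passed via --augustus_gff looks like raw Augustus output is to check that each gene block carries a hint-support comment. The Python sketch below is illustrative only and is not funannotate's parser. It assumes that raw Augustus output wraps each gene in "# start gene" / "# end gene" comment lines and reports support on a line beginning "# % of transcript supported by hints"; the function name summarize_support and the sample records are hypothetical.

```python
import io

# Assumed wording of the per-gene support comment in raw Augustus output.
SUPPORT_PREFIX = "# % of transcript supported by hints"


def summarize_support(handle):
    """Return (number_of_gene_blocks, gene_ids_without_support_line)."""
    total = 0
    missing = []
    gene_id = None
    has_support = False
    for raw in handle:
        line = raw.strip()
        if line.startswith("# start gene"):
            # A new gene block opens; the gene ID is the last token.
            gene_id = line.split()[-1]
            has_support = False
            total += 1
        elif line.startswith(SUPPORT_PREFIX):
            has_support = True
        elif line.startswith("# end gene") and gene_id is not None:
            # Block closes; record it if no support line was seen.
            if not has_support:
                missing.append(gene_id)
            gene_id = None
    return total, missing


# Hypothetical sample: g1 has a support line, g2 does not.
sample = io.StringIO(
    "# start gene g1\n"
    "scaffold_1\tAUGUSTUS\tgene\t1\t900\t0.9\t+\t.\tID=g1\n"
    "# % of transcript supported by hints (any source): 100\n"
    "# end gene g1\n"
    "# start gene g2\n"
    "scaffold_1\tAUGUSTUS\tgene\t1500\t2400\t0.8\t-\t.\tID=g2\n"
    "# end gene g2\n"
)
total, missing = summarize_support(sample)
print(f"{total} gene blocks, {len(missing)} without a support line: {missing}")
```

If a file reports zero gene blocks, or many blocks without a support line, it probably does not match the raw format described above, and letting funannotate run Augustus itself is the safer route.
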

So as mentioned above, if you let funannotate run Augustus you shouldn't have a problem. Keep in mind that it will re-use existing files if they are present, so if you re-ran with the same output folder/directory it might be re-using the old data and causing some issues. If other GFF3 input isn't being parsed properly, you can run it through https://github.com/nextgenusfs/gfftk (which is the old GFF3 parsing code pulled out of funannotate and updated; it has a gfftk sanitize script that should help).

nextgenusfs avatar Oct 16 '23 23:10 nextgenusfs

And I should add, please ensure your installation works properly by running funannotate test.

nextgenusfs avatar Oct 16 '23 23:10 nextgenusfs

In my last post, Augustus was run by funannotate itself; I had removed the external --augustus_gff input. Here is the tail of stdout:

[Oct 17 09:03 AM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Oct 17 09:03 AM]: Found 1,505 preliminary alignments with diamond in 0:00:00 --> generated FASTA files for exonerate in 0:00:00
[Oct 17 09:03 AM]: Exonerate finished in 0:00:10: found 1,270 alignments
[Oct 17 09:03 AM]: Running GeneMark-ES on assembly
[Oct 17 09:05 AM]: 1,562 predictions from GeneMark
[Oct 17 09:05 AM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Oct 17 09:06 AM]: 370 valid BUSCO predictions found, validating protein sequences
[Oct 17 09:07 AM]: 212 BUSCO predictions validated
[Oct 17 09:07 AM]: Running Augustus gene prediction using saccharomyces parameters
[Oct 17 09:08 AM]: 1,485 predictions from Augustus
[Oct 17 09:08 AM]: Pulling out high quality Augustus predictions
[Oct 17 09:08 AM]: Found 371 high quality predictions from Augustus (>90% exon evidence)
[Oct 17 09:08 AM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Oct 17 09:08 AM]: 0 predictions from SNAP
[Oct 17 09:08 AM]: SNAP prediction failed, moving on without result
[Oct 17 09:08 AM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Oct 17 09:08 AM]: 537 predictions from GlimmerHMM
[Oct 17 09:08 AM]: Summary of gene models passed to EVM (weights):
[Oct 17 09:08 AM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Oct 17 09:08 AM]: Converting to GFF3 and collecting all EVM results
  Source         Weight   Count
  Augustus       1        1325 
  Augustus HiQ   2        372  
  GeneMark       1        1562 
  GlimmerHMM     1        537  
  Total          -        3796 
[Oct 17 09:08 AM]: Evidence modeler has failed, exiting
Traceback (most recent call last):
  File "/home/lifesci/lfrwtp/miniconda3/envs/funannotate/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/home/lifesci/lfrwtp/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/funannotate.py", line 716, in main
    mod.main(arguments)
  File "/home/lifesci/lfrwtp/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 405, in main
    runPredictTest(args)
  File "/home/lifesci/lfrwtp/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 160, in runPredictTest
    assert 1500 <= countGFFgenes(os.path.join(
  File "/home/lifesci/lfrwtp/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 45, in countGFFgenes
    with open(input, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'test-predict_8371aa7a-8cfb-4ad5-84db-e65c70505c73/annotate/predict_results/Awesome_testicus.gff3'

And funannotate test predict

[Oct 17 09:03 AM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Oct 17 09:03 AM]: Found 1,505 preliminary alignments with diamond in 0:00:00 --> generated FASTA files for exonerate in 0:00:00
[Oct 17 09:03 AM]: Exonerate finished in 0:00:10: found 1,270 alignments
[Oct 17 09:03 AM]: Running GeneMark-ES on assembly
[Oct 17 09:05 AM]: 1,562 predictions from GeneMark
[Oct 17 09:05 AM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Oct 17 09:06 AM]: 370 valid BUSCO predictions found, validating protein sequences
[Oct 17 09:07 AM]: 212 BUSCO predictions validated
[Oct 17 09:07 AM]: Running Augustus gene prediction using saccharomyces parameters
[Oct 17 09:08 AM]: 1,485 predictions from Augustus
[Oct 17 09:08 AM]: Pulling out high quality Augustus predictions
[Oct 17 09:08 AM]: Found 371 high quality predictions from Augustus (>90% exon evidence)
[Oct 17 09:08 AM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Oct 17 09:08 AM]: 0 predictions from SNAP
[Oct 17 09:08 AM]: SNAP prediction failed, moving on without result
[Oct 17 09:08 AM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Oct 17 09:08 AM]: 537 predictions from GlimmerHMM
[Oct 17 09:08 AM]: Summary of gene models passed to EVM (weights):
[Oct 17 09:08 AM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Oct 17 09:08 AM]: Converting to GFF3 and collecting all EVM results
  Source         Weight   Count
  Augustus       1        1325 
  Augustus HiQ   2        372  
  GeneMark       1        1562 
  GlimmerHMM     1        537  
  Total          -        3796 
[Oct 17 09:08 AM]: Evidence modeler has failed, exiting
Traceback (most recent call last):
  File "/home/lifesci/lfrwtp/miniconda3/envs/funannotate/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/home/lifesci/lfrwtp/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/funannotate.py", line 716, in main
    mod.main(arguments)
  File "/home/lifesci/lfrwtp/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 405, in main
    runPredictTest(args)
  File "/home/lifesci/lfrwtp/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 160, in runPredictTest
    assert 1500 <= countGFFgenes(os.path.join(
  File "/home/lifesci/lfrwtp/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 45, in countGFFgenes
    with open(input, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'test-predict_8371aa7a-8cfb-4ad5-84db-e65c70505c73/annotate/predict_results/Awesome_testicus.gff3'

Both runs appear not to parse the input test file properly (line 45 of test.py), and Awesome_testicus.gff3 is missing in both cases.

sadikmu avatar Oct 17 '23 08:10 sadikmu