funannotate Predict log suggests that it ran ok, but there is only a single CDS repeated ad nauseam in .gbk

Are you using the latest release? funannotate v1.8.12

Describe the bug The logfile for funannotate predict seemed to show that everything ran ok, but the outputs (.protein.fa and .gff) contain only a single sequence. In the .gbk file, that one sequence is repeated thousands of times.

What command did you issue?

funannotate predict \
    -i ${genome}_contigs_nuc_clean_sorted_masked.fasta \
    -o annotated_${today} \
    -s $species \
    --strain $genome \
    --optimize_augustus --busco_seed_species anidulans --busco_db ascomycota --organism fungus \
    --SeqCenter Duke --SeqAccession $biosample --name $locus_tag \
    --cpus 12 \
    --no-progress

Logfiles

[08/09/22 18:08:18]: /hpc/home/idm7/miniconda3/envs/funannotate/bin/funannotate predict -i 11164_contigs_nuc_clean_sorted_masked.fasta -o annotated_2022-08-09 -s Neophaeomoniellasp11164 --strain 11164 --optimize_augustus --busco_seed_species anidulans --busco_db ascomycota --organism fungus --SeqCenter Duke --SeqAccession SAMN30032824 --name NUJ72
 --cpus 12 --no-progress

[08/09/22 18:08:19]: OS: CentOS Stream 8, 46 cores, ~ 230 GB RAM. Python: 3.8.12
[08/09/22 18:08:19]: Running funannotate v1.8.12
[08/09/22 18:08:19]: GeneMark path: /hpc/group/bio1/ian/envs/funannotate/gmes_linux_64_4
[08/09/22 18:08:22]: Full path to gmes_petap.pl: /hpc/group/bio1/ian/envs/funannotate/gmes_linux_64_4/gmes_petap.pl
[08/09/22 18:08:22]: GeneMark appears to be functional? True
[08/09/22 18:08:23]: exonerate version=exonerate 2.4.0 path=/hpc/home/idm7/miniconda3/envs/funannotate/bin/exonerate
[08/09/22 18:08:23]: diamond version=2.0.15 path=/hpc/home/idm7/miniconda3/envs/funannotate/bin/diamond
[08/09/22 18:08:23]: tbl2asn version=no way to determine, likely 25.X path=/hpc/home/idm7/miniconda3/envs/funannotate/bin/tbl2asn
[08/09/22 18:08:23]: bedtools version=bedtools v2.30.0 path=/hpc/home/idm7/miniconda3/envs/funannotate/bin/bedtools
[08/09/22 18:08:23]: augustus version=3.3.3 path=/opt/apps/rhel7/augustus-3.3.3/bin/augustus
[08/09/22 18:08:23]: etraining version=NA path=/opt/apps/rhel7/augustus-3.3.3/bin/etraining
[08/09/22 18:08:23]: tRNAscan-SE version=2.0.9 (July 2021) path=/hpc/home/idm7/miniconda3/envs/funannotate/bin/tRNAscan-SE
[08/09/22 18:08:23]: bam2hints version=NA path=/opt/apps/rhel7/augustus-3.3.3/bin/bam2hints
[08/09/22 18:08:23]: minimap2 version=2.24-r1122 path=/hpc/home/idm7/miniconda3/envs/funannotate/bin/minimap2
[08/09/22 18:08:23]: $AUGUSTUS_CONFIG_PATH=/hpc/home/idm7/miniconda3/envs/funannotate/config/
[08/09/22 18:08:27]: {'augustus': 1, 'hiq': 2, 'genemark': 1, 'pasa': 6, 'codingquarry': 0, 'snap': 1, 'glimmerhmm': 1, 'proteins': 1, 'transcripts': 1}
[08/09/22 18:08:27]: Skipping CodingQuarry as no --rna_bam passed
[08/09/22 18:08:27]: {'augustus': 'busco', 'genemark': 'selftraining', 'snap': 'busco', 'glimmerhmm': 'busco'}
[08/09/22 18:08:27]: Parsed training data, run ab-initio gene predictors as follows:
[08/09/22 18:08:29]: {'augustus': 1, 'hiq': 2, 'genemark': 1, 'pasa': 6, 'codingquarry': 0, 'snap': 1, 'glimmerhmm': 1, 'proteins': 1, 'transcripts': 1}
[08/09/22 18:08:36]: Loading genome assembly and parsing soft-masked repetitive sequences
[08/09/22 18:08:45]: Genome loaded: 145 scaffolds; 28,632,561 bp; 1.25% repeats masked
[08/09/22 19:54:41]: join_mult_hints.pl
[08/09/22 19:54:41]: Running GeneMark-ES on assembly
[08/09/22 19:54:41]: /hpc/group/bio1/ian/envs/funannotate/gmes_linux_64_4/gmes_petap.pl --ES --max_intron 3000 --soft_mask 2000 --cores 12 --sequence /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/genome.softmasked.fa --fungus
[08/09/22 20:12:02]: perl /hpc/home/idm7/miniconda3/envs/funannotate/opt/evidencemodeler-1.1.1/EvmUtils/misc/augustus_GFF3_to_EVM_GFF3.pl annotated_2022-08-09/predict_misc/genemark.gff
[08/09/22 20:12:05]: 10,642 predictions from GeneMark
[08/09/22 20:12:05]: Running BUSCO to find conserved gene models for training ab-initio predictors
[08/09/22 20:12:05]: /hpc/home/idm7/miniconda3/envs/funannotate/bin/python /hpc/home/idm7/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/aux_scripts/funannotate-BUSCO2.py -i /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/genome.softmasked.fa -m genome --lineage /hpc/home/idm7/miniconda3/envs/databases/funannotate_db/ascomycota -o neophaeomoniellasp11164_11164 -c 12 --species anidulans -f --local_augustus /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/ab_initio_parameters/augustus
[08/09/22 20:29:57]: 1,260 valid BUSCO predictions found, validating protein sequences
[08/09/22 20:33:51]: 1,257 BUSCO predictions validated
[08/09/22 20:33:51]: Training Augustus using BUSCO gene models
[08/09/22 20:33:51]: gff2gbSmallDNA.pl annotated_2022-08-09/predict_misc/busco.final.gff3 /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/genome.softmasked.fa 600 annotated_2022-08-09/predict_misc/augustus.training.busco.gb
[08/09/22 20:34:28]: Augustus initial training results:
[08/09/22 21:07:36]: Augustus optimized training results:
[08/09/22 21:07:36]: Running Augustus gene prediction using neophaeomoniellasp11164_11164 parameters
[08/09/22 21:11:24]: perl /hpc/home/idm7/miniconda3/envs/funannotate/opt/evidencemodeler-1.1.1/EvmUtils/misc/augustus_GFF3_to_EVM_GFF3.pl annotated_2022-08-09/predict_misc/augustus.gff3
[08/09/22 21:11:25]: Pulling out high quality Augustus predictions
[08/09/22 21:11:26]: Found 35 high quality predictions from Augustus (>90% exon evidence)
[08/09/22 21:11:26]: Running SNAP gene prediction, using training data: annotated_2022-08-09/predict_misc/busco.final.gff3
[08/09/22 21:11:28]: 1257 gene models to train snap on 120 scaffolds
[08/09/22 21:11:29]: fathom /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/snap.training.zff /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/snap-training.scaffolds.fasta -categorize 1000 -min-intron 10 -max-intron 3000
[08/09/22 21:11:31]: fathom uni.ann uni.dna -export 1000 -plus
[08/09/22 21:11:31]: forge export.ann export.dna
[08/09/22 21:11:33]: perl /hpc/home/idm7/miniconda3/envs/funannotate/bin/hmm-assembler.pl snap-trained annotated_2022-08-09/predict_misc/snaptrain
[08/09/22 21:11:34]: snap /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/snap-trained.hmm /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/genome.softmasked.fa
[08/09/22 21:14:07]: 9,920 predictions from SNAP
[08/09/22 21:14:07]: Running GlimmerHMM gene prediction, using training data: annotated_2022-08-09/predict_misc/busco.final.gff3
[08/09/22 21:14:08]: trainGlimmerHMM /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/genome.softmasked.fa /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/glimmer.exons -d annotated_2022-08-09/predict_misc/glimmerhmm
[08/09/22 21:26:14]: perl /hpc/home/idm7/miniconda3/envs/funannotate/bin/glimmhmm.pl /hpc/home/idm7/miniconda3/envs/funannotate/bin/glimmerhmm /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/genome.softmasked.fa /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/glimmerhmm -g
[08/09/22 21:29:59]: 10,291 predictions from GlimmerHMM
[08/09/22 21:30:00]: Prediction sources: ['Augustus', 'HiQ', 'GeneMark', 'GlimmerHMM', 'snap']
[08/09/22 21:30:01]: Summary of gene models: {'total': 39379, 'Augustus': 8491, 'HiQ': 35, 'GeneMark': 10642, 'GlimmerHMM': 10291, 'snap': 9920}
[08/09/22 21:30:01]: EVM Weights: {'Augustus': 1, 'HiQ': 2, 'GeneMark': 1, 'GlimmerHMM': 1, 'snap': 1, 'proteins': 1}
[08/09/22 21:30:01]: Summary of gene models passed to EVM (weights):
[08/09/22 21:30:01]: Launching EVM via funannotate-runEVM.py
[08/09/22 21:30:01]: /hpc/home/idm7/miniconda3/envs/funannotate/bin/python /hpc/home/idm7/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/aux_scripts/funannotate-runEVM.py -w /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/weights.evm.txt -c 12 -g /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/gene_predictions.gff3 -d /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/EVM -f /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/genome.softmasked.fa -l annotated_2022-08-09/logfiles/funannotate-EVM.log -m 10 -i 1500 -o /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/evm.round1.gff3 --EVM_HOME /hpc/home/idm7/miniconda3/envs/funannotate/opt/evidencemodeler-1.1.1 -p /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/protein_alignments.gff3 --no-progress
[08/09/22 21:41:15]: 10,222 total gene models from EVM
[08/09/22 21:41:15]: Generating protein fasta files from 10,222 EVM models
[08/09/22 21:41:22]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[08/09/22 21:41:22]: diamond blastp --sensitive --query annotated_2022-08-09/predict_misc/evm.round1.proteins.fa --threads 12 --out annotated_2022-08-09/predict_misc/repeats.xml --db /hpc/home/idm7/miniconda3/envs/databases/funannotate_db/repeats.dmnd --evalue 1e-10 --max-target-seqs 1 --outfmt 5
[08/09/22 21:41:28]: bedtools intersect -sorted -f 0.9 -a /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/evm.round1.gff3.sorted.gff -b /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/repeatmasker.bed.sorted.bed
[08/09/22 21:41:28]: Found 19 gene models to remove: 0 too short; 0 span gaps; 19 transposable elements
[08/09/22 21:41:28]: 10,203 gene models remaining
[08/09/22 21:41:28]: Predicting tRNAs
[08/09/22 21:41:28]: tRNAscan-SE -o annotated_2022-08-09/predict_misc/tRNAscan.out --thread 12 /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/genome.softmasked.fa
[08/09/22 21:43:31]: Status: Phase I: Searching for tRNAs with HMM-enabled Infernal
Status: Phase II: Infernal verification of candidate tRNAs detected with first-pass scan

[08/09/22 21:43:31]: 
tRNAscan-SE v.2.0.9 (July 2021) - scan sequences for transfer RNAs
Copyright (C) 2020 Patricia Chan and Todd Lowe
                   University of California Santa Cruz
Freely distributed under the GNU General Public License (GPLv3)

------------------------------------------------------------
Sequence file(s) to search:        /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/genome.softmasked.fa
Search Mode:                       Eukaryotic
Results written to:                annotated_2022-08-09/predict_misc/tRNAscan.out
Output format:                     Tabular
Searching with:                    Infernal First Pass->Infernal
Isotype-specific model scan:       Yes
Covariance model:                  /hpc/home/idm7/miniconda3/envs/funannotate/lib/tRNAscan-SE/models/TRNAinf-euk.cm
                                   /hpc/home/idm7/miniconda3/envs/funannotate/lib/tRNAscan-SE/models/TRNAinf-euk-SeC.cm
Infernal first pass cutoff score:  10

Temporary directory:               /tmp
------------------------------------------------------------


[08/09/22 21:43:32]: bedtools intersect -sorted -v -a /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/trnascan.gff3.sorted.gff3 -b /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/evm.cleaned.gff3.sorted.gff3 /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/assembly-gaps.bed.sorted.gff3
[08/09/22 21:43:32]: 41 tRNAscan models are valid (non-overlapping)
[08/09/22 21:43:32]: Generating GenBank tbl annotation file
[08/09/22 21:43:46]: Collecting final annotation files for 1 total gene models
[08/09/22 21:43:46]: Converting to final Genbank format
[08/09/22 21:43:46]: /hpc/home/idm7/miniconda3/envs/funannotate/bin/python /hpc/home/idm7/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/aux_scripts/tbl2asn_parallel.py -i annotated_2022-08-09/predict_misc/tbl2asn/genome.tbl -f /hpc/group/bio1/ian/eurotiomycetes/nuclear_genomes_final/11164/annotated_2022-08-09/predict_misc/genome.softmasked.fa -o annotated_2022-08-09/predict_misc/tbl2asn --sbt /hpc/home/idm7/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/config/test.sbt -d annotated_2022-08-09/predict_results/Neophaeomoniellasp11164_11164.discrepency.report.txt -s Neophaeomoniellasp11164 -t -l paired-ends -v 1 -c 12 --strain 11164
[08/09/22 21:44:44]: Funannotate predict is finished, output files are in the annotated_2022-08-09/predict_results folder
[08/09/22 21:44:44]: Your next step might be functional annotation, suggested commands:
-------------------------------------------------------
Run InterProScan (manual install): 
funannotate iprscan -i annotated_2022-08-09 -c 12

Run antiSMASH (optional): 
funannotate remote -i annotated_2022-08-09 -m antismash -e [email protected]

Annotate Genome: 
funannotate annotate -i annotated_2022-08-09 --cpus 12 --sbt yourSBTfile.txt
-------------------------------------------------------
                
[08/09/22 21:44:44]: Training parameters file saved: annotated_2022-08-09/predict_results/neophaeomoniellasp11164_11164.parameters.json
[08/09/22 21:44:44]: Add species parameters to database:

  funannotate species -s neophaeomoniellasp11164_11164 -a annotated_2022-08-09/predict_results/neophaeomoniellasp11164_11164.parameters.json

OS/Install Information

-------------------------------------------------------
Checking dependencies for 1.8.12
-------------------------------------------------------
You are running Python v 3.8.12. Now checking python packages...
biopython: 1.77
goatools: 1.2.3
matplotlib: 3.4.3
natsort: 8.1.0
numpy: 1.23.0
pandas: 1.4.3
psutil: 5.9.1
requests: 2.28.1
scikit-learn: 1.1.1
scipy: 1.8.1
seaborn: 0.11.2
All 11 python packages installed


You are running Perl v b'5.026002'. Now checking perl modules...
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
local::lib: 2.000024
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed


Checking Environmental Variables...
$FUNANNOTATE_DB=/hpc/home/idm7/miniconda3/envs/databases/funannotate_db
$PASAHOME=/hpc/home/idm7/miniconda3/envs/funannotate/opt/pasa-2.5.2
$TRINITY_HOME=/hpc/home/idm7/miniconda3/envs/funannotate/opt/trinity-2.8.5
$EVM_HOME=/hpc/home/idm7/miniconda3/envs/funannotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/hpc/home/idm7/miniconda3/envs/funannotate/config/
$GENEMARK_PATH=/hpc/group/bio1/ian/envs/funannotate/gmes_linux_64_4
All 6 environmental variables are set
-------------------------------------------------------
Checking external dependencies...
	ERROR: signalp found but error running signalp
PASA: 2.5.2
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v35
diamond: 2.0.15
emapper.py: There was an error retrieving eggnog-mapper DB data: not a valid file "/hpc/group/bio1/ian/envs/funannotate/lib/python3.8/site-packages/data/eggnog.db"
Maybe you need to run download_eggnog_data.py
emapper-2.1.9 / Expected eggNOG DB version: 5.0.2 / Installed eggNOG DB version: unknown / Diamond found: diamond 2.0.15 / MMseqs2 found: 13.45111

ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2021-08-25
gmes_petap.pl: 4.69_lic
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.9.1-internal
kallisto: 0.46.1
mafft: v7.505 (2022/Apr/10)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
pigz: pigz 2.7
proteinortho: 6.1.0
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.12
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 39
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
	ERROR: signalp not installed

Aug 10 '22 03:08 IanDMedeiros
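
A quick way to confirm the symptom described in the report is to compare the number of FASTA headers with the number of distinct headers in the protein output. The sketch below is only an illustration: `count_fasta_ids` is a hypothetical helper, not part of funannotate, and the temporary file stands in for the real proteins file under predict_results.

```shell
# count_fasta_ids FILE: print "<total> <unique>" header counts for a FASTA file.
# Hypothetical helper for illustration; not a funannotate command.
count_fasta_ids() {
    total=$(grep -c '^>' "$1")
    # Compare only the ID (first whitespace-delimited token) of each header.
    unique=$(grep '^>' "$1" | awk '{print $1}' | sort -u | wc -l | tr -d ' ')
    printf '%s %s\n' "$total" "$unique"
}

# Tiny stand-in for a broken output: the same record written three times.
tmp=$(mktemp)
printf '>X-T1\nMKV\n>X-T1\nMKV\n>X-T1\nMKV\n' > "$tmp"
count_fasta_ids "$tmp"   # prints "3 1"
rm -f "$tmp"
```

If the two numbers differ on a real output file, records are sharing an ID, which matches the behaviour reported here.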

I wonder whether --numbering defaults to nothing rather than 1 when it isn't explicitly set, so that all predicted proteins got an identical locus tag and overwrote one another. Testing that idea now...

Edit: No, I guess that wasn't it.

Aug 10 '22 03:08 IanDMedeiros

Ok. I have verified that funannotate predict behaves correctly if I leave off --name and --numbering entirely, but if I use either --name $locus_tag alone or --name $locus_tag --numbering 1, I just get a single sequence repeated thousands of times in the .gbk file.

Aug 10 '22 14:08 IanDMedeiros

Okay thanks. Let me try to replicate and identify the problem. Seems like I would have seen this before unless it's related to some more recent changes.

Aug 10 '22 15:08 nextgenusfs

I can't seem to replicate this behavior using the test dataset:

$ .funannotate-docker test -t predict --cpus 4 --debug
#########################################################
Running `funannotate predict` unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 4 --species Awesome testicus
#########################################################

[Aug 10 11:21 PM]: OS: Debian GNU/Linux 10, 4 cores, ~ 8 GB RAM. Python: 3.8.13
[Aug 10 11:21 PM]: Running funannotate v1.8.13
...
[Aug 11 02:26 AM]: 1,694 total gene models from EVM
[Aug 11 02:26 AM]: Generating protein fasta files from 1,694 EVM models
[Aug 11 02:26 AM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Aug 11 02:27 AM]: Found 137 gene models to remove: 0 too short; 0 span gaps; 137 transposable elements
[Aug 11 02:27 AM]: 1,557 gene models remaining
[Aug 11 02:27 AM]: Predicting tRNAs
[Aug 11 02:28 AM]: 112 tRNAscan models are valid (non-overlapping)
[Aug 11 02:28 AM]: Generating GenBank tbl annotation file
[Aug 11 02:29 AM]: Collecting final annotation files for 1,669 total gene models
[Aug 11 02:29 AM]: Converting to final Genbank format
...

We can look at the tbl file to see the locus tags and numbering of the first few genes:

$ head -n 20 annotate/predict_results/Awesome_testicus.tbl
>Feature CP022970.1
1	577664	REFERENCE
			CFMR	12345
6359	7510	gene
			locus_tag	FUN_000001
6359	7510	mRNA
			product	hypothetical protein
			transcript_id	gnl|ncbi|FUN_000001-T1_mrna
			protein_id	gnl|ncbi|FUN_000001-T1
6359	7510	CDS
			codon_start	1
			product	hypothetical protein
			transcript_id	gnl|ncbi|FUN_000001-T1_mrna
			protein_id	gnl|ncbi|FUN_000001-T1
8459	8136	gene
			locus_tag	FUN_000002
8459	8136	mRNA
			product	hypothetical protein
			transcript_id	gnl|ncbi|FUN_000002-T1_mrna
			protein_id	gnl|ncbi|FUN_000002-T1

Then re-run with the --name parameter set:

$ funannotate-docker predict -i test.softmasked.fa \
    --protein_evidence protein.evidence.fasta -o annotate2 \
    --name NEW1 --augustus_species saccharomyces --cpus 4 \
    --species "Awesome testicus"
-------------------------------------------------------
[Aug 11 07:27 AM]: OS: Debian GNU/Linux 10, 4 cores, ~ 8 GB RAM. Python: 3.8.13
[Aug 11 07:27 AM]: Running funannotate v1.8.13
...
[Aug 11 10:34 AM]: 1,685 total gene models from EVM
[Aug 11 10:34 AM]: Generating protein fasta files from 1,685 EVM models
[Aug 11 10:34 AM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Aug 11 10:34 AM]: Found 133 gene models to remove: 0 too short; 0 span gaps; 133 transposable elements
[Aug 11 10:34 AM]: 1,552 gene models remaining
[Aug 11 10:34 AM]: Predicting tRNAs
[Aug 11 10:36 AM]: 112 tRNAscan models are valid (non-overlapping)
[Aug 11 10:36 AM]: Generating GenBank tbl annotation file
[Aug 11 10:36 AM]: Collecting final annotation files for 1,664 total gene models
[Aug 11 10:36 AM]: Converting to final Genbank format
[Aug 11 10:37 AM]: Funannotate predict is finished, output files are in the annotate2/predict_results folder
...

If we look at the tbl file, the locus tags now carry the NEW1 prefix and are numbered as expected:

$ head -n 20 annotate2/predict_results/Awesome_testicus.tbl
>Feature CP022970.1
1	577664	REFERENCE
			CFMR	12345
6359	7510	gene
			locus_tag	NEW1_000001
6359	7510	mRNA
			product	hypothetical protein
			transcript_id	gnl|ncbi|NEW1_000001-T1_mrna
			protein_id	gnl|ncbi|NEW1_000001-T1
6359	7510	CDS
			codon_start	1
			product	hypothetical protein
			transcript_id	gnl|ncbi|NEW1_000001-T1_mrna
			protein_id	gnl|ncbi|NEW1_000001-T1
8459	8136	gene
			locus_tag	NEW1_000002
8459	8136	mRNA
			product	hypothetical protein
			transcript_id	gnl|ncbi|NEW1_000002-T1_mrna
			protein_id	gnl|ncbi|NEW1_000002-T1

Aug 11 '22 18:08 nextgenusfs

Weird. I am working around it now but I will revisit at some point and try to figure out what is going wrong on my end.

Aug 11 '22 19:08 IanDMedeiros

This turned out to be an error in how I was providing the locus tag prefix to funannotate. Because of a poorly constructed table file, there was a hidden newline character in my variable $locus_tag, and that was messing everything else up. Closing, since fixing the table file solved the problem.

Aug 25 '22 21:08 IanDMedeiros
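
For anyone hitting the same thing, here is a minimal sketch of how a hidden character in a shell variable could be exposed and stripped before the value is passed to --name. Everything in it is an assumption for illustration: `clean_tag` and `is_valid_tag` are hypothetical helpers, the carriage return is just one example of an invisible character a table file might carry, and the alphanumeric-only check is a deliberately conservative rule chosen here, not something funannotate enforces.

```shell
# clean_tag VALUE: hypothetical helper that strips all whitespace,
# including newlines and carriage returns, from VALUE.
clean_tag() {
    printf '%s' "$1" | tr -d '[:space:]'
}

# is_valid_tag VALUE: succeed only if VALUE is non-empty and contains
# nothing but ASCII letters and digits (a conservative check chosen here).
is_valid_tag() {
    case $1 in
        ''|*[!A-Za-z0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Simulate a prefix read from a table file with a stray carriage return.
locus_tag=$(printf 'NUJ72\r')

# od -c prints invisible characters as escapes, so the stray \r becomes visible.
printf '%s' "$locus_tag" | od -c

if ! is_valid_tag "$locus_tag"; then
    echo "locus tag has hidden characters; cleaning" >&2
    locus_tag=$(clean_tag "$locus_tag")
fi

# After cleaning, the dump shows only the five expected characters.
printf '%s' "$locus_tag" | od -c
```

Running a check like this right after reading the table, and before calling funannotate predict, makes this kind of problem show up immediately instead of in the final outputs.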
