funannotate Error in funannotate predict pipeline

Are you using the latest release?
I am using version 1.8.10 through the nextgenusfs/funannotate Docker container.

Describe the bug
I am running funannotate predict on a de novo Carnivora genome assembly. The pipeline does not create the augustus.training.proteins.fa file, so it stops at the following point:

CMD ERROR: diamond blastp --query augustus.training.proteins.fa --db aug_training.dmnd --more-sensitive -o aug.blast.txt -f 6 qseqid sseqid pident --query-cover 80 --subject-cover 80 --id 80 --no-self-hits

What command did you issue?

funannotate predict -i softmasked.genome.fasta --transcript_evidence stringtie_transcript.fa --rna_bam RNA_reads_sorted.bam --pasa_gff sample_mydb_pasa.sqlite.pasa_assemblies.gff3 --out funannotate_tentativo2 --cpus 32 --max_intronlen 20000 --tmpdir ./ --organism other -s human

RNA_reads_sorted.bam -> HISAT2 mapping of the RNA reads against softmasked.genome.fasta
sample_mydb_pasa.sqlite.pasa_assemblies.gff3 -> results of GenomeThreader, followed by PASA

Logfiles

[Mar 06 09:27 PM]: OS: Debian GNU/Linux 10, 128 cores, ~ 528 GB RAM. Python: 3.8.12
[Mar 06 09:27 PM]: Running funannotate v1.8.10
[Mar 06 09:27 PM]: GeneMark not found and $GENEMARK_PATH environmental variable missing. Will skip GeneMark ab-initio prediction.
[Mar 06 09:27 PM]: Skipping CodingQuarry as --organism=other. Pass a weight larger than 0 to run CQ, ie --weights codingquarry:1
[Mar 06 09:27 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     pretrained
  glimmerhmm   pasa
  snap         pasa
[Mar 06 09:45 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Mar 06 09:48 PM]: Genome loaded: 371 scaffolds; 2,280,871,531 bp; 33.01% repeats masked
[Mar 06 09:48 PM]: Aligning transcript evidence to genome with minimap2
[Mar 06 09:52 PM]: Found 65,957 alignments, wrote GFF3 and Augustus hints to file
[Mar 06 09:52 PM]: Extracting hints from RNA-seq BAM file using bam2hints
[Mar 06 10:09 PM]: Mapping 554,221 proteins to genome using diamond and exonerate
[Mar 06 10:35 PM]: Found 265,635 preliminary alignments with diamond in 0:22:50 --> generated FASTA files for exonerate in 0:03:14
[Mar 06 11:09 PM]: Exonerate finished in 0:32:55: found 17,887 alignments
[Mar 06 11:10 PM]: Filtering PASA data for suitable training set
[Mar 06 11:11 PM]: CMD ERROR: diamond blastp --query augustus.training.proteins.fa --db aug_training.dmnd --more-sensitive -o aug.blast.txt -f 6 qseqid sseqid pident --query-cover 80 --subject-cover 80 --id 80 --no-self-hits

OS/Install Information

Output of funannotate check --show-versions:

Checking dependencies for 1.8.10

You are running Python v 3.8.12. Now checking python packages...
biopython: 1.77
goatools: 1.1.12
matplotlib: 3.5.1
natsort: 8.1.0
numpy: 1.22.2
pandas: 1.4.1
psutil: 5.9.0
requests: 2.27.1
scikit-learn: 1.0.2
scipy: 1.5.3
seaborn: 0.11.2
All 11 python packages installed

You are running Perl v b'5.032001'. Now checking perl modules...
Carp: 1.50
Clone: 0.42
DBD::SQLite: 1.70
DBD::mysql: 4.046
DBI: 1.643
DB_File: 1.855
Data::Dumper: 2.183
File::Basename: 2.85
File::Which: 1.24
Getopt::Long: 2.52
Hash::Merge: 0.302
JSON: 4.05
LWP::UserAgent: 6.61
Logger::Simple: 2.0
POSIX: 1.94
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.14
Tie::File: 1.06
URI::Escape: 5.10
YAML: 1.30
local::lib: 2.000028
threads: 2.25
threads::shared: 1.61
ERROR: Bio::Perl not installed, install with cpanm Bio::Perl

Checking Environmental Variables...
$FUNANNOTATE_DB=/opt/databases
$PASAHOME=/venv/opt/pasa-2.4.1
$TRINITYHOME=/venv/opt/trinity-2.8.5
$EVM_HOME=/venv/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/venv/config
ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir

Checking external dependencies...
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v36
diamond: 2.0.14
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2017-11-15
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.9.1-internal
kallisto: 0.46.1
mafft: v7.490 (2021/Oct/30)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
pigz: pigz 2.6
proteinortho: 6.0.16
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.15
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 26
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
ERROR: emapper.py not installed
ERROR: gmes_petap.pl not installed
ERROR: signalp not installed

Mar 07 '22 09:03 marco91sol

PS. When I run funannotate test -t predict --cpus 12 (for testing), it stops at the following point:

ERROR: trainGlimmerHMM /prova_funannotate/test-predict_893adef5-6641-4a50-a392-ba7d66d8d50a/annotate/predict_misc/genome.softmasked.fa /prova_funannotate/test-predict_893adef5-6641-4a50-a392-ba7d66d8d50a/annotate/predict_misc/glimmer.exons -d annotate/predict_misc/glimmerhmm

Mar 07 '22 13:03 marco91sol

Sorry for late response here....

But I think the script failed to parse the PASA assemblies GFF properly. I think this PASA file uses the transcripts as the reference, not the genome. It's expecting the output of PASA transdecoder, i.e. the reference should be the soft-masked genome.
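One quick way to test that suspicion is to compare the sequence names in column 1 of the PASA GFF3 with the FASTA headers of the genome. The following is only a minimal sketch, not part of funannotate: the function names are made up for illustration, and it assumes a tab-delimited GFF3 and FASTA headers whose ID is the first whitespace-separated word.

```python
def fasta_ids(lines):
    """Collect sequence IDs (first word after '>') from FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def gff3_seqids(lines):
    """Collect column-1 seqids from GFF3 lines, skipping comments and blanks."""
    ids = set()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section follows; no more features
        if not line or line.startswith("#"):
            continue
        ids.add(line.split("\t", 1)[0])
    return ids


def unmatched_seqids(genome_lines, gff_lines):
    """Return GFF3 seqids that are not sequence names in the genome FASTA."""
    return sorted(gff3_seqids(gff_lines) - fasta_ids(genome_lines))
```

Calling `unmatched_seqids` with the open file handles of softmasked.genome.fasta and sample_mydb_pasa.sqlite.pasa_assemblies.gff3 should return an empty list if the GFF3 is in genome coordinates. If it instead returns names that look like transcript or assembly IDs, the file is probably referenced to the transcripts.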

Per the failed test -- what happens if you run that command manually? Is there a perl dependency missing?

Apr 18 '22 00:04 nextgenusfs
