funannotate

Funannotate predict fails at stage CodingQuarry (only for some genomes)

Open · yvetboele opened this issue 1 year ago · 5 comments

Good morning,

When running funannotate predict (v1.8.13), I get an error at the CodingQuarry stage. The exact same commands/pipeline worked for two isolates but fail for two others, so it is not an installation issue.

Command used:

funannotate predict -i /home/yvet/annotations/P93/preprocessing/P93_cleaned_sorted_masked.fasta -o P93_annotation -s "Pseudocercospora eumusae" --isolate P93 --cpus 8 --protein_evidence /home/yvet/annotations/evidence_fastqdump/CIRAD86_proteins/GCF_000340215.1_Mycfi2_protein.fasta $FUNANNOTATE_DB/uniprot_sprot.fasta

The error looks like this (see the full output below; this excerpt comes from the P92 run):

CMD ERROR: CodingQuarry -p 6 -f /home/yvet/annotations/P92/P92_annotation/predict_misc/genome.softmasked.fa -t /home/yvet/annotations/P92/P92_annotation/predict_misc/stringtie.gff3

Running CodingQuarry on its own (outside the funannotate pipeline) gives an error I can't really work with either: "Segmentation fault (core dumped)".

Any idea what might be causing this? I trained using RNA-seq data from the species itself and from closely related species, and as protein evidence I provided proteins from a closely related species plus the UniProt/Swiss-Prot database (uniprot_sprot.fasta).

Thanks for your time! Yvet

Full output (same command as above):

[May 09 10:04 AM]: OS: Ubuntu 22.04, 32 cores, ~ 264 GB RAM. Python: 3.8.15
[May 09 10:04 AM]: Running funannotate v1.8.13
[May 09 10:04 AM]: Found training files, will re-use these files:
  --rna_bam P93_annotation/training/funannotate_train.coordSorted.bam
  --pasa_gff P93_annotation/training/funannotate_train.pasa.gff3
  --stringtie P93_annotation/training/funannotate_train.stringtie.gtf
  --transcript_alignments P93_annotation/training/funannotate_train.transcripts.gff3
[May 09 10:04 AM]: Parsed training data, run ab-initio gene predictors as follows:
  Program        Training-Method
  augustus       pasa
  codingquarry   rna-bam
  genemark       selftraining
  glimmerhmm     pasa
  snap           pasa
[May 09 10:04 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[May 09 10:05 AM]: Genome loaded: 28 scaffolds; 46,775,304 bp; 4.30% repeats masked
[May 09 10:05 AM]: Parsed 879 transcript alignments from: P93_annotation/training/funannotate_train.transcripts.gff3
[May 09 10:05 AM]: Creating transcript EVM alignments and Augustus transcripts hintsfile
[May 09 10:05 AM]: Extracting hints from RNA-seq BAM file using bam2hints
[May 09 10:05 AM]: Mapping 569,434 proteins to genome using diamond and exonerate
[May 09 10:10 AM]: Found 308,673 preliminary alignments with diamond in 0:03:51 --> generated FASTA files for exonerate in 0:01:06
[May 09 11:32 AM]: Exonerate finished in 1:21:24: found 8,169 alignments
[May 09 11:32 AM]: Running GeneMark-ES on assembly
[May 09 12:03 PM]: 13,065 predictions from GeneMark
[May 09 12:03 PM]: Filtering PASA data for suitable training set
[May 09 12:03 PM]: 313 of 325 models pass training parameters
[May 09 12:03 PM]: Training Augustus using PASA gene models
[May 09 12:03 PM]: Augustus initial training results:
  Feature        Specificity   Sensitivity
  nucleotides    81.8%         47.9%
  exons          23.6%         20.5%
  genes          1.6%          1.9%
[May 09 12:03 PM]: Accuracy seems low, you can try to improve by passing the --optimize_augustus option.
[May 09 12:03 PM]: Running Augustus gene prediction using pseudocercospora_eumusae_p93 parameters
[May 09 12:15 PM]: 4,841 predictions from Augustus
[May 09 12:15 PM]: Pulling out high quality Augustus predictions
[May 09 12:15 PM]: Found 1,582 high quality predictions from Augustus (>90% exon evidence)
[May 09 12:15 PM]: Running CodingQuarry prediction using stringtie alignments
[May 09 12:15 PM]: CMD ERROR: CodingQuarry -p 8 -f /home/yvet/annotations/P93/P93_annotation/predict_misc/genome.softmasked.fa -t /home/yvet/annotations/P93/P93_annotation/predict_misc/stringtie.gff3

yvetboele · May 12 '23 09:05
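When CodingQuarry is run by hand and prints nothing but "Segmentation fault (core dumped)", checking its exit status at least confirms that the process was killed by a signal rather than stopping on an error of its own. A minimal bash sketch, assuming Linux (where SIGSEGV is signal 11, so bash reports a segfaulted command as 128 + 11 = 139); the helper name run_and_classify is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command and classify how it ended.
# Bash reports a command killed by signal N as exit status 128 + N;
# SIGSEGV is signal 11 on Linux, so a segfault shows up as 139.
run_and_classify() {
    local status=0
    "$@" || status=$?   # capture the status without tripping `set -e`
    if [ "$status" -eq 0 ]; then
        echo "ok"
    elif [ "$status" -eq 139 ]; then
        echo "segfault"
    else
        echo "failed (exit status $status)"
    fi
    return "$status"
}

# Usage (paths from the CMD ERROR line in the log above):
#   run_and_classify CodingQuarry -p 8 \
#       -f /home/yvet/annotations/P93/P93_annotation/predict_misc/genome.softmasked.fa \
#       -t /home/yvet/annotations/P93/P93_annotation/predict_misc/stringtie.gff3
```

A status of 139 points at the binary itself (for example, how it was built) rather than at malformed input that CodingQuarry detected and reported.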

This seems more like a question for the CodingQuarry developers, @JamesHane.

nextgenusfs · May 16 '23 16:05

I am also facing the same problem. Can you tell me if the issue has been resolved?

yweii · Jun 02 '23 03:06

No, I haven't resolved it.

ghost · Jun 02 '23 04:06

Hi, I supervised the project in which CodingQuarry was developed, so I didn't write the code, but I will try to help. I compiled CodingQuarry on an up-to-date system today and also got segmentation faults consistently; I then tested a version compiled back in 2016, which is now running without issues. I don't know exactly what the problem is, but I suspect that compiler or dependency updates caused it. It is strange that it worked for you earlier for some isolates. In the short term, if you email me at [email protected], I can send you my CodingQuarry binary, and hopefully that will solve your problem.

JamesHane · Jun 02 '23 08:06
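If funannotate calls CodingQuarry by name, as the bare "CodingQuarry -p ..." in the CMD ERROR line suggests, one way to try a different build (such as an older binary) is to put its directory first on PATH before running funannotate predict. A minimal bash sketch under that assumption; the helper name use_codingquarry_from and the directory in the usage note are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: make a different CodingQuarry build take precedence on PATH
# for the current shell session.
# Assumption: funannotate resolves "CodingQuarry" through PATH.
use_codingquarry_from() {
    local bindir=$1
    if [ ! -x "$bindir/CodingQuarry" ]; then
        echo "no executable CodingQuarry found in $bindir" >&2
        return 1
    fi
    export PATH="$bindir:$PATH"
    hash -r                  # forget any cached location of the old binary
    command -v CodingQuarry  # print which binary will now be used
}

# Usage (hypothetical directory):
#   use_codingquarry_from "$HOME/opt/codingquarry-old/bin"
#   funannotate predict ...   # same options as before
```

Printing the resolved path makes it easy to confirm that the replacement binary, and not the freshly compiled one, is what the pipeline will pick up.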

On a related note: if you are testing CodingQuarry on the same data as funannotate after a failed funannotate run, you should remove the ParameterFiles directory that CodingQuarry created during the initial run; otherwise CodingQuarry may also fail because that directory already exists.

JamesHane · Jul 17 '23 10:07
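The ParameterFiles cleanup described above can be folded into a small rerun helper. A minimal bash sketch, assuming that CodingQuarry writes ParameterFiles into the directory it is launched from (check where the failed run actually left it); both function names are hypothetical, and only the -p, -f and -t options that appear in the log above are used:

```bash
#!/usr/bin/env bash
# Hypothetical helper: delete a stale ParameterFiles/ left by a failed run.
# Assumption: it lives in the directory CodingQuarry was launched from.
clear_parameter_files() {
    local dir=${1:-.}
    if [ -d "$dir/ParameterFiles" ]; then
        echo "removing stale $dir/ParameterFiles" >&2
        rm -rf -- "$dir/ParameterFiles"
    fi
}

# Hypothetical helper: rerun CodingQuarry from a given directory with the
# same flags as in the funannotate log (-p CPUs, -f genome, -t transcripts).
# The body runs in a subshell, so the `cd` does not affect the caller.
rerun_codingquarry() (
    cd "$1" || exit 1
    clear_parameter_files .
    CodingQuarry -p "${4:-8}" -f "$2" -t "$3"
)

# Usage (directory from the CMD ERROR line; run location assumed):
#   rerun_codingquarry /home/yvet/annotations/P93/P93_annotation/predict_misc \
#       genome.softmasked.fa stringtie.gff3 8
```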