
Run predict with preexisting EVM results

Open BaiweiLo opened this issue 3 years ago • 2 comments

Hi,

Is there a checkpoint in the prediction pipeline after finishing EVM?

I was running funannotate predict, but it keeps running into memory issues in funannotate-runEVM.py. Eventually I ran the EVM steps manually outside the pipeline. I named the EVM results evm.round1.gff3 and evm.round1.proteins.fa and placed them under predict_misc. However, when I re-run funannotate predict, funannotate tries to restart EVM from scratch.

Thank you for your help!!

BaiweiLo, Jul 06 '22 15:07

There is a --keep_evm option in predict.

But which version of funannotate are you using? I thought I had fixed the memory issue; it happens with large genomes and large contigs with the internal partitioning scheme. The default partitioning scheme in EVM isn't the greatest and can fragment some genes because it arbitrarily chops scaffolds up into pieces.

nextgenusfs, Jul 06 '22 15:07
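
For anyone landing here with the same problem, here is a minimal sketch of a pre-flight check before re-running predict with --keep_evm. It only verifies that the two files named in the question exist under predict_misc and assembles the command line. The helper names, the example paths, and the assumption that these are exactly the files --keep_evm looks for are mine and are not confirmed in this thread.

```python
from pathlib import Path

# File names taken from the question above; whether --keep_evm requires
# both of them is an assumption, not something confirmed in this thread.
EVM_FILES = ("evm.round1.gff3", "evm.round1.proteins.fa")


def missing_evm_outputs(outdir):
    """Return the expected EVM files that are absent from <outdir>/predict_misc."""
    misc = Path(outdir) / "predict_misc"
    return [name for name in EVM_FILES if not (misc / name).is_file()]


def build_predict_command(genome, outdir, species):
    """Assemble a funannotate predict call that reuses existing EVM results.

    The genome path, output directory, and species name are placeholders
    supplied by the caller (hypothetical values in any example).
    """
    return [
        "funannotate", "predict",
        "-i", str(genome),
        "-o", str(outdir),
        "-s", species,
        "--keep_evm",
    ]
```

If `missing_evm_outputs("my_output_dir")` returns an empty list, the command from `build_predict_command(...)` can be handed to `subprocess.run`; otherwise the listed files would need to be put in place first.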

Thank you! This really helps. It was version 1.8.11, the latest one I can get through mamba. I think the memory issue may have something to do with our servers, as I cannot reproduce the same error every time with the same command, but it always dies midway through funannotate-runEVM.py. I am aware of the drawback of EVM's original partitioning, so I did the partitioning with your method (which is really amazing btw), just step by step manually.

I have one more question regarding EVM. In my experience, when two genes overlap (usually on opposite strands, with one sitting within an intron of the other), EVM tends to break the longer one into two genes, thus predicting three gene models. I wonder whether that is still an issue when using funannotate?

Many thanks.

BaiweiLo, Jul 07 '22 14:07
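
One way to look into the nested-gene question is to list candidate loci in a trusted annotation and then compare them against the EVM output. The sketch below flags gene pairs where one gene span lies entirely inside another on the opposite strand. It is a simplification: it compares whole gene spans and does not check that the inner gene actually falls within an intron. The `Gene` record and the coordinates in the usage note are hypothetical, and nothing here is part of funannotate or EVM.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str
    seqid: str
    start: int   # 1-based, inclusive, as in GFF3
    end: int
    strand: str  # "+" or "-"


def nested_opposite_strand(genes):
    """Return (outer_id, inner_id) pairs where inner lies within outer's span
    on the same sequence but on the opposite strand."""
    pairs = []
    for outer in genes:
        for inner in genes:
            if outer is inner:
                continue
            same_seq = inner.seqid == outer.seqid
            opposite = inner.strand != outer.strand
            contained = outer.start <= inner.start and inner.end <= outer.end
            identical_span = (inner.start, inner.end) == (outer.start, outer.end)
            if same_seq and opposite and contained and not identical_span:
                pairs.append((outer.gene_id, inner.gene_id))
    return pairs
```

For example, with made-up genes `g1` (chr1:100-5000, +), `g2` (chr1:2000-2500, -), and `g3` (chr1:6000-7000, -), the function returns `[("g1", "g2")]`. Loci flagged this way are the ones worth inspecting for an outer gene that was split into two models.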