adding custom annotations: ncRNA from Infernal/Rfam

Open xvazquezc opened this issue 2 years ago • 6 comments

Hi there,

I'm annotating a few genomes and I realised that funannotate does not have any integrated tool for annotating ncRNA other than tRNAscan-SE. I will be annotating ncRNA through Infernal, as per the standard ncRNA annotation with Rfam.

I've only seen a brief reference to adding other annotations in #481, which refers to manually manipulating some .tbl files (not necessarily a nice format). I've seen that funannotate annotate has the --annotations option, but it's not clearly documented what the format would be or whether it would be valid for genes not yet called.

Any details that anyone could provide would be appreciated.

Cheers,

xvazquezc avatar Apr 13 '22 08:04 xvazquezc

Do you want these as genes in the predicted GenBank files? Infernal predictions can be pretty slow, and I'm not sure how universally generic a prediction pipeline can be for these, as it will depend on the db. Are you wanting to do this for an animal, plant, or fungus?

hyphaltip avatar Apr 16 '22 01:04 hyphaltip

But I think you can import more gene features from a GFF file you provide, though I'm not sure how that works for noncoding gene features.

hyphaltip avatar Apr 16 '22 01:04 hyphaltip

If you can provide some general GFF files from tools to annotate these I can look at what it would take to have funannotate pass them through.

nextgenusfs avatar Apr 16 '22 01:04 nextgenusfs

I'm running Infernal on fungal genomes, mostly following the Rfam recommendations, i.e. using -Z to calculate e-values, only using CM (--nohmmonly), and using the gathering threshold (--cut_ga, curated thresholds). The only parameter I change is using the stricter --default instead of --rfam, at the cost of running time, e.g.

cmscan -Z ${ZVAL} --cpu $NCPUS \
--cut_ga --default --nohmmonly --fmt 2 \
--tblout rfam.tblout \
--clanin ${RFAMDB}/Rfam.clanin \
${RFAMDB}/Rfam.cm $GENOME > rfam.cmscan
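The thread does not say how ${ZVAL} was obtained. A minimal sketch, assuming the usual Rfam convention that -Z is the search space in megabases counting both strands (total nucleotides × 2 / 10^6). The function name `search_space_mb` is hypothetical and only for illustration.

```python
import sys


def search_space_mb(fasta_lines):
    """Return the -Z value (Mb, both strands) for an iterable of FASTA lines.

    Assumption: Z = total nucleotides * 2 / 1,000,000, per the Rfam
    genome-annotation recommendations.
    """
    total = 0
    for line in fasta_lines:
        line = line.strip()
        # Header lines start with '>' and carry no sequence.
        if line and not line.startswith(">"):
            total += len(line)
    return total * 2 / 1_000_000


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        print(search_space_mb(handle))
```

The printed value could then be exported as ZVAL before running the cmscan command above.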

So far, the only clear "errors" occur with rRNA genes which can be called as archaeal instead of eukaryotic or bacterial (mito) or the other way around. Nonetheless, rRNA-specific tools are recommended and e.g. barrnap generates gff3 files by default (that could be another nice addition btw)

Regarding Infernal's output, there is no gff option, but running it as recommended generates a tabular output file (fixed-width table). Here is an example.
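As a rough illustration of what a conversion from that table to GFF3 could look like, here is a minimal sketch. It assumes the --fmt 2 column order from a cmscan run (model name in column 2, Rfam accession in column 3, sequence name in column 4, seq from/to in columns 10 and 11, strand in 12, score in 17, E-value in 18, overlap flag in 20), which should be checked against the header of your own tblout file. The function name, the `ncRNA` feature type and the attribute keys are illustrative choices, and whether funannotate accepts such a file is not established in this thread.

```python
import sys


def tblout_to_gff3(lines, source="cmscan", feature_type="ncRNA"):
    """Yield GFF3 lines for hits in a cmscan --fmt 2 tblout file.

    Assumed layout: whitespace-separated columns as listed in the lead-in.
    Hits flagged '=' in the overlap column (overlapped by a better hit)
    are skipped.
    """
    yield "##gff-version 3"
    count = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment/header lines
        fields = line.split()
        if len(fields) < 20:
            continue  # not a complete hit line
        if fields[19] == "=":
            continue  # lower-scoring overlapping hit
        model, accession, seqid = fields[1], fields[2], fields[3]
        seq_from, seq_to = int(fields[9]), int(fields[10])
        # GFF3 requires start <= end; the strand column carries orientation.
        start, end = min(seq_from, seq_to), max(seq_from, seq_to)
        strand, score, evalue = fields[11], fields[16], fields[17]
        count += 1
        attributes = (
            f"ID=rfam_{count};Name={model};"
            f"Dbxref=RFAM:{accession};evalue={evalue}"
        )
        yield "\t".join(
            [seqid, source, feature_type, str(start), str(end),
             score, strand, ".", attributes]
        )


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for gff_line in tblout_to_gff3(handle):
            print(gff_line)
```

Only the first 20 columns are read, so the sketch does not depend on the trailing columns or on the free-text description at the end of each line.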

xvazquezc avatar Apr 19 '22 04:04 xvazquezc

@nextgenusfs I can give this a try.

hyphaltip avatar Apr 20 '22 22:04 hyphaltip

For conversion to gff, see jiffy-infernal-hmmer-scripts, in particular infernal-tblout2gff.pl

alephreish avatar May 15 '23 07:05 alephreish