
Improve lists of Rfam domain-specific models

Open AntonPetrov opened this issue 2 years ago • 0 comments

Hello,

I noticed that prokka ships lists of domain-specific Rfam covariance models, which is very useful because it lets the search focus on the relevant RNAs. However, the SQL queries used to generate these lists are likely to produce false positives.

For example, the bacterial list should not contain Metazoan signal recognition particle RNA, Small nucleolar RNA U3, or Small nucleolar RNA SNORD36, because these families do not occur in bacteria.

The Rfam team maintains rfam-taxonomy, a dedicated repo with lists of Rfam models for Bacteria, Eukaryotes, Viruses, and other domains. The lists are updated with every Rfam release, and an archive of previous versions is available.

Would it be possible to use the Rfam lists in prokka?
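
As a rough illustration of the cross-check I have in mind, here is a minimal Python sketch that compares an existing domain list against a curated one and flags models missing from the curated list. The function name `split_by_reference` and the in-memory lists are hypothetical; how prokka stores its lists and the exact file layout of rfam-taxonomy are not assumed here, so loading the accessions is left out.

```python
def split_by_reference(current, reference):
    """Split accessions in `current` into those confirmed by `reference`
    and those absent from it (candidate false positives)."""
    # Normalise whitespace and drop empty entries before comparing.
    current_set = {acc.strip() for acc in current if acc.strip()}
    reference_set = {acc.strip() for acc in reference if acc.strip()}
    confirmed = sorted(current_set & reference_set)
    flagged = sorted(current_set - reference_set)
    return confirmed, flagged


# Illustrative input only: RF00001 (5S rRNA), RF00005 (tRNA),
# RF00017 (Metazoan SRP RNA). Real lists would be read from files.
current_bacterial = ["RF00001", "RF00005", "RF00017"]
curated_bacterial = ["RF00001", "RF00005"]

confirmed, flagged = split_by_reference(current_bacterial, curated_bacterial)
print("keep:", confirmed)
print("review:", flagged)
```

Anything reported under `review` would be a candidate for removal from the domain-specific list.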

Please let me know if you have any questions. I am also tagging @blakesweeney who will soon be replacing me as Rfam Project Leader. Many thanks in advance for looking into this!

AntonPetrov · Mar 11 '22 10:03