prokka
Improve lists of Rfam domain-specific models
Hello,
I noticed that prokka contains lists of domain-specific Rfam covariance models, which is very useful and helps focus on just the relevant RNAs. However, the SQL queries used to generate the lists are likely to produce false positives, that is, families assigned to a domain in which they do not actually occur.
For example, the bacterial list should not contain Metazoan signal recognition particle RNA, Small nucleolar RNA U3, or Small nucleolar RNA SNORD36, because these families do not occur in bacteria.
The Rfam team maintains rfam-taxonomy, a dedicated repo with lists of Rfam models for Bacteria, Eukaryotes, Viruses, and other domains. The lists are updated every Rfam release and an archive is available.
Would it be possible to use the Rfam lists in prokka?
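To get a feel for how much the two lists differ before switching, something like the following rough sketch might help. It only assumes that each list is plain text containing Rfam accessions (`RF` followed by five digits) somewhere on each line. The function names are hypothetical, and the accessions in the demo are placeholders rather than real families, so the actual file names and layouts in prokka and rfam-taxonomy would need to be checked.

```python
import re

# Rfam accessions have the form "RF" followed by five digits.
ACCESSION = re.compile(r"\bRF\d{5}\b")


def read_accessions(lines):
    """Collect every Rfam accession found in an iterable of text lines."""
    found = set()
    for line in lines:
        found.update(ACCESSION.findall(line))
    return found


def compare_lists(current_lines, curated_lines):
    """Compare a tool's current list against a curated domain list.

    Both arguments are iterables of text lines (for example open file
    handles). The exact file layout is an assumption: only the accession
    pattern is relied upon.
    """
    current = read_accessions(current_lines)
    curated = read_accessions(curated_lines)
    return {
        # In the current list but absent from the curated one.
        "likely_false_positives": sorted(current - curated),
        # In the curated list but absent from the current one.
        "missing": sorted(curated - current),
    }


# Placeholder data standing in for the two lists (not real families).
current = ["RF99991", "RF99992", "RF99993"]
curated = ["RF99991,placeholder family A", "RF99994,placeholder family B"]

report = compare_lists(current, curated)
print(report)
```

In practice the two arguments could be open file handles for prokka's bacterial list and the corresponding rfam-taxonomy list.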
Please let me know if you have any questions. I am also tagging @blakesweeney who will soon be replacing me as Rfam Project Leader. Many thanks in advance for looking into this!