inconsistency in prokka annotation

Open vappiah opened this issue 3 years ago • 3 comments

Hi @tseemann, I observed an inconsistency when annotating a genome with Prokka.

First, I downloaded the M. ulcerans Agy99 genome (FASTA and GenBank formats).

I annotated the FASTA file, passing the GenBank file via --proteins, with the command below.
prokka --cpus $threads --kingdom Bacteria --prefix Agy99 --genus Mycobacterium --rfam --species ulcerans --cdsrnaolap --metagenome --proteins Agy99.gb Agy99.fasta

I counted the CDS features in the original GenBank file and in the Prokka-generated GenBank file. The original had 4160, while the Prokka file had 8665. Am I missing something in the command? I was expecting similar values for both, but the difference is very large. Please advise. Thanks.
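
For reference, here is a minimal sketch of one way such a CDS count could be obtained. It is not necessarily how the counts above were produced. It assumes standard GenBank flat-file layout, where feature keys start in column 6 (five leading spaces), and the function name `count_cds` is hypothetical.

```python
import re

# Assumption: standard GenBank flat-file layout, where a feature key such as
# "CDS" begins in column 6 (exactly five leading spaces). Qualifier lines are
# indented further, so a "CDS" inside a /product string is not counted.
CDS_LINE = re.compile(r"^ {5}CDS\s", re.MULTILINE)


def count_cds(genbank_text: str) -> int:
    """Return the number of CDS feature lines in GenBank flat-file text."""
    return len(CDS_LINE.findall(genbank_text))
```

Reading each file into a string (for example `open("Agy99.gb").read()`) and passing it to `count_cds` gives one number per file, so the reference and the Prokka output can be compared with the same counting rule.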

vappiah avatar Oct 02 '20 01:10 vappiah

Dear vappiah,

Did you check whether the difference in counts means that your new annotation contains pseudogenes or fragmented genes?

Cheers

felipelira avatar Feb 08 '21 14:02 felipelira

Dear @felipelira, they are fragmented genes.

vappiah avatar Feb 09 '21 19:02 vappiah

In that case, how many genes do you get after filtering out the fragmented genes? Is it close to the expected 4160 genes? Another question: why are you using the --metagenome option?

felipelira avatar Feb 10 '21 13:02 felipelira