problem with fasta header starting with '>0 '

Open gtonkinhill opened this issue 3 years ago • 0 comments

Hi,

I've searched through the issues, so hopefully this hasn't been reported before. It seems that Prokka runs into problems when a FASTA header starts with '>0 '.

In this case it renames the sequence to SEQ in the annotation lines of the output GFF file, but does not rename it in the FASTA section of the same file, where the header stays >0. Because the sequence IDs no longer match, downstream programs can skip these annotations.

I've copied an example below and attached the corresponding input fasta file along with the output gff file.

##gff-version 3
##sequence-region 0 1 526811
##sequence-region 1 1 500965
SEQ     Prodigal:002006 CDS     25      351     .       +       0       ID=AAJEMFJL_00001;inference=ab initio prediction:Prodigal:002006;locus_tag=AAJEMFJL_00001;product=unannotated protein
SEQ     Prodigal:002006 CDS     409     747     .       +       0       ID=AAJEMFJL_00002;inference=ab initio prediction:Prodigal:002006;locus_tag=AAJEMFJL_00002;product=unannotated protein
SEQ     Prodigal:002006 CDS     753     2168    .       -       0       ID=AAJEMFJL_00003;inference=ab initio prediction:Prodigal:002006;locus_tag=AAJEMFJL_00003;product=unannotated protein
.
.
.
.
##FASTA
>0
TACAACCTGCTGTTGGTGTCGCGTATGAAAGAAGAGCTGGGTGCCGGTATCAATACGGGC
ATCATTCGAGCGATGGGTGGGACCGGCAAAGTGGTCACCTCGGCGGGTCTGGTCTTCGCG

I'm using Prokka v1.14.6 and ran the command prokka --noanno 11861_7#10.fa
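
In case it helps with reproducing this, here is a minimal sketch of how the mismatch can be detected in an output GFF file. It is not part of Prokka; the function name seqid_mismatches is hypothetical, and it assumes the file follows the usual GFF3 layout (annotation rows first, then a ##FASTA section, with the sequence ID as the first whitespace-delimited token on each row and header).

```python
def seqid_mismatches(gff_text):
    """Return sequence IDs used in annotation rows but missing from the ##FASTA section.

    Hypothetical helper, not part of Prokka. Assumes a GFF3 file with an
    embedded ##FASTA section.
    """
    feature_ids = set()
    fasta_ids = set()
    in_fasta = False
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            in_fasta = True
        elif in_fasta:
            if line.startswith(">"):
                # The sequence ID is the first token after '>'.
                tokens = line[1:].split()
                if tokens:
                    fasta_ids.add(tokens[0])
        elif line.strip() and not line.startswith("#"):
            # Column 1 of an annotation row is the sequence ID.
            feature_ids.add(line.split(maxsplit=1)[0])
    return sorted(feature_ids - fasta_ids)


# Shortened version of the output shown above (attributes trimmed).
example = "\n".join([
    "##gff-version 3",
    "##sequence-region 0 1 526811",
    "SEQ\tProdigal:002006\tCDS\t25\t351\t.\t+\t0\tID=AAJEMFJL_00001",
    "##FASTA",
    ">0",
    "TACAACCTGCTGTTGGTGTCGCGTATGAAAGAAGAGCTGGGTGCCGGTATCAATACGGGC",
])
print(seqid_mismatches(example))  # ['SEQ']
```

An empty list would mean every annotated sequence ID also appears as a FASTA header, which is what I would expect from a consistent file.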

test.zip

gtonkinhill, May 12 '21 06:05