CDS labels do not match

Open hayleyjaywilson opened this issue 2 years ago • 2 comments

I have carried out the following annotation run on ~1000 isolates: for i in $(cat list); do echo ${i}; prokka ${i} --proteins fm204883.genbank --locustag SEQ --outdir ${i}_prokka_results; done

This has annotated the genes fine, but I have an issue with the CDSs. Say a CDS in my reference genome (fm204883) is named SEQ0024. That label does not carry over to the annotated isolates: the CDS labelled SEQ0024 in another genome is not the same CDS as SEQ0024 in my reference. Have I missed a step? I need to compare various CDSs across many genomes, but I can't do that if the same CDS is labelled differently in each one. Is there a way to achieve this, please?

hayleyjaywilson, Mar 17 '22 13:03

Locus tags are numbered sequentially within each genome, so SEQ0024 as you have it will always be the 24th gene in a given sample, and that is usually not the same gene across samples. For your comparison, look at the gene name field instead of the locus tag field. Separately, locus tag prefixes should be made unique to each sample to avoid confusion.
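The two suggestions above could be sketched roughly as follows. This is an illustration, not a tested pipeline: `make_locustag`, `prokka_cmd` and `gene_to_locus` are hypothetical helper names, the locus tag prefix is simply the input file name reduced to upper-case letters and digits, and the gene lookup assumes Prokka's tab-separated feature table has header columns named `gene` and `locus_tag`. The loop only prints the commands so they can be checked before running.

```shell
#!/usr/bin/env bash

# Hypothetical helper: derive a per-sample locus tag prefix from a file name
# by dropping the extension and keeping only upper-cased letters and digits.
make_locustag() {
  local base
  base=$(basename "$1")
  base=${base%%.*}
  printf '%s' "$base" | tr -cd '[:alnum:]' | tr '[:lower:]' '[:upper:]'
}

# Hypothetical helper: print (not run) the prokka command for one sample,
# using the same options as the original loop but a unique --locustag.
prokka_cmd() {
  local fasta=$1
  printf 'prokka %s --proteins fm204883.genbank --locustag %s --outdir %s_prokka_results\n' \
    "$fasta" "$(make_locustag "$fasta")" "$fasta"
}

# Hypothetical helper: list "gene<TAB>locus_tag" pairs from a feature table,
# finding both columns by header name (assumed: "gene" and "locus_tag").
# Rows with an empty gene field are skipped.
gene_to_locus() {
  awk -F'\t' '
    NR == 1 { for (c = 1; c <= NF; c++) col[$c] = c; next }
    $col["gene"] != "" { print $col["gene"] "\t" $col["locus_tag"] }
  ' "$1"
}

# Dry run: print one command per line of "list", if that file exists.
if [ -f list ]; then
  while IFS= read -r i; do
    prokka_cmd "$i"
  done < list
fi
```

Running `gene_to_locus` on each sample's table gives a gene name to locus tag mapping per isolate, so the same gene can be matched across genomes by name even though its locus tag differs. CDSs that Prokka could not assign a gene name to will not appear in that mapping.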

0xaf1f, Mar 22 '22 08:03

Thanks, that makes sense now.

hayleyjaywilson, Mar 22 '22 09:03