
Keep original contig name

Open ajsinghal opened this issue 8 years ago • 9 comments

Is there a way to make the locus tag be the original contig name like RAST does?

ajsinghal avatar Jul 14 '16 14:07 ajsinghal

If you only have one contig then you can set the --locustag parameter to be the contig name yourself (and don't use --compliant which renames contigs).

If you have lots of contigs, then I could see how that might be a useful feature. However, many contig names are not legal locus_tags, for example those produced by SPAdes and Velvet.

I'll leave this open as a possible enhancement.

tseemann avatar Sep 10 '16 09:09 tseemann

This enhancement would also be instrumental for analysing scaffolds from a fragmented metagenome and parsing the resulting annotations. I understand that Prokka is not designed for annotating genetic material from metagenomes, but it is in fact the most versatile tool for this if you subset to superkingdoms first.

thierryjanssens avatar Jul 30 '19 05:07 thierryjanssens

Hi @tseemann, it would be much better if we could tell which contig a gene comes from based on its sequence header in FASTA format or its 'locus_tag' in GenBank format. I'm also keen to know the location of a gene within the original contig. So I would suggest keeping the original contig IDs (or part of them) in the ORF identifier and restarting the numbering at the first gene of each contig. Ideally, it would look like this:

>contig1_1 hypothetical protein
ATCG....
>contig1_2 hypothetical protein
ATCG....
>contig1_3 hypothetical protein
ATCG....
>contig2_1 hypothetical protein
ATCG....

This would be quite helpful when people are trying to understand the location of a gene in the genome. Thanks!
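Until something like this exists, one possible workaround is to derive such identifiers from the GFF output after annotation. The following is only a minimal sketch: it assumes a standard 9-column GFF3 file in which column 1 holds the contig name and column 9 carries a `locus_tag=` attribute, and that features are listed in the order they should be numbered. The function name `contig_based_ids` and the example lines are made up for illustration and are not part of Prokka.

```python
from collections import defaultdict


def contig_based_ids(gff_lines):
    """Map each locus_tag to '<contig>_<n>', numbering features per contig.

    Hypothetical helper: features are numbered in the order they appear.
    """
    counters = defaultdict(int)
    mapping = {}
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section starts; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        seqid, attrs = cols[0], cols[8]
        tags = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
        locus_tag = tags.get("locus_tag")
        if locus_tag is None or locus_tag in mapping:
            continue  # no tag, or tag already seen on an earlier line
        counters[seqid] += 1
        mapping[locus_tag] = f"{seqid}_{counters[seqid]}"
    return mapping


# Made-up feature lines, for illustration only.
example = [
    "##gff-version 3",
    "contig1\texample\tCDS\t10\t400\t.\t+\t0\tID=ABC_00001;locus_tag=ABC_00001",
    "contig1\texample\tCDS\t500\t900\t.\t-\t0\tID=ABC_00002;locus_tag=ABC_00002",
    "contig2\texample\tCDS\t1\t300\t.\t+\t0\tID=ABC_00003;locus_tag=ABC_00003",
]
for old, new in contig_based_ids(example).items():
    print(old, "->", new)
```

The resulting mapping could then be used to rewrite FASTA headers, although contig names that are not legal locus_tags would still need cleaning up first.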

SilentGene avatar Aug 20 '19 10:08 SilentGene

@SilentGene given your GitHub username, I would have thought you would want to keep your gene source private! ;-)

I think the best way to solve this would be to have a customisable --locustag PATTERN option, where PATTERN could contain codes, e.g. --locustag "{{contig}}_{{ftype}}_{{genenum}}", which would give things like contig001_CDS_123 and contig245_rRNA_3.
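To make the idea concrete, here is a rough Python sketch of how such a pattern could be expanded. This is not how Prokka works today and not a planned implementation. The function names are hypothetical, and it assumes that `{{genenum}}` counts features separately per contig and feature type, which the proposal above does not specify.

```python
import re
from collections import defaultdict

# Matches placeholders of the form {{name}}.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def expand_pattern(pattern, fields):
    """Replace each {{name}} in pattern with fields[name].

    Raises KeyError if the pattern uses an unknown placeholder.
    """
    return PLACEHOLDER.sub(lambda m: str(fields[m.group(1)]), pattern)


def make_locus_tags(pattern, features):
    """Build one tag per (contig, ftype) pair, in input order.

    Assumption: genenum restarts at 1 for every contig/feature-type pair.
    """
    counters = defaultdict(int)
    tags = []
    for contig, ftype in features:
        counters[(contig, ftype)] += 1
        fields = {
            "contig": contig,
            "ftype": ftype,
            "genenum": counters[(contig, ftype)],
        }
        tags.append(expand_pattern(pattern, fields))
    return tags


features = [
    ("contig001", "CDS"),
    ("contig001", "CDS"),
    ("contig001", "rRNA"),
    ("contig245", "rRNA"),
]
for tag in make_locus_tags("{{contig}}_{{ftype}}_{{genenum}}", features):
    print(tag)
```

A different counting scheme (for example one running counter per contig, regardless of feature type) would only require changing the key used for `counters`.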

tseemann avatar Aug 21 '19 02:08 tseemann

Haha;-)😆 That would be awesome if we could customize the locustag by patterns like that. Can't wait to try out the new feature!

SilentGene avatar Aug 21 '19 02:08 SilentGene

Hi @tseemann, is a customisable --locustag option planned for future versions of Prokka? Thanks!

cfrioux avatar Jul 03 '20 05:07 cfrioux

This feature would be very important!

mkazanov avatar Jul 16 '20 01:07 mkazanov

Hi @tseemann! Any updates on this? It seems that more and more people are using prokka for annotating metagenomes and it is indeed important to know in which contig the genes are found. :)

agavriilidou avatar Dec 07 '20 21:12 agavriilidou

I also encourage you to implement this feature. And thanks for the amazing work!

JuanmaMedina avatar Jul 23 '21 07:07 JuanmaMedina