prokka
Keep original contig name
Is there a way to make the locus tag be the original contig name like RAST does?
If you only have one contig then you can set the --locustag parameter to be the contig name yourself (and don't use --compliant, which renames contigs).
If you have lots of contigs, then I could see how that might be a useful feature. However, many contig names are not legal locus_tags, for example those produced by SPAdes and Velvet.
I'll leave this open as a possible enhancement.
This enhancement would also be instrumental for analysing scaffolds from a fragmented metagenome and parsing the resulting annotations. I understand that Prokka is not designed for annotating genetic material from metagenomes, but it is in fact the most versatile tool for this if you subset to superkingdoms first.
Hi @tseemann, it would be much better if we could tell which contig a gene comes from by its sequence header in FASTA format or its 'locus_tag' in GenBank format. I'm also keen to know the location of a gene in the original contig. So I would suggest keeping the original contig IDs (or part of them) in the ORF identifier and restarting the numbering at the first gene of each contig. Ideally, it would look like this:
>contig1_1 hypothetical protein
ATCG....
>contig1_2 hypothetical protein
ATCG....
>contig1_3 hypothetical protein
ATCG....
>contig2_1 hypothetical protein
ATCG....
This would be quite helpful when people are trying to understand the location of a gene in the genome. Thanks!
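Until something like this exists, the contig-based identifiers requested above can be approximated after the fact from the GFF output. The following is only a rough sketch, assuming the GFF3 file carries the contig name in column 1 and a locus_tag attribute in column 9, and that features appear in positional order within each contig. The function name contig_gene_ids is hypothetical and not part of Prokka.

```python
from collections import defaultdict


def contig_gene_ids(gff_lines):
    """Map each locus_tag to a '<contig>_<n>' style identifier.

    Hypothetical helper, not part of Prokka. Assumes GFF3 input with the
    contig name in column 1 and a locus_tag attribute in column 9.
    """
    counters = defaultdict(int)  # genes seen so far on each contig
    mapping = {}
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # the embedded sequence section starts here
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        seqid, attrs = cols[0], cols[8]
        fields = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
        tag = fields.get("locus_tag")
        if tag is None or tag in mapping:
            continue  # no tag, or already counted (e.g. gene + CDS pair)
        counters[seqid] += 1
        mapping[tag] = f"{seqid}_{counters[seqid]}"
    return mapping
```

The resulting dictionary can then be used to rename FASTA headers or to look up the source contig of any gene. Numbering follows the order of features in the file, so it is only positional if the file is sorted that way.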
@SilentGene given your GitHub username I would have thought you would want to keep your gene source private! ;-)
I think the best way to solve this would be to have a customisable --locustag PATTERN option, where PATTERN could have codes in it like --locustag "{{contig}}_{{ftype}}_{{genenum}}" etc., which would give things like contig001_CDS_123 and contig245_rRNA_3, for example.
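To make the proposal concrete, here is a minimal sketch of how such a pattern could be expanded. It is an illustration only, not how Prokka works: the option is not implemented, and the function expand_locus_tag and its keyword arguments are hypothetical names chosen to match the placeholder codes suggested above.

```python
import re

# Matches placeholder codes of the form {{name}}
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def expand_locus_tag(pattern, **values):
    """Replace each {{name}} code in pattern with the matching value.

    Hypothetical helper illustrating the proposed --locustag PATTERN idea.
    Raises KeyError if the pattern uses a code that was not supplied.
    """
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"unknown placeholder: {name}")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, pattern)


tag = expand_locus_tag(
    "{{contig}}_{{ftype}}_{{genenum}}",
    contig="contig001", ftype="CDS", genenum=123,
)
print(tag)  # contig001_CDS_123
```

Rejecting unknown codes rather than leaving them in place seems the safer choice here, since a stray {{...}} would otherwise end up in every identifier.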
Haha;-)😆 That would be awesome if we could customize the locustag by patterns like that. Can't wait to try out the new feature!
Hi @tseemann, is a customisable --locustag option planned for future versions of Prokka? Thanks!
This feature would be very important!
Hi @tseemann! Any updates on this? It seems that more and more people are using prokka for annotating metagenomes and it is indeed important to know in which contig the genes are found. :)
I also encourage you to implement this feature. And thanks for the amazing work!