prokka Please rename your contigs or use --centre XXX to generate clean contig names

So I just tried Prokka on a file containing a single sequence like this:

>contig00001 length=455937 numreads=17237
AACTAACAACTAACAACCAACAACAAACCACTAACACATTTGTCTTTCTACAGCCGCTGG
ATCTTTCCCTATTTGATGGATATTGCGATGCGAGATTCGTTGTTCACGCGTCATCGGGTC
GGATTGCTGTCTGCTGTGAGGGGGGATGTGTTGGAAATCGGAGTCGGTACCGGATTGAAT
TTGAAGCATTATCCTGAGCAGACGACGCGGCTCAATGTAGTGGATTCCAATCCTGGAATG
AACGTGCTTCTGCGTCGCCGCATGAAGGGTATTCCATTTCCTGTGCAGCATGCCACAATC

The command:

~/programs/prokka111/bin/prokka --outdir output --cpus 4 --locustag prokka --compliant --usegenus --metagenome --addgene --quiet --force --centre XXX contig.fasta

And it complains with the error:

[02:13:14] Please rename your contigs or use --centre XXX to generate clean contig names.

Why should this bother Prokka? It's just header text; if Prokka doesn't like it, it could simply ignore it, or rename the contig itself in memory. This is the kind of thing that makes a tool user-unfriendly. I don't really feel like renaming my hundreds of thousands of contigs...

If I add, as suggested, --centre XXX to the command, I still get the same error.
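Renaming a large number of contigs does not have to be done by hand. Below is a minimal sketch, assuming a POSIX shell and awk. The function name `rename_contigs`, the `ctg` prefix and the map-file name are hypothetical choices, not anything Prokka provides. The function replaces every header with a short sequential ID and records the original header, so you can map names back after annotation.

```shell
# rename_contigs: hypothetical helper, not part of Prokka.
# Reads FASTA on stdin and writes FASTA with short sequential IDs
# (ctg000001, ctg000002, ...) to stdout. It also writes one
# "new_id<TAB>original_header" line per contig to the map file given as $1.
rename_contigs() {
    map_file=$1
    awk -v map="$map_file" '
        /^>/ {
            n++
            new_id = sprintf("ctg%06d", n)          # short, fixed-width ID
            print new_id "\t" substr($0, 2) > map   # remember the original header
            print ">" new_id
            next
        }
        { print }                                   # sequence lines pass through unchanged
    '
}

# Usage (file names are placeholders):
# rename_contigs contig_names.tsv < contig.fasta > contig.renamed.fasta
```

Because the map file keeps the full original header, information such as length= and numreads= is not lost, even though Prokka never sees it.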

Aug 12 '15 00:08 xapple

I believe the limit comes from how Prokka tries to use the contig names in the GenBank output (see #32).

Confusingly, the names have nothing to do with my input FASTA file (so the error message is misleading). The names seem to be auto-generated by Prokka itself, e.g. gnl|institute|locus_contig000001, where I can set the centre using --centre institute and the locus tag using --locustag locus, but I seem to have no control over the long contig000001 part.

Workaround: --centre C --locustag L gives names like gnl|C|L_contig000001 which are short enough (testing with Prokka 1.11).
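Simple arithmetic shows why the one-letter workaround helps. Below is a minimal sketch. `contig_name` is a hypothetical helper that reproduces only the gnl|CENTRE|LOCUSTAG_contigNNNNNN pattern described above. It is an assumption about Prokka 1.11's naming, not code taken from Prokka, and the exact length limit Prokka enforces is not stated in this thread.

```shell
# contig_name: hypothetical helper that mimics the auto-generated ID pattern
# reported above (gnl|CENTRE|LOCUSTAG_contigNNNNNN). It does not call Prokka.
contig_name() {
    printf 'gnl|%s|%s_contig%06d' "$1" "$2" "$3"
}

# Compare the two naming choices; ${#name} is the length in characters.
name=$(contig_name institute locus 1); echo "$name ${#name}"   # gnl|institute|locus_contig000001 32
name=$(contig_name C L 1);             echo "$name ${#name}"   # gnl|C|L_contig000001 20
```

The fixed contig000001 suffix always costs 13 characters, so the centre and locus tag are the only parts the user can shrink.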

Shortening contig000001 to c00001 might help, but I think the real fix is to adjust what Prokka uses as the contig identifiers in the GenBank file - why not just use contig000001 in the GenBank LOCUS line?

Sep 22 '15 11:09 peterjc

I agree that using only contig000001, or the even shorter c000001, would work around this issue. Actually, the whole problem lies with NCBI's tbl2asn, which is very strict about its GenBank IDs (as discussed at length in #76 and other issues).

@tseemann might also replace tbl2asn in the future, see issue #113. I guess it depends on whether users still want SQN files for NCBI submissions.

Sep 22 '15 11:09 aleimba

Yes, in a sense this (#135) and #76 are duplicates - this bug report has the error in the bug title so was easier to find.

For now I have reverted to Prokka 1.10.

Sep 22 '15 13:09 peterjc

This may be a closed thread, but I had the same problem with contigs assembled by Velvet, which produces long contig names by default (for example: >NODE_1_length_35596_cov_60.583466).

I used sed to strip everything after "_length" from each header: sed -re 's/(_length)[^=]*$/\1/' ${n}.fasta

where ${n} is, of course, your filename.

That resolved the issue with contig names and prokka ran.
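The sed call above prints its result to the terminal and keeps the "_length" suffix, so the header becomes >NODE_1_length. The sketch below is a slightly tighter variant that trims the header to >NODE_1. It assumes Velvet's default NODE_<n>_length_<len>_cov_<cov> header layout shown above. `-E` is the portable spelling of GNU sed's `-r`. The function name and file names are hypothetical placeholders.

```shell
# shorten_velvet_headers: hypothetical helper.
# Turns ">NODE_1_length_35596_cov_60.583466" into ">NODE_1".
# Sequence lines and headers that do not match the pattern pass through unchanged.
# Reads the files given as arguments, or stdin if none are given.
shorten_velvet_headers() {
    sed -E 's/^>(NODE_[0-9]+)_length_.*/>\1/' "$@"
}

# Usage: write the result to a new file instead of the terminal.
# shorten_velvet_headers contigs.fa > contigs.short.fa
```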

Feb 17 '16 02:02 haslamdb

@haslamdb I ran sed -re 's/(_length)[^=]*$/\1/' $ ~/velvet/454_roche_13 and the command worked. However, I can't find the output file with the converted names. How can I find the converted file?

Oct 10 '19 03:10 nhungdoan1905

@nhungdoan1905 this issue is from 2.5 years ago. Prokka 1.14 should be better?

Oct 10 '19 04:10 tseemann

> @nhungdoan1905 this issue is from 2.5 years ago. Prokka 1.14 should be better?

No, I still get the same problem... sigh

Jan 17 '20 06:01 lynxieMummy

Hi, I ran Prokka 1.14 on several contig files (all labelled in a similar manner). For some of them I got the full set of output files (.fna, .log, .ffn, ...), but for the others I only got a .fna and a .log file. The log file shows the warning: Please rename your contigs OR try '--centre X --compliant' to generate clean contig names.

Can anyone point out the mistake? Any help would be appreciated. Thank you.
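You can find the files Prokka stops on by listing the contig IDs that exceed a length limit before running it. Below is a minimal sketch under two assumptions: Prokka takes the first word after ">" as the contig ID, and you know the limit your Prokka version enforces. Neither is confirmed in this thread. The function name `long_contig_ids` is hypothetical.

```shell
# long_contig_ids: hypothetical helper.
# Usage: long_contig_ids MAXLEN file.fasta [file.fasta ...]
# Prints "file<TAB>id<TAB>length" for every header whose ID (the first word
# after ">", assumed to be what Prokka uses) is longer than MAXLEN characters.
long_contig_ids() {
    max=$1; shift
    awk -v max="$max" '
        /^>/ {
            id = substr($1, 2)                      # first word, without the ">"
            if (length(id) > max)
                print FILENAME "\t" id "\t" length(id)
        }
    ' "$@"
}

# Example (37 is only an illustrative limit, not taken from this thread):
# long_contig_ids 37 *.fasta
```

Files that produce no output have no IDs above the chosen limit. Any file that does produce output is a candidate for renaming with one of the approaches above.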

Jul 14 '20 08:07 dhrpa

@nhungdoan1905 If you haven't solved this yet: the output was only printed to your screen. You have to redirect it to a file, like this: sed -re 's/(_length)[^=]*$/\1/' ~/velvet/454_roche_13 > my_output.fasta
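If you would rather not manage a second file, sed can also rewrite files in place. Below is a minimal sketch, assuming GNU or BSD/macOS sed. Both accept `-E` and a backup suffix attached directly to `-i`, as in `-i.bak`. The function name `strip_after_length` is hypothetical. The regex is haslamdb's from earlier in this thread.

```shell
# strip_after_length: hypothetical helper.
# Applies haslamdb's substitution to each file in place. The untouched
# original is kept next to it as <file>.bak.
strip_after_length() {
    for f in "$@"; do
        sed -E -i.bak 's/(_length)[^=]*$/\1/' "$f"
    done
}

# Usage (placeholder glob):
# strip_after_length ~/velvet/*.fasta
```

Keeping the .bak copy means the original Velvet headers, with their length and coverage values, can still be recovered if needed.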

Jul 27 '20 18:07 Sebawe
