
inference using ISFinder fails NCBI submission

Open · mjcoynejr opened this issue · 1 comment

I am using prokka (an amazing program, BTW) to annotate bacterial genomes for submission to NCBI. Today I uploaded the .sqn files for 23 genomes, and they all failed. According to the error messages NCBI returned, each genome failed because of many instances of "[SEQ_FEAT.InvalidInferenceValue] Inference qualifier problem - unrecognized database".

The errors come from prokka's use of the ISfinder database and the inference qualifiers it produces, e.g. "inference similar to AA sequence:ISfinder:ISBth167". The problem is that ISfinder is not a recognized database for the inference qualifier. NCBI's explanation of this error says:

"The value of the inference qualifier is constrained by agreement of the international nucleotide sequence database collaboration. This value does not conform to those constraints. Please see the feature table documentation for more information."

I am running prokka in --compliant mode. Because I can't find a generic inference format that won't trigger this error on submission, for now I'm going to convert the offending inference lines into notes with a regular expression applied to the five-column feature tables (.tbl), then re-run them through tbl2asn. More info at https://www.ncbi.nlm.nih.gov/genbank/evidence/.

Any ideas?

mjcoynejr, Jan 19 '21 05:01

A quick update: turning the ISfinder inference lines into notes (e.g. `$tbl_file =~ s/\t\t\tinference\tsimilar to AA sequence:ISfinder:(.+)\n/\t\t\tnote\tsimilar to AA sequence $1 from ISfinder (LMGM, France)\n/g;`) seems to have worked. My latest upload is clear of these validation errors.
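For anyone who prefers Python, the Perl substitution above can be sketched as follows. This is a minimal sketch, assuming the prokka .tbl uses the standard five-column layout (qualifier lines start with three tabs, then the qualifier name, a tab, and the value); the helper name `isfinder_inference_to_note` is hypothetical, and the note wording is copied from the Perl one-liner. The pattern is anchored per line with `re.MULTILINE`, so it also catches a final line that lacks a trailing newline.

```python
import re

# Assumption: five-column feature table, where qualifier lines look like
# "\t\t\t<qualifier>\t<value>". Only ISfinder "similar to AA sequence"
# inference lines are matched; group 1 captures the ISfinder element name.
ISFINDER_INFERENCE = re.compile(
    r"^\t\t\tinference\tsimilar to AA sequence:ISfinder:(.+)$", re.MULTILINE
)


def isfinder_inference_to_note(tbl_text: str) -> str:
    """Rewrite ISfinder inference qualifiers as note qualifiers (hypothetical helper)."""
    return ISFINDER_INFERENCE.sub(
        r"\t\t\tnote\tsimilar to AA sequence \1 from ISfinder (LMGM, France)",
        tbl_text,
    )


if __name__ == "__main__":
    # Made-up sample table, for illustration only.
    sample = (
        ">Feature contig1\n"
        "1\t1200\tCDS\n"
        "\t\t\tinference\tsimilar to AA sequence:ISfinder:ISBth167\n"
        "\t\t\tproduct\ttransposase\n"
    )
    print(isfinder_inference_to_note(sample))
```

The rewritten .tbl files would then go back through tbl2asn, as described above.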

Unfortunately, there were so many of them that I missed the much rarer InvalidInferenceValue errors raised by inference lines from the antimicrobial resistance (AMR) database. These trigger the same error because BARRGD is not a recognized database either.
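Because the rare BARRGD errors were buried among the many ISfinder ones, it can help to tally which databases appear in the inference qualifiers before uploading. This is a minimal sketch, assuming the same five-column layout and the "similar to ...:DATABASE:ACCESSION" value shape shown in this thread; the helper names are hypothetical, and deciding which of the listed databases NCBI accepts is left to the evidence page linked above.

```python
import re
from collections import Counter

# Assumption: "similar to" inference values have the shape
# "similar to <type>:<DATABASE>:<ACCESSION>"; group 1 captures DATABASE.
# Other inference types (e.g. ab initio predictions) are not matched.
INFERENCE_DB = re.compile(
    r"^\t\t\tinference\tsimilar to [^:]*:([^:\s]+):", re.MULTILINE
)


def count_inference_databases(tbl_text: str) -> Counter:
    """Count each database named in 'similar to' inference qualifiers."""
    return Counter(m.group(1) for m in INFERENCE_DB.finditer(tbl_text))


def rarest_first(counts: Counter) -> list:
    """Sort (database, count) pairs so rare databases are listed first."""
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))


if __name__ == "__main__":
    # Made-up sample table, for illustration only.
    sample = (
        "\t\t\tinference\tsimilar to AA sequence:ISfinder:ISBth167\n"
        "\t\t\tinference\tsimilar to AA sequence:ISfinder:ISBth167\n"
        "\t\t\tinference\tsimilar to AA sequence:BARRGD:NG_048270.1\n"
    )
    for database, n in rarest_first(count_inference_databases(sample)):
        print(f"{database}\t{n}")
```

Listing the rarest databases first is a deliberate choice: the one-off offenders are exactly the ones that are easy to miss in a long validation report.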

Fortunately, these inference lines refer to a RefSeq accession (e.g. inference similar to AA sequence:BARRGD:NG_048270.1), so I think the fix is straightforward: change BARRGD to RefSeq.
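The BARRGD-to-RefSeq change could be applied to the .tbl files with a similar substitution. This is a minimal sketch, assuming, as in the NG_048270.1 example, that the accession is a RefSeq one; a lookahead restricts the rewrite to accessions with the RefSeq shape (two capital letters, an underscore, then digits), so anything else is left untouched for manual review. `barrgd_to_refseq` is a hypothetical helper name.

```python
import re

# Assumption: five-column feature table. Group 1 keeps everything up to the
# database field; the lookahead only accepts RefSeq-shaped accessions
# (e.g. "NG_048270.1") without consuming them.
BARRGD_INFERENCE = re.compile(
    r"^(\t\t\tinference\tsimilar to [^:]*):BARRGD:(?=[A-Z]{2}_\d)", re.MULTILINE
)


def barrgd_to_refseq(tbl_text: str) -> str:
    """Replace the BARRGD database name with RefSeq (hypothetical helper)."""
    return BARRGD_INFERENCE.sub(r"\1:RefSeq:", tbl_text)


if __name__ == "__main__":
    line = "\t\t\tinference\tsimilar to AA sequence:BARRGD:NG_048270.1\n"
    print(barrgd_to_refseq(line), end="")
```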

mjcoynejr, Jan 19 '21 10:01