Nucleotide information for all point mutations in the database

Open kwonsoobin opened this issue 2 years ago • 5 comments

Thanks for the thorough Wiki page on database files.

For the point mutations listed in the Reference Gene Catalog, is there a way to get their position and derived allele at the nucleotide level? AMR_DNA-*.tab files do not seem to include all point mutations listed in ReferenceGeneCatalog.txt

kwonsoobin commented on May 05 '22 17:05

Hi,

You're right, the AMR_DNA-*.tab files don't contain all the point mutations, only those found on DNA sequences (e.g., 16S or promoter mutations). The other mutations are screened using protein sequence alignment (blastp or blastx). Amino-acid mutations are listed in AMRProt-mutation.tab. See https://github.com/ncbi/amr/wiki/AMRFinderPlus-database#amrprot-mutationtab for details on the format of that file.

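Unfortunately, the coordinates in that file are amino-acid positions, which we use to determine position in the alignments, but multiplying by 3 should get you to the nucleotide position in the reference nucleotide sequence. Assuming 1-based positions and a coding sequence that starts at its first base, amino-acid position p corresponds to the codon spanning nucleotides 3p-2 through 3p, so 3p is the last base of that codon.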
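As a rough illustration of the amino-acid-to-nucleotide conversion, here is a minimal Python sketch. The function name `codon_nt_range` is hypothetical and not part of AMRFinderPlus. It assumes the amino-acid position is 1-based and that the reference nucleotide sequence begins at the first base of the start codon, with no upstream flanking sequence.

```python
def codon_nt_range(aa_pos: int) -> tuple[int, int]:
    """Return the 1-based, inclusive nucleotide range of the codon
    that encodes the 1-based amino-acid position `aa_pos`.

    Assumption: the nucleotide sequence starts at the first base of
    the coding sequence.
    """
    if aa_pos < 1:
        raise ValueError("amino-acid position must be 1 or greater")
    last_base = 3 * aa_pos          # "multiply by 3" gives the codon's last base
    return last_base - 2, last_base


# Example: amino-acid position 83 maps to nucleotides 247 through 249.
print(codon_nt_range(83))  # (247, 249)
```

This only locates the codon. It does not say which base within the codon changed.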

Thanks for your interest and please let us know if there is something missing, wrong, or that could be improved in the documentation. It's hard to keep everything up-to-date, and sometimes we miss things.

evolarjun commented on May 05 '22 17:05

Thanks for your quick response! It would be helpful to have the nucleotide-level information (e.g. coordinates, reference allele, derived allele) on all point mutations, but I understand it's not high priority.

kwonsoobin commented on May 05 '22 19:05

> It would be helpful to have the nucleotide-level information (e.g. coordinates, reference allele, derived allele) on all point mutations

This may not be possible for amino-acid point mutations, because there may be unknown synonymous nucleotide point mutations in addition to the known non-synonymous mutation that changes the amino acid.
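To make the ambiguity concrete, the sketch below lists every codon that encodes a given amino acid under the standard genetic code. It is only an illustration and is not taken from AMRFinderPlus. The names `CODON_TABLE` and `codons_for` are hypothetical. When the derived amino acid has several possible codons, the amino-acid change alone does not determine a single derived nucleotide allele.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# (bases in the order T, C, A, G; '*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid: str) -> list[str]:
    """Return all codons that translate to the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


# A change to leucine (L) could be encoded by any of six codons,
# whereas tryptophan (W) has only one.
print(codons_for("L"))
print(codons_for("W"))
```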

vbrover commented on May 05 '22 22:05

@kwonsoobin I just happened to read this again, and I thought I'd mention that if you're interested in the nucleotide sequences behind the point mutations, you can use MicroBIGG-E to download the nucleotide sequences derived from running AMRFinderPlus on about a million of the isolate genomes in the NCBI Pathogen Detection pipeline. More information on MicroBIGG-E and how to download the sequences is available in the documentation.

evolarjun commented on Sep 26 '22 15:09

Thanks for the suggestion, @evolarjun. Will give that a try.

kwonsoobin commented on Oct 12 '22 22:10