Nucleotide information for all point mutations in the database
Thanks for the thorough Wiki page on database files.
For the point mutations listed in the Reference Gene Catalog, is there a way to get their position and derived allele at the nucleotide level? The AMR_DNA-*.tab files do not seem to include all point mutations listed in ReferenceGeneCatalog.txt.
Hi,
You're right: the AMR_DNA-*.tab files don't contain all the point mutations, only those found on DNA sequences (e.g., 16S or promoter mutations). The other mutations are screened using protein sequence alignment (blastp or blastx). Amino-acid mutations are listed in AMRProt-mutation.tab. See https://github.com/ncbi/amr/wiki/AMRFinderPlus-database#amrprot-mutationtab for details on the format of that file.
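Unfortunately, the coordinates in that file are the amino-acid positions we use to locate mutations in the alignments, but multiplying by 3 should get you to the nucleotide position in the reference nucleotide sequence. Strictly, that gives the last base of the codon: amino-acid position p covers nucleotides 3p-2 through 3p, assuming 1-based coordinates and a reference coding sequence that starts at the first codon.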
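As a rough illustration of that arithmetic, here is a minimal Python sketch. The function name `codon_nt_range` is hypothetical and not part of AMRFinderPlus. It assumes 1-based amino-acid positions and a reference nucleotide sequence whose first base is the first base of codon 1, so check both against the actual reference before relying on it.

```python
def codon_nt_range(aa_pos: int) -> tuple[int, int]:
    """Return the 1-based, inclusive (start, end) nucleotide coordinates
    of the codon encoding the amino acid at 1-based position aa_pos.

    Assumption: the reference nucleotide sequence starts at the first
    base of codon 1 (no upstream flank, no frameshift).
    """
    if aa_pos < 1:
        raise ValueError("amino-acid positions are 1-based")
    end = 3 * aa_pos          # "multiply by 3" lands on the codon's last base
    return end - 2, end       # the codon spans three bases ending there


# Example: the codon for amino-acid position 83
print(codon_nt_range(83))  # (247, 249)
```

This only narrows the mutation down to a three-base window; it does not say which base within the codon changed.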
Thanks for your interest and please let us know if there is something missing, wrong, or that could be improved in the documentation. It's hard to keep everything up-to-date, and sometimes we miss things.
Thanks for your quick response! It would be helpful to have the nucleotide-level information (e.g. coordinates, reference allele, derived allele) on all point mutations, but I understand it's not high priority.
It would be helpful to have the nucleotide-level information (e.g. coordinates, reference allele, derived allele) on all point mutations
This may not be possible for amino-acid point mutations, because there may be unknown synonymous nucleotide point mutations in addition to the known non-synonymous amino-acid mutation.
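To make the ambiguity concrete, the sketch below lists every codon pair that could underlie a given amino-acid substitution. It assumes the standard genetic code (the codon-to-amino-acid assignments, ignoring alternative start codons), and the helper names `codons_for` and `possible_codon_changes` are made up for this example. The serine-to-leucine change is only an illustration, not a specific entry from the database.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def codons_for(aa: str) -> list[str]:
    """All codons that encode the one-letter amino acid aa."""
    return sorted(c for c, a in CODON_TABLE.items() if a == aa)


def possible_codon_changes(ref_aa: str, alt_aa: str) -> list[tuple[str, str]]:
    """Every (reference codon, derived codon) pair consistent with ref_aa -> alt_aa."""
    return [(r, a) for r in codons_for(ref_aa) for a in codons_for(alt_aa)]


# Illustration: a serine -> leucine substitution
pairs = possible_codon_changes("S", "L")
print(len(codons_for("S")), len(codons_for("L")), len(pairs))  # 6 6 36
```

Many of these pairs differ at more than one base, so a single amino-acid mutation does not map to a single reference allele and derived allele at the nucleotide level.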
@kwonsoobin I just happened to read this again, and I thought I'd mention that if you're interested in the nucleotide sequences behind the point mutations, you can use MicroBIGG-E to download the nucleotide sequences derived from running AMRFinderPlus on about a million of the isolate genomes in the NCBI Pathogen Detection pipeline. More information on MicroBIGG-E and how to download the sequences is available in the documentation.
Thanks for the suggestion, @evolarjun. Will give that a try.