diamond blastx support for IUPAC ambiguity code

Open wiebepo opened this issue 2 years ago • 1 comment

It appears that when diamond blastx encounters a codon that contains a nucleotide IUPAC ambiguity code (any of "RYWSMKHBVDN"), the resulting amino acid is "X". In some cases, however, such a codon can still be translated, because every nucleotide the ambiguity code stands for yields the same amino acid. For example, "AAR" covers "AAA" and "AAG", which both encode lysine, so it may be preferable for "AAR" to be translated to "K" instead of "X". This could improve the accuracy of the blastx alignment.
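To make the request concrete, here is a minimal Python sketch of the translation I have in mind. It is not DIAMOND's code; the function names `translate_codon` and `translate` are made up for illustration. It assumes the standard genetic code and expands each ambiguity code into the nucleotides it stands for, then emits a residue only if all expansions agree (or form one of the amino acid ambiguity codes B, Z, J).

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Nucleotides represented by each IUPAC code.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "M": "AC", "K": "GT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}

# Amino acid ambiguity codes: B = D or N, Z = E or Q, J = I or L.
AMBIGUOUS_AA = {frozenset("DN"): "B", frozenset("EQ"): "Z", frozenset("IL"): "J"}


def translate_codon(codon):
    """Translate one codon, returning "X" only when it cannot be resolved."""
    try:
        expansions = product(*(IUPAC[base] for base in codon.upper()))
    except KeyError:
        return "X"  # character that is not an IUPAC nucleotide code
    residues = frozenset(CODON_TABLE["".join(c)] for c in expansions)
    if len(residues) == 1:
        return next(iter(residues))
    return AMBIGUOUS_AA.get(residues, "X")


def translate(seq):
    """Translate a nucleotide sequence in frame 0, ignoring a trailing partial codon."""
    return "".join(translate_codon(seq[i:i + 3]) for i in range(0, len(seq) - 2, 3))


print(translate("AARGAYTGNRAC"))
```

Here "AAR" and "GAY" resolve to "K" and "D", "TGN" stays "X" because it covers a stop, cysteine and tryptophan, and "RAC" becomes "B". Whether B, Z and J are useful output depends on how the scoring matrix treats them, which I have not checked.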

I noted that the genetic code can be selected with the argument "--query-gencode"; however, I did not find an option that accommodates IUPAC ambiguity codes.

Here is a codon-to-amino-acid table for the standard genetic code. It lists the codons that contain one nucleotide IUPAC ambiguity code and still translate to a single amino acid, a stop ("*"), or an amino acid IUPAC ambiguity code (B, Z or J):

AAR K
RAC B
ACR T
RAT B
AGR R
CAR Q
CCR P
CTR L
CGR R
TRA *
TAR *
TCR S
TTR L
GAR E
GCR A
GTR V
GGR G
AAY N
ACY T
ATY I
AGY S
CAY H
CCY P
CTY L
CGY R
YTA L
TAY Y
TCY S
TTY F
YTG L
TGY C
GAY D
GCY A
GTY V
GGY G
ACW T
ATW I
CCW P
CTW L
CGW R
WTA J
TCW S
GCW A
GTW V
GGW G
SAA Z
ACS T
SAG Z
CCS P
CTS L
CGS R
TCS S
GCS A
GTS V
GGS G
ACM T
ATM I
CCM P
CTM L
CGM R
MTA J
MTC J
TCM S
MTT J
MGA R
GCM A
GTM V
MGG R
GGM G
ACK T
CCK P
CTK L
CGK R
TCK S
GCK A
GTK V
GGK G
ACH T
ATH I
CCH P
CTH L
CGH R
HTA J
TCH S
GCH A
GTH V
GGH G
ACB T
CCB P
CTB L
CGB R
TCB S
GCB A
GTB V
GGB G
ACV T
CCV P
CTV L
CGV R
TCV S
GCV A
GTV V
GGV G
ACD T
CCD P
CTD L
CGD R
TCD S
GCD A
GTD V
GGD G
ACN T
CCN P
CTN L
CGN R
TCN S
GCN A
GTN V
GGN G
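For anyone who wants to check or regenerate the list above, here is a rough Python sketch of one way to derive it. It assumes the standard genetic code, places a single ambiguity code at each codon position in turn, and keeps the codon only if all expansions agree on one residue or on B, Z or J. The names `resolve` and `single_ambiguity_table` are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# The eleven nucleotide ambiguity codes and the bases they stand for.
AMBIGUITY = {
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "M": "AC", "K": "GT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}

# Amino acid ambiguity codes: B = D or N, Z = E or Q, J = I or L.
AMBIGUOUS_AA = {frozenset("DN"): "B", frozenset("EQ"): "Z", frozenset("IL"): "J"}


def resolve(codon):
    """Return the residue for a codon, or None if its expansions disagree."""
    options = [AMBIGUITY.get(base, base) for base in codon]
    residues = frozenset(CODON_TABLE["".join(c)] for c in product(*options))
    if len(residues) == 1:
        return next(iter(residues))
    return AMBIGUOUS_AA.get(residues)


def single_ambiguity_table():
    """Map every resolvable codon with exactly one ambiguity code to its residue."""
    table = {}
    for concrete in CODON_TABLE:
        for pos, code in product(range(3), AMBIGUITY):
            codon = concrete[:pos] + code + concrete[pos + 1:]
            residue = resolve(codon)
            if residue is not None:
                table[codon] = residue
    return table


table = single_ambiguity_table()
for codon in ("AAR", "TRA", "MTC", "SAG"):
    print(codon, table[codon])
```

Codons with two or more ambiguity codes (for example "YTR") are outside the scope of this sketch and of the list above, although some of them could be resolved the same way.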

Here is the reverse mapping, from amino acid to codons:

K ['AAR']
B ['RAC', 'RAT']
T ['ACR', 'ACY', 'ACW', 'ACS', 'ACM', 'ACK', 'ACH', 'ACB', 'ACV', 'ACD', 'ACN']
R ['AGR', 'CGR', 'CGY', 'CGW', 'CGS', 'CGM', 'MGA', 'MGG', 'CGK', 'CGH', 'CGB', 'CGV', 'CGD', 'CGN']
Q ['CAR']
P ['CCR', 'CCY', 'CCW', 'CCS', 'CCM', 'CCK', 'CCH', 'CCB', 'CCV', 'CCD', 'CCN']
L ['CTR', 'TTR', 'CTY', 'YTA', 'YTG', 'CTW', 'CTS', 'CTM', 'CTK', 'CTH', 'CTB', 'CTV', 'CTD', 'CTN']
* ['TRA', 'TAR']
S ['TCR', 'AGY', 'TCY', 'TCW', 'TCS', 'TCM', 'TCK', 'TCH', 'TCB', 'TCV', 'TCD', 'TCN']
E ['GAR']
A ['GCR', 'GCY', 'GCW', 'GCS', 'GCM', 'GCK', 'GCH', 'GCB', 'GCV', 'GCD', 'GCN']
V ['GTR', 'GTY', 'GTW', 'GTS', 'GTM', 'GTK', 'GTH', 'GTB', 'GTV', 'GTD', 'GTN']
G ['GGR', 'GGY', 'GGW', 'GGS', 'GGM', 'GGK', 'GGH', 'GGB', 'GGV', 'GGD', 'GGN']
N ['AAY']
I ['ATY', 'ATW', 'ATM', 'ATH']
H ['CAY']
Y ['TAY']
F ['TTY']
C ['TGY']
D ['GAY']
J ['WTA', 'MTA', 'MTC', 'MTT', 'HTA']
Z ['SAA', 'SAG']
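The reverse mapping is just the first list grouped by residue. A small Python sketch, assuming the input is plain "CODON AA" lines as above (the function name `group_by_amino_acid` is made up):

```python
def group_by_amino_acid(table_text):
    """Parse "CODON AA" lines and group the codons by amino acid."""
    aa_to_codons = {}
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        codon, aa = fields
        aa_to_codons.setdefault(aa, []).append(codon)
    return aa_to_codons


# A short excerpt of the codon list from this issue.
excerpt = """\
AAR K
RAC B
ACR T
RAT B
TRA *
TAR *
"""

for aa, codons in group_by_amino_acid(excerpt).items():
    print(aa, codons)
```

Codons stay in the order in which they appear in the input, which is why "RAC" is listed before "RAT" under "B".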

wiebepo, Nov 16 '23 18:11

Makes sense, I'll look into implementing it.

bbuchfink, Nov 21 '23 15:11