diamond blastx support for IUPAC ambiguity codes
It appears that when diamond blastx encounters a codon that contains a nucleotide IUPAC ambiguity code (any of "RYWSMKHBVDN"), the resulting amino acid is "X". However, some codons that contain an ambiguity code can still be translated. For example, "AAR" stands for "AAA" or "AAG", both of which encode lysine, so it may be preferable to translate it to "K" instead of "X". This could improve the accuracy of the blastx alignment.
I noted that the options for the genetic code can be controlled with the argument "--query-gencode"; however, I did not find an option that accommodates IUPAC ambiguity codes.
Here is a codon-to-amino-acid table for the standard genetic code, covering all codons that contain a single nucleotide IUPAC ambiguity code and that encode either one amino acid (or a stop, "*") or an amino acid IUPAC ambiguity code (B = D or N, Z = E or Q, J = I or L):
AAR K
RAC B
ACR T
RAT B
AGR R
CAR Q
CCR P
CTR L
CGR R
TRA *
TAR *
TCR S
TTR L
GAR E
GCR A
GTR V
GGR G
AAY N
ACY T
ATY I
AGY S
CAY H
CCY P
CTY L
CGY R
YTA L
TAY Y
TCY S
TTY F
YTG L
TGY C
GAY D
GCY A
GTY V
GGY G
ACW T
ATW I
CCW P
CTW L
CGW R
WTA J
TCW S
GCW A
GTW V
GGW G
SAA Z
ACS T
SAG Z
CCS P
CTS L
CGS R
TCS S
GCS A
GTS V
GGS G
ACM T
ATM I
CCM P
CTM L
CGM R
MTA J
MTC J
TCM S
MTT J
MGA R
GCM A
GTM V
MGG R
GGM G
ACK T
CCK P
CTK L
CGK R
TCK S
GCK A
GTK V
GGK G
ACH T
ATH I
CCH P
CTH L
CGH R
HTA J
TCH S
GCH A
GTH V
GGH G
ACB T
CCB P
CTB L
CGB R
TCB S
GCB A
GTB V
GGB G
ACV T
CCV P
CTV L
CGV R
TCV S
GCV A
GTV V
GGV G
ACD T
CCD P
CTD L
CGD R
TCD S
GCD A
GTD V
GGD G
ACN T
CCN P
CTN L
CGN R
TCN S
GCN A
GTN V
GGN G
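For reference, a table like the one above can be derived by expanding each ambiguous codon into the concrete codons it stands for and checking whether they all agree. The following is only a minimal Python sketch of that idea, not DIAMOND code; the names `IUPAC`, `STANDARD`, `AA_AMBIGUITY`, and `translate_codon` are hypothetical and chosen for illustration, and it assumes the standard genetic code (NCBI translation table 1).

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "M": "AC", "K": "GT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Amino acid ambiguity codes: B = D or N, Z = E or Q, J = I or L.
AA_AMBIGUITY = {
    frozenset("DN"): "B",
    frozenset("EQ"): "Z",
    frozenset("IL"): "J",
}


def translate_codon(codon):
    """Translate one codon, resolving IUPAC ambiguity where possible."""
    codon = codon.upper()
    if len(codon) != 3 or any(base not in IUPAC for base in codon):
        return "X"
    # Expand the codon into every concrete codon it could represent
    # and collect the distinct translations.
    outcomes = {
        STANDARD["".join(concrete)]
        for concrete in product(*(IUPAC[base] for base in codon))
    }
    if len(outcomes) == 1:
        return next(iter(outcomes))
    # Fall back to "X" unless the set matches an amino acid ambiguity code.
    return AA_AMBIGUITY.get(frozenset(outcomes), "X")


print(translate_codon("AAR"))  # K
print(translate_codon("RAC"))  # B
print(translate_codon("TGR"))  # X (TGA is a stop, TGG is W)
```

Because the sketch expands every position, it also handles codons with more than one ambiguity code (for example "MGR", which expands to four arginine codons), which the table above does not list.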
Here is the reverse mapping, from amino acid to codons:
K ['AAR']
B ['RAC', 'RAT']
T ['ACR', 'ACY', 'ACW', 'ACS', 'ACM', 'ACK', 'ACH', 'ACB', 'ACV', 'ACD', 'ACN']
R ['AGR', 'CGR', 'CGY', 'CGW', 'CGS', 'CGM', 'MGA', 'MGG', 'CGK', 'CGH', 'CGB', 'CGV', 'CGD', 'CGN']
Q ['CAR']
P ['CCR', 'CCY', 'CCW', 'CCS', 'CCM', 'CCK', 'CCH', 'CCB', 'CCV', 'CCD', 'CCN']
L ['CTR', 'TTR', 'CTY', 'YTA', 'YTG', 'CTW', 'CTS', 'CTM', 'CTK', 'CTH', 'CTB', 'CTV', 'CTD', 'CTN']
* ['TRA', 'TAR']
S ['TCR', 'AGY', 'TCY', 'TCW', 'TCS', 'TCM', 'TCK', 'TCH', 'TCB', 'TCV', 'TCD', 'TCN']
E ['GAR']
A ['GCR', 'GCY', 'GCW', 'GCS', 'GCM', 'GCK', 'GCH', 'GCB', 'GCV', 'GCD', 'GCN']
V ['GTR', 'GTY', 'GTW', 'GTS', 'GTM', 'GTK', 'GTH', 'GTB', 'GTV', 'GTD', 'GTN']
G ['GGR', 'GGY', 'GGW', 'GGS', 'GGM', 'GGK', 'GGH', 'GGB', 'GGV', 'GGD', 'GGN']
N ['AAY']
I ['ATY', 'ATW', 'ATM', 'ATH']
H ['CAY']
Y ['TAY']
F ['TTY']
C ['TGY']
D ['GAY']
J ['WTA', 'MTA', 'MTC', 'MTT', 'HTA']
Z ['SAA', 'SAG']
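The reverse mapping is just the first table grouped by amino acid. A minimal Python sketch of that grouping, using only a small excerpt of the codon table for brevity (the names `codon_to_aa` and `invert` are hypothetical):

```python
from collections import defaultdict

# A small excerpt of the codon-to-amino-acid table, for illustration only.
codon_to_aa = {
    "AAR": "K", "RAC": "B", "RAT": "B", "TRA": "*", "TAR": "*",
    "WTA": "J", "MTA": "J", "SAA": "Z", "SAG": "Z",
}


def invert(table):
    """Group codons by the amino acid symbol they translate to."""
    grouped = defaultdict(list)
    for codon, aa in table.items():
        grouped[aa].append(codon)
    return dict(grouped)


aa_to_codon = invert(codon_to_aa)
for aa, codons in aa_to_codon.items():
    print(aa, codons)
```

Since dictionaries keep insertion order, each amino acid appears in the order it is first seen in the codon table.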
Makes sense, I'll look into implementing it.