CodonOptimize replaces a codon with non-synonymous codon
Optimizing a gene coding sequence with a custom codon table using the DnaChisel version 3.2.11 gives unexpected output.
This is the snippet from the Python code used:
problem = DnaOptimizationProblem(
sequence=str(sequence.seq).upper(),
objectives=[CodonOptimize(
method="use_best_codon",
codon_usage_table=codon_usage_dict
)]
)
problem.resolve_constraints()
problem.optimize_with_report(target=f"report_{index}.zip")
This is the dict of a custom codon table:
{'M': {'ATG': 1.0}, 'I': {'ATT': 0.566, 'ATC': 0.1841, 'ATA': 0.2499}, 'A': {'GCA': 0.3201, 'GCT': 0.4355, 'GCC': 0.1597, 'GCG': 0.0848}, 'S': {'TCA': 0.3316, 'TCT': 0.2708, 'AGT': 0.1814, 'AGC': 0.1026, 'TCC': 0.0569, 'TCG': 0.0566}, 'Q': {'CAG': 0.2336, 'CAA': 0.7664}, 'D': {'GAT': 0.6968, 'GAC': 0.3032}, 'V': {'GTC': 0.1422, 'GTT': 0.452, 'GTA': 0.249, 'GTG': 0.1569}, 'C': {'TGC': 0.3024, 'TGT': 0.6976}, 'Y': {'TAT': 0.6334, 'TAC': 0.3666}, 'T': {'ACT': 0.4045, 'ACA': 0.3398, 'ACC': 0.1949, 'ACG': 0.0607}, 'L': {'TTG': 0.1893, 'TTA': 0.3168, 'CTC': 0.0684, 'CTT': 0.2769, 'CTG': 0.0578, 'CTA': 0.0908}, 'N': {'AAT': 0.7284, 'AAC': 0.2716}, 'F': {'TTT': 0.8298, 'TTC': 0.1702}, 'E': {'GAA': 0.7289, 'GAG': 0.2711}, 'K': {'AAA': 0.7403, 'AAG': 0.2597}, 'G': {'GGT': 0.3882, 'GGC': 0.2088, 'GGG': 0.059, 'GGA': 0.344}, 'P': {'CCA': 0.4706, 'CCT': 0.3382, 'CCG': 0.0914, 'CCC': 0.0997}, 'R': {'CGC': 0.0747, 'CGA': 0.2287, 'CGT': 0.1656, 'AGA': 0.4013, 'CGG': 0.038, 'AGG': 0.0916}, 'H': {'CAT': 0.7248, 'CAC': 0.2752}, 'W': {'TGG': 1.0}, '*': {'TAA': 0.5642, 'TAG': 0.1453, 'TGA': 0.2906}}
This is the input sequence:
ATGAACAATACGATTGAAACCATTCTTGCTCATCGCTCTATCCGAAAATTCACCGCAGTTCCTATTACTGATGAACAAAGACAAACCATCATTCAAGCAGGTTTAGCTGCGTCTTCTTCTAGTATGCTTCAAGTCGTCTCAATCGTTCGAGTGACTGACTCTGAAAAGCGTAACGAATTGGCTCAATTTGCTGGTAACCAAGCTTATGTTGAAAGTGCGGCTGAGTTCTTAGTGTTTTGTATTGATTATCAGCGCCATGCAACCATCAATCCTGATGTACAGGCAGACTTTACAGAACTAACTCTGATTGGAGCAGTAGATTCTGGAATCATGGCACAAAACTGCTTGCTTGCAGCCGAGTCTATGGGATTAGGTGGCGTATATATTGGAGGACTAAGGAATAGCGCAGCTCAAGTTGATGAGCTATTGGGCTTACCGGAAAATAGCGCGGTGTTGTTTGGTATGTGCTTAGGGCATCCCGATCAAAATCCCGAAGTAAAGCCACGCCTACCTGCACATGTGGTTGTTCATGAAAATCAATACCAAGAGCTAAATTTAGATGATATTCAGAGCTACGATCAAACTATGCAAGCGTATTATGCGAGCCGTACAAGCAATCAAAAACTGAGTACATGGTCGCAAGAAGTCACTGGGAAGCTTGCTGGTGAGTCGCGACCTCATATTCTGCCGTACTTGAACAGTAAGGGGCTAGCAAAACGCTAA
This is the output sequence:
ATGAAAAATACTATTGAAACTATTCCAGCTCATCCATCAATGCCAAAATTAACTGCTGTTCCAATTACTGATGAACAAAGACAAACTATGATTCAAGCTGGTTTAGCTGCTTCATCATCAAGAATGCCACAAGTTGTTTCAATGGTTCCAGTTACTGAATCAGAAAATCCAAAAGAATTTGCTCAATTTGCTGGTAAACAAGCTTATGTTGAAAGAGCTGCTGATTTATTAGTTTTTTGTATTGATTATCATCCACATGCTACTATGAATCCAGATGTTCATGCTGAATTTACTGAACCAACTCCAATTGGTGCTGTTGATTCAGGTATGATGGCTCAAAAATGGTTTCCAGCTGCTGATTCAATGGGTTTAGGTGGTGTTTATATTGGTGGTCCAAGAAATAGAGCTGCTCAAGTTGATGATCCATTTGGTTTACCAGAAAATAGAGCTGTTTTTTTTGGTATGTGGTTAGGTCATCCAGATCAAAATCCAGAAGTTAATCCACCACCACCAGCTCATGTTGTTGTTCATGAAAATCAATAACAAGATCCAAATTTAGATGATATTCATAGATAAGATCAAACTATGCAAGCTTATTATGCTAGACCAACTAGAAATCAAAAACCAAGAACTTGGTCACAAGAAGTTACTGGTAATCCAGCTGGTGATTCACCACCACATATTCCACCATAATTTAAAAGAAATGGTCCAGCTAAACCATAA
By observing codons in the output sequence, e.g. the first codon that comes after the start codon, it is replaced by another amino acid coding codon AAC -> AAA instead of it's synonym. From the given codon table AAC should have been replaced by AAT according to this: {..., 'N': {'AAT': 0.7284, 'AAC': 0.2716}, ...}.
Either this is a bug or I am missing something.
This is not a bug but this is an important caveat in DnaChisel: the CodonOptimize() objective should always be used with a EnforceTranslation constraint. In your example you should add constraints=[EnforceTranslation()] to your DnaOptimizationProblem. This is because objectives never constrain the sequence, only constraints do. This is shown in the Readme and the examples, and there is a warning in the docs (Warning: always use with an EnforceTranslation constraint) but I would support adding more warnings everywhere.
Thanks for your reply. I was mainly using this link for guidance and the warning is definitely missing here: https://edinburgh-genome-foundry.github.io/DnaChisel/ref/builtin_specifications.html. I think it would be good to add that warning for every appropriate specification description.
I added an improved warning in a more prominent location (in the introductory paragraph), for each relevant class. The Genbank API docs already had a prominent warning. I also added a note to the main readme and to EnforceTranslation.
I'm currently reviewing our package documentation generation practices, once that's done I'll update the html docs.