Improve % of perfectly replicated CDR3s when using a NT CDR3

Open JamieHeather opened this issue 2 years ago • 0 comments

Currently, when submitting rearrangements using nucleotide CDR3s, Stitchr will produce a small proportion of TCRs where the stitched nucleotide sequence does not exactly replicate the sequence that appeared in the original TCR (see Fig 3D of the NAR paper). This is because Stitchr determines the edge of the germline-encodable sequence at the codon level, whereas V(D)J recombination will often delete only part of a codon, after which P/NP/D gene nucleotide additions can complete a different but redundant codon encoding the same amino acid (see Fig 1C of the paper). Of course the final stitched amino acid sequence will still be the same, but it would be nice for these nucleotide-provided CDR3s to perfectly match the actual rearranged TCR, without needing to resort to the slower seamless option.

It's just occurred to me that the vast majority of these mismatches could probably be avoided with a simple switch. Currently the NT-CDR3 option copies the AA-CDR3 option: it defines the wholly germline-encodable sequence and then fills in the rest with the non-templated sequence. Instead, when using NT-CDR3 the script could delete one amino acid (i.e. one codon) further back from the edge of what is potentially germline-encodable, and simply add back an extra 3 NT from the provided CDR3 sequence.

It still wouldn't be 100% perfect (as rare TCRs will have coincidentally deleted further back and replaced multiple redundant codons), but it would improve accuracy a lot.
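
For anyone picking this up, here is a rough toy sketch of the idea on the V side of the junction only. It is not Stitchr's actual code: `translate`, `stitch_v_side` and the `back_off` parameter are hypothetical names made up for illustration, and the sketch assumes both the germline V fragment and the provided NT CDR3 start in frame at the conserved Cys codon. The example sequences are invented, not real gene segments.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nt):
    """Translate an in-frame nucleotide string, ignoring any trailing partial codon."""
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, usable, 3))


def stitch_v_side(germline_v_nt, cdr3_nt, back_off=0):
    """Hypothetical helper: join a germline V 3' fragment to a provided NT CDR3.

    The germline edge is found at the codon (amino acid) level. With
    back_off=1 the edge is moved one codon further back, so those 3 NT
    are taken from the provided CDR3 instead of from the germline.
    """
    germ_aa = translate(germline_v_nt)
    cdr3_aa = translate(cdr3_nt)
    # Count the leading codons whose amino acids agree.
    n = 0
    while n < min(len(germ_aa), len(cdr3_aa)) and germ_aa[n] == cdr3_aa[n]:
        n += 1
    keep = max(n - back_off, 0)
    return germline_v_nt[:3 * keep] + cdr3_nt[3 * keep:]


# Invented example: the germline ends ...AGC (Ser), but in the "real" TCR the
# last codon was partially deleted and completed as AGT (also Ser).
germline_v = "TGTGCCAGCAGC"          # C A S S
observed_cdr3 = "TGTGCCAGCAGTGGGGAG"  # C A S S G E

codon_level = stitch_v_side(germline_v, observed_cdr3)
backed_off = stitch_v_side(germline_v, observed_cdr3, back_off=1)
```

In this toy case `codon_level` translates identically to the observed CDR3 but differs at the nucleotide level (AGC vs AGT), whereas `backed_off` reproduces the observed nucleotides exactly. The J side would need the mirror-image treatment.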

Given the sparsity of applications that need this I won't be implementing it in a hurry, but I'm adding this here in case anyone needs it and fancies implementing it.

JamieHeather, Oct 14 '22 03:10