conga
Only CDR3 Region?
Hello,
I have very much enjoyed using your program. Currently, the TCR clusters are being formed exclusively around differences in TRAV gene usage, with no visible differences in the CDR3 region (it looks very random).
Is there a way to build the TCR clusters using data only from the CDR3 region? Also, can I choose to focus only on motifs for the beta chain and ignore the alpha chain? I am thinking that this could help remove some noise and bring out more subtle differences.
Thanks
Hi there,
Thanks for the question. If you look at
https://github.com/phbradley/conga/blob/master/conga/tcrdist/tcr_distances.py#L236C1-L238C1
which defines the paired TCRdist distance, it says something like
    return ( self.rep_dists[tcr1[0][0]][tcr2[0][0]] + weighted_cdr3_distance(tcr1[0][2], tcr2[0][2]) +
             self.rep_dists[tcr1[1][0]][tcr2[1][0]] + weighted_cdr3_distance(tcr1[1][2], tcr2[1][2]) )
You could try replacing that with
    va_weight = 0
    cdr3a_weight = 0
    vb_weight = 0
    cdr3b_weight = 4 # or whatever
    return ( va_weight * self.rep_dists[tcr1[0][0]][tcr2[0][0]] +
             cdr3a_weight * weighted_cdr3_distance(tcr1[0][2], tcr2[0][2]) +
             vb_weight * self.rep_dists[tcr1[1][0]][tcr2[1][0]] +
             cdr3b_weight * weighted_cdr3_distance(tcr1[1][2], tcr2[1][2]) )
You will also need to disable the C++ tcrdist alternative, for example by moving the tcrdist_cpp/bin folder or by hardcoding this function to always return False:
https://github.com/phbradley/conga/blob/master/conga/util.py#L29
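If it helps to sanity-check the effect of the weights before editing conga itself, here is a rough standalone sketch of the same weighted sum. It is not conga code: `toy_cdr3_distance` is a made-up stand-in for `weighted_cdr3_distance`, the `rep_dists` table and gene names are invented toy values, and the assumption that position 1 of each chain tuple holds the J gene is a guess from the indexing above (only positions 0 and 2 are actually used).

```python
from typing import Callable, Dict, Tuple

# Assumed layout, inferred from the indexing in the snippet above:
# tcr = (alpha_chain, beta_chain); chain[0] is the V gene, chain[2] is the CDR3.
# chain[1] is assumed to be the J gene and is never read here.
Chain = Tuple[str, str, str]
TCR = Tuple[Chain, Chain]


def toy_cdr3_distance(a: str, b: str) -> float:
    """Hypothetical stand-in for weighted_cdr3_distance (NOT the TCRdist metric).

    Counts mismatches over the shared prefix length and adds the length gap.
    """
    mismatches = sum(x != y for x, y in zip(a, b))
    return float(mismatches + abs(len(a) - len(b)))


def weighted_paired_distance(
    tcr1: TCR,
    tcr2: TCR,
    rep_dists: Dict[str, Dict[str, float]],
    cdr3_distance: Callable[[str, str], float] = toy_cdr3_distance,
    va_weight: float = 0.0,
    cdr3a_weight: float = 0.0,
    vb_weight: float = 0.0,
    cdr3b_weight: float = 4.0,
) -> float:
    """Same four-term sum as the suggested replacement, with per-term weights."""
    return (
        va_weight * rep_dists[tcr1[0][0]][tcr2[0][0]]
        + cdr3a_weight * cdr3_distance(tcr1[0][2], tcr2[0][2])
        + vb_weight * rep_dists[tcr1[1][0]][tcr2[1][0]]
        + cdr3b_weight * cdr3_distance(tcr1[1][2], tcr2[1][2])
    )


# Invented V-gene distance table and placeholder gene names, for illustration only.
rep_dists = {
    "TRAV-A": {"TRAV-A": 0.0, "TRAV-B": 10.0},
    "TRAV-B": {"TRAV-A": 10.0, "TRAV-B": 0.0},
    "TRBV-A": {"TRBV-A": 0.0},
}

tcr_a = (("TRAV-A", "TRAJ-A", "CAVRDSNYQLIW"), ("TRBV-A", "TRBJ-A", "CASSLGQAYEQYF"))
# Different alpha chain, identical beta chain.
tcr_b = (("TRAV-B", "TRAJ-A", "CAVSGGSYIPTF"), ("TRBV-A", "TRBJ-A", "CASSLGQAYEQYF"))
# Identical alpha chain, one substitution in the beta CDR3.
tcr_c = (("TRAV-A", "TRAJ-A", "CAVRDSNYQLIW"), ("TRBV-A", "TRBJ-A", "CASSLGTAYEQYF"))

# With only cdr3b_weight nonzero, alpha-chain differences no longer contribute.
print(weighted_paired_distance(tcr_a, tcr_b, rep_dists))  # 0.0
print(weighted_paired_distance(tcr_a, tcr_c, rep_dists))  # 4.0
```

With these weights, two clonotypes that differ only in TRAV or CDR3 alpha end up at distance zero, so the clustering should be driven by the beta CDR3 alone. Setting all four weights to 1 recovers the unweighted sum from the original line.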
Let me know what you find! Take care, Phil