
Only CDR3 Region?

Open: TheRaspberryFox opened this issue 11 months ago • 1 comment

Hello,

I have very much enjoyed using your program. Currently, the TCR clusters are being formed exclusively around differences in TRAV gene usage, with no visible differences in the CDR3 region (the CDR3 sequences look very random).

Is there a way to form the TCR clusters using data only from the CDR3 region? Also, can I choose to focus only on motifs for the beta chain and ignore the alpha chain? I think this could help remove some noise and bring out more subtle differences.

Thanks

TheRaspberryFox, Mar 12 '24 21:03

Hi there,

Thanks for the question. If you look at

https://github.com/phbradley/conga/blob/master/conga/tcrdist/tcr_distances.py#L236C1-L238C1

which defines the paired TCRdist distance, it says something like

        return ( self.rep_dists[tcr1[0][0]][tcr2[0][0]] + weighted_cdr3_distance(tcr1[0][2], tcr2[0][2]) +
                 self.rep_dists[tcr1[1][0]][tcr2[1][0]] + weighted_cdr3_distance(tcr1[1][2], tcr2[1][2]) )

You could try replacing that with

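        # Per-component weights: 0 drops a term, larger values emphasize it.
        va_weight = 0
        cdr3a_weight = 0
        vb_weight = 0
        cdr3b_weight = 4 # or whatever

        # Index [0] is the alpha chain, [1] the beta chain;
        # within a chain, [0] is the V gene and [2] is the CDR3.
        return ( va_weight * self.rep_dists[tcr1[0][0]][tcr2[0][0]] +
                 cdr3a_weight * weighted_cdr3_distance(tcr1[0][2], tcr2[0][2]) +
                 vb_weight * self.rep_dists[tcr1[1][0]][tcr2[1][0]] +
                 cdr3b_weight * weighted_cdr3_distance(tcr1[1][2], tcr2[1][2]) )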
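
To see how the weights change the result before editing conga itself, here is a minimal standalone sketch. Everything in it is a stand-in, not conga code: `toy_cdr3_distance`, `weighted_paired_distance`, the `weights` dict, and the tiny `rep_dists` table are hypothetical names and values made up for illustration. The only thing taken from the snippet above is the index layout (`tcr[0]` alpha, `tcr[1]` beta, `[0]` V gene, `[2]` CDR3); treating index `[1]` as the J gene is an assumption.

```python
def toy_cdr3_distance(a, b):
    """Hypothetical placeholder metric, NOT conga's weighted_cdr3_distance:
    position-wise mismatches over the shared prefix plus the length difference."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches + abs(len(a) - len(b))


def weighted_paired_distance(tcr1, tcr2, rep_dists, weights,
                             cdr3_distance=toy_cdr3_distance):
    """Weighted sum of the four paired-distance terms (V alpha, CDR3 alpha,
    V beta, CDR3 beta). A weight of 0 removes that term entirely."""
    return (weights["va"] * rep_dists[tcr1[0][0]][tcr2[0][0]]
            + weights["cdr3a"] * cdr3_distance(tcr1[0][2], tcr2[0][2])
            + weights["vb"] * rep_dists[tcr1[1][0]][tcr2[1][0]]
            + weights["cdr3b"] * cdr3_distance(tcr1[1][2], tcr2[1][2]))


# Made-up V-gene distance table, for illustration only.
rep_dists = {
    "TRAV1": {"TRAV1": 0, "TRAV2": 10},
    "TRAV2": {"TRAV1": 10, "TRAV2": 0},
    "TRBV1": {"TRBV1": 0},
}

# Each TCR is (alpha_chain, beta_chain); each chain is (V gene, J gene, CDR3).
tcr1 = (("TRAV1", "TRAJ1", "CAVSG"), ("TRBV1", "TRBJ1", "CASSL"))
tcr2 = (("TRAV2", "TRAJ1", "CAVTG"), ("TRBV1", "TRBJ1", "CASRLF"))

all_terms = {"va": 1, "cdr3a": 1, "vb": 1, "cdr3b": 1}
beta_cdr3_only = {"va": 0, "cdr3a": 0, "vb": 0, "cdr3b": 4}

d_all = weighted_paired_distance(tcr1, tcr2, rep_dists, all_terms)
d_beta = weighted_paired_distance(tcr1, tcr2, rep_dists, beta_cdr3_only)
print(d_all, d_beta)
```

With equal weights the TRAV mismatch dominates the toy distance; with only `cdr3b` weighted, the two TCRs are separated solely by their beta-chain CDR3 difference, which is the behavior the question asks for.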

You will also need to disable the C++ tcrdist alternative, for example by moving the tcrdist_cpp/bin folder or by hardcoding this function to always return False:

https://github.com/phbradley/conga/blob/master/conga/util.py#L29

Let me know what you find! Take care, Phil

phbradley, Mar 13 '24 15:03