conga Just make Logo plots starting from TCR data

Hello, I have some groups of single cell TCRs for which I would like to make Logo plots including both alpha and beta chains and the usage of V and J genes as it is done by Conga.

By any chance, is there a quick-to-use code snippet for making the Logo plots, so I can avoid cherry-picking the parts I need from the conga code? :-D

Nov 07 '22 19:11 erosix

Hi there, I added a little script to the scripts/ directory: make_tcr_logos.py which reads a TSV-formatted file with TCR information and makes SVG and PNG-formatted TCR logos. You will need to pull the latest version to get it. You would run something like:

python ~/gitrepos/conga/scripts/make_tcr_logos.py --tcrs_tsvfile tcrs.tsv --outfile_prefix tcrs_test --organism human

where tcrs.tsv started something like this:

va	ja	cdr3a	cdr3a_nucseq	vb	jb	cdr3b	cdr3b_nucseq
TRAV1-2*01	TRAJ33*01	CAVSDSNYQLIW	tgtgctgtgagtgatagcaactatcagttaatctgg	TRBV6-4*01	TRBJ2-1*01	CASSDGQPNNEQFF	tgtgccagcagtgatggacagcctaacaatgagcagttcttc

Or you can just look at the source code to see what conga pieces are being used. Let me know if you have any questions. Hope that helps!
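In case it helps anyone preparing the input, here is a minimal sketch of writing a TSV with the header shown above, using only the Python standard library. The helper name write_tcrs_tsv and the output filename are made up for illustration; only the column names and the example row come from this thread.

```python
import csv

# Column names copied from the example header in this thread.
TCR_COLUMNS = ["va", "ja", "cdr3a", "cdr3a_nucseq",
               "vb", "jb", "cdr3b", "cdr3b_nucseq"]

def write_tcrs_tsv(rows, path):
    """Write an iterable of dicts to a tab-separated file with the header above.

    Hypothetical helper, not part of conga.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=TCR_COLUMNS,
                                delimiter="\t", lineterminator="\n")
        writer.writeheader()
        for row in rows:
            missing = [c for c in TCR_COLUMNS if c not in row]
            if missing:
                raise ValueError(f"row is missing columns: {missing}")
            # Keep only the expected columns, in the expected order.
            writer.writerow({c: row[c] for c in TCR_COLUMNS})

# The example TCR from the post above.
example_row = {
    "va": "TRAV1-2*01", "ja": "TRAJ33*01",
    "cdr3a": "CAVSDSNYQLIW",
    "cdr3a_nucseq": "tgtgctgtgagtgatagcaactatcagttaatctgg",
    "vb": "TRBV6-4*01", "jb": "TRBJ2-1*01",
    "cdr3b": "CASSDGQPNNEQFF",
    "cdr3b_nucseq": "tgtgccagcagtgatggacagcctaacaatgagcagttcttc",
}

if __name__ == "__main__":
    write_tcrs_tsv([example_row], "tcrs.tsv")
```

The resulting file can then be passed to the script via --tcrs_tsvfile.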

Nov 07 '22 21:11 phbradley

Hi! Phil beat me to the punch while I was putting it together, but I figured I'd share mine too. It's a nearly identical solution but intended for use in an interactive session (e.g. a Jupyter notebook) using a tab-delimited file containing your TCR sequences. The required columns are defined by tcr_keys.

#%%
from conga.tcrdist.make_tcr_logo import make_tcr_logo_for_tcrs
from conga.tcrdist.tcr_distances import TcrDistCalculator
import pandas as pd

gene_file = '~/conga/conga/tcrdist/db/combo_xcr.tsv'
gene_df = pd.read_csv(gene_file, sep = '\t')
gene_df = gene_df[gene_df.organism == 'human']

tcr_keys = ('va','ja','cdr3a','cdr3a_nucseq', 'vb','jb','cdr3b','cdr3b_nucseq')

def retrieve_tcrs(df):
    tcrs = []
    arrays = [ df[x] for x in tcr_keys ]
    for va,ja,cdr3a,cdr3a_nucseq,vb,jb,cdr3b,cdr3b_nucseq in zip(*arrays):
        tcrs.append(((va, ja, cdr3a, cdr3a_nucseq.lower()),
                     (vb, jb, cdr3b, cdr3b_nucseq.lower())))
    return tcrs

tcrdist_calculator = TcrDistCalculator('human')

#%% read table of tcrs for the logo and clean up
tcr_df = pd.read_csv('logo_test.tsv', sep = '\t').loc[:,tcr_keys].drop_duplicates()

for col in tcr_keys:
    assert col in tcr_df, f'Need column {col}'

#allele information is required. add if missing
for gene in ['va','ja','vb','jb']:
    tcr_df[gene] = tcr_df[gene] + "*01"

tcr_df = tcr_df[(tcr_df.vb.isin(gene_df.id)) &
                (tcr_df.jb.isin(gene_df.id)) &
                (tcr_df.va.isin(gene_df.id)) &
                (tcr_df.ja.isin(gene_df.id))]

#%% pull tcrs from your df and make logos for alpha and beta chains
tcrs = retrieve_tcrs(tcr_df)

for chain in "AB":
    pngfile = f"test_logo_{chain}_chain.png"
    make_tcr_logo_for_tcrs(
        tcrs, chain, 'human', pngfile,
        tcrdist_calculator=tcrdist_calculator
    )
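One note on the allele step: the loop appends "*01" to every gene name, so it is meant for tables without allele information. If some of your gene names might already carry an allele suffix, a guard like the sketch below could avoid producing names such as TRAV1-2*01*01. The helper name with_default_allele is hypothetical and not part of conga.

```python
def with_default_allele(gene, default="*01"):
    """Return gene unchanged if it already names an allele, else append default.

    Hypothetical helper: treats any '*' in the name as an existing allele suffix.
    """
    return gene if "*" in gene else gene + default

# Example with plain strings; with a pandas column one could apply the same
# function via Series.map, e.g. tcr_df[gene] = tcr_df[gene].map(with_default_allele)
genes = ["TRAV1-2", "TRBV6-4*01"]
fixed = [with_default_allele(g) for g in genes]
```

Here the first name gains "*01" and the second is left as it is.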

Nov 07 '22 22:11 sschattgen

Nice! Thanks Stefan!!!

Nov 07 '22 22:11 phbradley

Not just one but two solutions, great! Thank you both, looking forward to trying them out!

Nov 09 '22 07:11 erosix

Hello everybody,

First of all, thanks a lot to the Conga team for creating this nice and cool package, and for keeping the gd TCR "aficionados" in mind in your work. I am struggling to run the make_tcr_logos.py script with a gd TSV file. I have edited it to accept "human_gd" as a value for the organism option. The error is the following:

python scripts/make_tcr_logos.py --tcrs_tsvfile data/CD4_naive_1.tsv --outfile_prefix CD4_naive_1_2 --organism human_gd
Read 321 paired TCRs from data/CD4_naive_1.tsv
made: CD4_naive_1_2_tcr_logo_A.png
Traceback (most recent call last):
  File "/home/willy_s/conga/scripts/make_tcr_logos.py", line 68, in <module>
    make_tcr_logo_for_tcrs(
  File "/home/willy_s/conga/conga/tcrdist/make_tcr_logo.py", line 515, in make_tcr_logo_for_tcrs
    cmds = make_default_logo_svg_cmds(
  File "/home/willy_s/conga/conga/tcrdist/make_tcr_logo.py", line 376, in make_default_logo_svg_cmds
    b_junction_results = tcr_sampler.analyze_junction( organism, vb_gene, jb_gene,
  File "/home/willy_s/conga/conga/tcrdist/tcr_sampler.py", line 401, in analyze_junction
    assert 3*len(cdr3_protseq) == len(ncount)
AssertionError

Do you have any idea what is going on? I think the problem is with the delta sequence logo.

Guillem

PS: Here is a snapshot of my TCR file. I have not seen any strange sequences (i.e. CDR3s with very few amino acids).

[image]

May 11 '23 11:05 guillemsanchezsanchez1996
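One thing that might be worth ruling out, judging only from the text of the assertion and not from the conga source: the failing check compares three times the CDR3 protein length against a per-nucleotide count, so a row whose CDR3 nucleotide sequence is not exactly three times as long as its amino acid sequence could plausibly trigger it. The sketch below scans a TSV for such rows. It assumes the file uses the same column names as the example header earlier in this thread (cdr3a/cdr3a_nucseq and cdr3b/cdr3b_nucseq); the function name find_length_mismatches is made up for illustration.

```python
import csv

def find_length_mismatches(path, chains=("a", "b")):
    """Return (row_number, chain, cdr3, nucseq) for every row and chain where
    len(nucseq) != 3 * len(cdr3). Row numbers start at 1 for the first data row.

    Hypothetical diagnostic helper, not part of conga.
    """
    problems = []
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row_number, row in enumerate(reader, start=1):
            for chain in chains:
                cdr3 = row[f"cdr3{chain}"].strip()
                nucseq = row[f"cdr3{chain}_nucseq"].strip()
                if 3 * len(cdr3) != len(nucseq):
                    problems.append((row_number, chain, cdr3, nucseq))
    return problems

if __name__ == "__main__":
    # Path taken from the command line in the post above.
    for problem in find_length_mismatches("data/CD4_naive_1.tsv"):
        print(problem)
```

If this reports nothing, the lengths are consistent and the cause is somewhere else.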
