conga
Just make Logo plots starting from TCR data
Hello, I have some groups of single cell TCRs for which I would like to make Logo plots including both alpha and beta chains and the usage of V and J genes as it is done by Conga.
By any chance, is there a quick-to-use code snippet for making the Logo plots that would save me from cherry-picking the parts I need from the conga code? :-D
Hi there, I added a little script to the scripts/ directory, make_tcr_logos.py, which reads a TSV-formatted file with TCR information and makes SVG and PNG-formatted TCR logos. You will need to pull the latest version to get it. You would run something like:
python ~/gitrepos/conga/scripts/make_tcr_logos.py --tcrs_tsvfile tcrs.tsv --outfile_prefix tcrs_test --organism human
where tcrs.tsv starts something like this:
va ja cdr3a cdr3a_nucseq vb jb cdr3b cdr3b_nucseq
TRAV1-2*01 TRAJ33*01 CAVSDSNYQLIW tgtgctgtgagtgatagcaactatcagttaatctgg TRBV6-4*01 TRBJ2-1*01 CASSDGQPNNEQFF tgtgccagcagtgatggacagcctaacaatgagcagttcttc
Or you can just look at the source code to see what conga pieces are being used. Let me know if you have any questions. Hope that helps!
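If your TCRs live in some other structure, a small helper can write them out in the eight-column layout shown above. This is only a sketch using the standard library: the column names and the example row are the ones quoted in this post, while write_tcrs_tsv is a hypothetical helper name and not part of conga.

```python
import csv
import io

# Column order taken from the example tcrs.tsv quoted above.
TCR_COLUMNS = ['va', 'ja', 'cdr3a', 'cdr3a_nucseq',
               'vb', 'jb', 'cdr3b', 'cdr3b_nucseq']

def write_tcrs_tsv(rows, handle):
    """Write an iterable of dicts (one per paired TCR) as a tab-separated table.

    Hypothetical helper for illustration; not part of conga.
    """
    writer = csv.DictWriter(handle, fieldnames=TCR_COLUMNS,
                            delimiter='\t', lineterminator='\n')
    writer.writeheader()
    for row in rows:
        writer.writerow(row)

# The single example row quoted above.
example_rows = [{
    'va': 'TRAV1-2*01', 'ja': 'TRAJ33*01',
    'cdr3a': 'CAVSDSNYQLIW',
    'cdr3a_nucseq': 'tgtgctgtgagtgatagcaactatcagttaatctgg',
    'vb': 'TRBV6-4*01', 'jb': 'TRBJ2-1*01',
    'cdr3b': 'CASSDGQPNNEQFF',
    'cdr3b_nucseq': 'tgtgccagcagtgatggacagcctaacaatgagcagttcttc',
}]

# Write to an in-memory buffer here; pass an open file handle to write to disk.
buffer = io.StringIO()
write_tcrs_tsv(example_rows, buffer)
tsv_text = buffer.getvalue()
print(tsv_text)
```

To produce a real file, open it with open('tcrs.tsv', 'w', newline='') and pass that handle instead of the buffer.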
Hi! Phil beat me to the punch while I was putting it together, but I figured I'd share mine too. It's a nearly identical solution but intended for use in an interactive session (e.g. a Jupyter notebook) using a tab-delimited file containing your TCR sequences. The required columns are defined by tcr_keys.
#%%
from conga.tcrdist.make_tcr_logo import make_tcr_logo_for_tcrs
from conga.tcrdist.tcr_distances import TcrDistCalculator
import pandas as pd

# gene database shipped with conga; keep only the human entries
gene_file = '~/conga/conga/tcrdist/db/combo_xcr.tsv'
gene_df = pd.read_csv(gene_file, sep = '\t')
gene_df = gene_df[gene_df.organism == 'human']

tcr_keys = ('va','ja','cdr3a','cdr3a_nucseq', 'vb','jb','cdr3b','cdr3b_nucseq')

def retrieve_tcrs(df):
    # build one ((alpha chain), (beta chain)) tuple per row
    tcrs = []
    arrays = [ df[x] for x in tcr_keys ]
    for va,ja,cdr3a,cdr3a_nucseq,vb,jb,cdr3b,cdr3b_nucseq in zip(*arrays):
        tcrs.append(((va, ja, cdr3a, cdr3a_nucseq.lower()),
                     (vb, jb, cdr3b, cdr3b_nucseq.lower())) )
    return tcrs

tcrdist_calculator = TcrDistCalculator('human')

#%% read table of tcrs for the logo and clean up
tcr_df = pd.read_csv('logo_test.tsv', sep = '\t')

# check the required columns before selecting them
for col in tcr_keys:
    assert col in tcr_df, f'Need column {col}'

tcr_df = tcr_df.loc[:, list(tcr_keys)].drop_duplicates()

# allele information is required. add if missing
for gene in ['va','ja','vb','jb']:
    missing = ~tcr_df[gene].str.contains('*', regex=False)
    tcr_df.loc[missing, gene] = tcr_df.loc[missing, gene] + "*01"

# keep only tcrs whose genes are present in the gene database
tcr_df = tcr_df[(tcr_df.vb.isin(gene_df.id)) &
                (tcr_df.jb.isin(gene_df.id)) &
                (tcr_df.va.isin(gene_df.id)) &
                (tcr_df.ja.isin(gene_df.id))]

#%% pull tcrs from your df and make logos for alpha and beta chains
tcrs = retrieve_tcrs(tcr_df)

for chain in "AB":
    pngfile = f"test_logo_{chain}_chain.png"
    make_tcr_logo_for_tcrs( tcrs, chain, 'human', pngfile,
                            tcrdist_calculator=tcrdist_calculator )
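Since the original question was about several groups of TCRs, one way to extend the snippet above is to split the rows by a group label first and make one logo per group. The sketch below only does the splitting, with the standard library. It assumes a hypothetical extra column called 'group' (not something conga requires), and it reuses the single TCR quoted earlier in this thread under two made-up labels purely to exercise the code.

```python
from collections import defaultdict

def tcrs_by_group(rows, group_key='group'):
    """Map each group label to a list of ((alpha chain), (beta chain)) tuples.

    The tuples follow the layout built by retrieve_tcrs in the snippet above.
    'group' is an assumed column name used for illustration only.
    """
    grouped = defaultdict(list)
    for row in rows:
        alpha = (row['va'], row['ja'], row['cdr3a'], row['cdr3a_nucseq'].lower())
        beta = (row['vb'], row['jb'], row['cdr3b'], row['cdr3b_nucseq'].lower())
        grouped[row[group_key]].append((alpha, beta))
    return dict(grouped)

# The example TCR quoted earlier in the thread.
example_tcr = {
    'va': 'TRAV1-2*01', 'ja': 'TRAJ33*01',
    'cdr3a': 'CAVSDSNYQLIW',
    'cdr3a_nucseq': 'tgtgctgtgagtgatagcaactatcagttaatctgg',
    'vb': 'TRBV6-4*01', 'jb': 'TRBJ2-1*01',
    'cdr3b': 'CASSDGQPNNEQFF',
    'cdr3b_nucseq': 'tgtgccagcagtgatggacagcctaacaatgagcagttcttc',
}

# Placeholder labels: the same TCR is repeated only to show the grouping.
rows = [
    dict(example_tcr, group='cluster_1'),
    dict(example_tcr, group='cluster_1'),
    dict(example_tcr, group='cluster_2'),
]

groups = tcrs_by_group(rows)
for label, tcrs in groups.items():
    print(label, len(tcrs))
```

Each list in groups could then be passed to make_tcr_logo_for_tcrs inside the same "AB" loop as above, with the group label added to the pngfile name.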
Nice! Thanks Stefan!!!
Not just one but two solutions, great! Thank you both, looking forward to trying them out!
Hello everybody,
First of all, thanks a lot to the Conga team for creating this nice and cool package, and for keeping the gd TCR "aficionados" in mind in your work. I am struggling to run the make_tcr_logos.py script with a gd TSV file. I have edited it to include "human_gd" as an option for --organism. The error is the following one:
python scripts/make_tcr_logos.py --tcrs_tsvfile data/CD4_naive_1.tsv --outfile_prefix CD4_naive_1_2 --organism human_gd
Read 321 paired TCRs from data/CD4_naive_1.tsv
made: CD4_naive_1_2_tcr_logo_A.png
Traceback (most recent call last):
File "/home/willy_s/conga/scripts/make_tcr_logos.py", line 68, in <module>
Do you have any idea what is going on? I think the problem is with the delta sequence logo.
Guillem
PS: Here is a snapshot of my TCR file. I have not seen any strange sequences (e.g. CDR3s with very few amino acids).
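The traceback above is cut off, so the cause cannot be read from it. One cheap thing to rule out before digging into the script is malformed rows in the input file. The sketch below is not part of conga and does not diagnose the error. It only checks, for the eight columns used earlier in this thread, that no field is empty, that each CDR3 nucleotide sequence is three times the length of its amino acid sequence, and that it contains only a, c, g and t. check_tcr_row is a hypothetical helper name, and the "bad" row is the thread's example TCR with one base deliberately removed.

```python
TCR_KEYS = ('va', 'ja', 'cdr3a', 'cdr3a_nucseq',
            'vb', 'jb', 'cdr3b', 'cdr3b_nucseq')

def check_tcr_row(row):
    """Return a list of problems found in one row (empty list if none).

    Hypothetical sanity check for illustration; not part of conga.
    """
    problems = []
    for key in TCR_KEYS:
        if not (row.get(key) or '').strip():
            problems.append(f'missing {key}')
    for chain in 'ab':
        cdr3 = (row.get(f'cdr3{chain}') or '').strip()
        nucseq = (row.get(f'cdr3{chain}_nucseq') or '').strip().lower()
        if not cdr3 or not nucseq:
            continue  # already reported as missing above
        if len(nucseq) != 3 * len(cdr3):
            problems.append(
                f'cdr3{chain}_nucseq has {len(nucseq)} bases, '
                f'expected {3 * len(cdr3)} for cdr3{chain}')
        if set(nucseq) - set('acgt'):
            problems.append(f'cdr3{chain}_nucseq has non-acgt characters')
    return problems

# The example TCR quoted earlier in the thread.
good_row = {
    'va': 'TRAV1-2*01', 'ja': 'TRAJ33*01',
    'cdr3a': 'CAVSDSNYQLIW',
    'cdr3a_nucseq': 'tgtgctgtgagtgatagcaactatcagttaatctgg',
    'vb': 'TRBV6-4*01', 'jb': 'TRBJ2-1*01',
    'cdr3b': 'CASSDGQPNNEQFF',
    'cdr3b_nucseq': 'tgtgccagcagtgatggacagcctaacaatgagcagttcttc',
}

# Same row with the last base of cdr3b_nucseq dropped, to trigger a report.
bad_row = dict(good_row, cdr3b_nucseq=good_row['cdr3b_nucseq'][:-1])

for name, row in (('good_row', good_row), ('bad_row', bad_row)):
    print(name, check_tcr_row(row))
```

Running the check over every row of the TSV (for example with csv.DictReader and delimiter='\t') and printing the rows that return a non-empty list would show whether the file itself is clean.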