
Flexible CDR-based distance metrics

tcrdist3


Flexible distance measures for comparing T cell receptors

tcrdist3 is a Python toolkit, accessible through its API, for analyzing T-cell receptor (TCR) repertoires. Some of its functionality and code is adapted from the original tcr-dist package, which was released with the publication of Dash et al., Nature (2017), doi:10.1038/nature22383. This package provides a new API for computing TCRdist measures as well as new features for biomarker development (bioRxiv (2020)). The package has been extended to include gamma-delta TCRs, and it has been recoded for CPU efficiency using numba, a high-performance just-in-time compiler.

Installation


pip install tcrdist3

or

pip install git+https://github.com/kmayerb/[email protected]

Docker


docker pull quay.io/kmayerb/tcrdist3:0.2.2

User-Contributed Colab Notebook Examples Using tcrdist3

1. Example: K-Nearest Neighbor Classification Using tcrdist3

Open in Colab (author: Liel Cohen-Lavi). This notebook illustrates how to integrate tcrdist3 with scikit-learn's implementation of k-nearest neighbor (KNN) classification. The performance of TCRdist-based KNN classification on a set of labeled receptors is assessed with cross-validation or training/test splits. This simple method is proposed as a quickly implementable benchmark against which to compare more computationally intensive approaches to predicting TCR-epitope specificity.
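A minimal sketch of the idea, assuming a precomputed TCRdist matrix: scikit-learn's KNeighborsClassifier accepts metric="precomputed", so it can classify receptors directly from pairwise distances rather than from feature vectors. The 6×6 matrix and the labels "A"/"B" below are made-up toy values standing in for a tcrdist3 matrix such as tr.pw_beta and its epitope annotations; this is not the notebook's own code.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: a symmetric pairwise distance matrix for 6 receptors
# (standing in for a tcrdist3 matrix such as tr.pw_beta) and made-up labels.
D = np.array([
    [0, 12, 15, 90, 95, 88],
    [12, 0, 10, 85, 92, 91],
    [15, 10, 0, 87, 89, 93],
    [90, 85, 87, 0, 14, 11],
    [95, 92, 89, 14, 0, 13],
    [88, 91, 93, 11, 13, 0],
], dtype=float)
labels = np.array(["A", "A", "A", "B", "B", "B"])

train_idx = np.array([0, 1, 3, 4])
test_idx = np.array([2, 5])

# With metric="precomputed", fit() takes the (n_train, n_train) distance matrix,
# and predict() takes distances from each test receptor to each training receptor.
knn = KNeighborsClassifier(n_neighbors=2, metric="precomputed")
knn.fit(D[np.ix_(train_idx, train_idx)], labels[train_idx])
pred = knn.predict(D[np.ix_(test_idx, train_idx)])
print(pred)  # ['A' 'B']
```

Note that predict() expects a rectangular matrix whose rows are test receptors and whose columns are training receptors, which is why the full matrix is sliced with np.ix_. Using precomputed distances means TCRdist is computed once by tcrdist3 and never recomputed inside scikit-learn.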

Package Documentation


Further documentation is available at tcrdist3.readthedocs.io.

Basic Usage

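import pandas as pd
from tcrdist.repertoire import TCRrep
from tcrdist.public import _neighbors_fixed_radius

# Load a clone table (here, dash.csv) and build the repertoire;
# pairwise distances are computed on construction by default
df = pd.read_csv("dash.csv")
tr = TCRrep(cell_df = df,
            organism = 'mouse',
            chains = ['alpha','beta'],
            db_file = 'alphabeta_gammadelta_db.tsv')

# Pairwise TCRdist matrices (numpy arrays), one per chain
tr.pw_alpha
tr.pw_beta
# CDR3-only components of those distances
tr.pw_cdr3_a_aa
tr.pw_cdr3_b_aa

# For each clone, find the clones within a beta-chain TCRdist of 50
neighbors = _neighbors_fixed_radius(tr.pw_beta, 50)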
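The fixed-radius neighbor call above returns, for each clone, the clones lying within a given distance. The numpy sketch below illustrates that idea on made-up 3×3 matrices standing in for tr.pw_alpha and tr.pw_beta; neighbors_within_radius is a hypothetical helper, not tcrdist3's implementation. It also assumes, as a common choice for paired-chain analyses, that the paired alpha-beta distance is the sum of the two single-chain distances.

```python
import numpy as np

# Hypothetical toy matrices standing in for tr.pw_alpha and tr.pw_beta (3 clones).
pw_alpha = np.array([[0, 20, 60],
                     [20, 0, 55],
                     [60, 55, 0]])
pw_beta = np.array([[0, 18, 40],
                    [18, 0, 45],
                    [40, 45, 0]])

# ASSUMPTION for this sketch: paired-chain distance = alpha + beta distance.
pw_paired = pw_alpha + pw_beta

def neighbors_within_radius(pw, radius):
    """Hypothetical helper: for each row i, indices j with pw[i, j] <= radius.

    Each clone appears in its own list because its self-distance is 0.
    """
    return [np.flatnonzero(row <= radius).tolist() for row in pw]

print(neighbors_within_radius(pw_beta, 40))    # [[0, 1, 2], [0, 1], [0, 2]]
print(neighbors_within_radius(pw_paired, 40))  # [[0, 1], [0, 1], [2]]
```

Because distances are non-negative, adding the alpha-chain term can only increase each pairwise distance, so at the same radius a paired-chain neighborhood is never larger than the beta-only one.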

Sparse Matrix Representation

import pandas as pd
from tcrdist.repertoire import TCRrep
from tcrdist.breadth import get_safe_chunk

# Keep only the beta-chain columns and defer distance computation
df = pd.read_csv("dash.csv")
tr = TCRrep(cell_df = df[['subject','epitope','count','v_b_gene','j_b_gene','cdr3_b_aa','cdr3_b_nucseq']],
            organism = 'mouse',
            chains = ['beta'],
            compute_distances = False)

# Set to the desired number of CPUs
tr.cpus = 2

# Identify a safe chunk size based on the input data shape and the target
# number of pairwise distances to hold temporarily in memory per node.
safe_chunk_size = get_safe_chunk(
            tr.clone_df.shape[0],
            tr.clone_df.shape[0],
            target = 10**7)

# Compute distances among the clones, storing only those within the radius
tr.compute_sparse_rect_distances(
        df = tr.clone_df,
        radius = 50,
        chunk_size = safe_chunk_size)

# Sparse matrix of beta-chain distances within the radius
print(tr.rw_beta)
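Because only within-radius distances are stored, the memory used by the sparse result scales with the number of close pairs rather than with the square of the number of clones. The sketch below shows one way to read neighbor lists out of a scipy CSR matrix through its indptr, indices and data arrays. The toy matrix is made up, sparse_neighbors is a hypothetical helper, and it assumes that a true distance of 0 is stored as -1 (otherwise an exact match would be indistinguishable from "beyond the radius"); check that convention against the tcrdist3 documentation for your version.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical toy stand-in for tr.rw_beta computed with radius=50.
# ASSUMPTION: entries beyond the radius are not stored, and a true
# distance of 0 is stored as -1 so it is not dropped as an implicit zero.
rw = csr_matrix(np.array([[-1, 12,  0],
                          [12, -1,  0],
                          [ 0,  0, -1]]))

def sparse_neighbors(rw):
    """Hypothetical helper: map each row to {column index: distance}."""
    neighbors = {}
    for i in range(rw.shape[0]):
        # indptr[i]:indptr[i + 1] delimits row i's entries in indices/data
        start, end = rw.indptr[i], rw.indptr[i + 1]
        cols = rw.indices[start:end]
        dists = rw.data[start:end]
        # Decode the assumed -1 placeholder back to a distance of 0
        dists = np.where(dists == -1, 0, dists)
        neighbors[i] = dict(zip(cols.tolist(), dists.tolist()))
    return neighbors

print(rw.nnz)                # 5 stored entries instead of 9 dense cells
print(sparse_neighbors(rw))  # {0: {0: 0, 1: 12}, 1: {0: 12, 1: 0}, 2: {2: 0}}
```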

Citing

TCR meta-clonotypes for biomarker discovery with tcrdist3 enabled identification of public, HLA-restricted clusters of SARS-CoV-2 TCRs

Mayer-Blackwell K, Schattgen S, Cohen-Lavi L, Crawford JC, Souquette A, Gaevert JA, Hertz T, Thomas PG, Bradley PH, Fiore-Gartland A. eLife (2021).

Quantifiable predictive features define epitope-specific T cell receptor repertoires

Dash P, Fiore-Gartland AJ, Hertz T, Wang GC, Sharma S, Souquette A, Crawford JC, Clemens EB, Nguyen THO, Kedzierska K, La Gruta NL, Bradley P, Thomas PG. Nature (2017). doi:10.1038/nature22383