scirpy
How to properly select the metric & cutoff parameters
Hi,
Thanks for developing scirpy, it's a powerful tool for analyzing TCR data.
I have a question about choosing the `metric` parameter for scirpy.pp.ir_dist, scirpy.tl.ir_query, and scirpy.tl.ir_query_annotate.
Assume we are analyzing T cells and grouping TCR clonotypes based on amino-acid sequences (i.e., sequence = 'aa').
I noticed that your tutorial uses tcrdist to compute the CDR3 neighborhood graph, with a cutoff of 15 (allowing three R→N substitutions). However, when matching cells against VDJdb, the identity metric is used.
Can I interpret this as a recommendation to use identity when matching data against public databases, and tcrdist when computing similarity among a large pool of cells from the same condition?
Additionally, the default cutoff for the alignment metric is 10, but your tutorial sets 15 with tcrdist. Is 15 also a recommended value?
Thank you for your guidance.
Hi,
unfortunately, there is no straightforward answer. I am not aware of any benchmark showing that, at a given cutoff, receptors still have an X% chance of binding the same epitope. In the end, it's a gradient from "very stringent" (identity) to less stringent (alignment/tcrdist with increasing cutoffs).
This applies to both the definition of clonotype clusters and database search.
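To make the "gradient" concrete, here is a toy sketch in plain Python. It is not scirpy code and not how scirpy computes distances. It uses Hamming distance as a simple stand-in for the alignment and tcrdist metrics (which score substitutions with BLOSUM62-derived costs), and the CDR3 sequences and the small "reference" list are made up for illustration. A cutoff of 0 behaves like the identity metric. Raising the cutoff only ever adds neighbors, and the same metric/cutoff choice governs both clustering within a dataset (loosely what `ir_dist` feeds into) and matching against a database (loosely what `ir_query` does).

```python
from itertools import combinations


def hamming(a: str, b: str) -> float:
    """Number of mismatched positions; infinite if lengths differ (no gaps allowed)."""
    if len(a) != len(b):
        return float("inf")
    return sum(x != y for x, y in zip(a, b))


def neighbor_pairs(seqs, cutoff):
    """Pairs of distinct sequences within `cutoff` (a stand-in for clonotype clustering)."""
    return {
        (a, b)
        for a, b in combinations(sorted(set(seqs)), 2)
        if hamming(a, b) <= cutoff
    }


def query(seqs, reference, cutoff):
    """For each query sequence, the reference entries within `cutoff` (a stand-in for DB search)."""
    return {s: sorted(r for r in reference if hamming(s, r) <= cutoff) for s in seqs}


# Hypothetical CDR3 amino-acid sequences, for illustration only.
cdr3s = ["CASSLGQAYEQYF", "CASSLGQAYEQFF", "CASSPGQAYEQFF", "CASRLGNTIYF"]
reference = ["CASSLGQAYEQYF", "CASSPGQGYEQFF"]

for cutoff in (0, 1, 2):
    print(cutoff, len(neighbor_pairs(cdr3s, cutoff)), query(cdr3s, reference, cutoff))
```

With cutoff 0 only exact matches are found; each increase pulls in more neighbors, both within the dataset and in the reference. Where to stop along that gradient is a judgment call rather than a benchmarked threshold.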