ROC analysis of CITE-seq / AVID-seq reagents

Open jayhesselberth opened this issue 3 years ago • 2 comments
Thoughts on ROC analysis of protein-DNA tags as classifiers.

The question is how well a given reagent performs as a classifier relative to gene expression classifications (i.e., treating these as the "gold standard"). AUC values could provide information about reagent quality and can be compared across reagents, batches, etc.

For a function roc_analysis(), the input data would be an so or sce object (Seurat or SingleCellExperiment) with:

  1. Cell type classifications based on gene expression (e.g. based on clustifyr)
  2. Raw or normalized counts of protein-DNA tags (CITE-seq antibodies, AVID-tags, antigen-DNA tags, etc)

For a comparison, assume two possible states (e.g., B vs T cell, or B cell vs all other cells). Then step through the range of recovered protein-DNA tag signal and calculate:

  1. True positive rate (TP / (TP + FN)). TP = number of B cells scoring positive, FN = number of B cells scoring negative.
  2. False positive rate (FP / (FP + TN)). FP = number of T cells scoring positive, TN = number of T cells scoring negative.
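The threshold sweep described above can be sketched in a few lines. This is only an illustration of the arithmetic, written in plain Python rather than as the proposed R API; the function name roc_points, the "signal >= threshold scores positive" rule, and the toy counts are all assumptions made for the example.

```python
def roc_points(signal, is_positive):
    """Return (fpr, tpr) lists with one point per distinct threshold.

    signal: tag signal per cell (raw or normalized counts).
    is_positive: True for the target class (e.g. B cells), False otherwise.
    Assumption: a cell scores positive when its signal is >= the threshold.
    """
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one cell of each class")

    # Start with a threshold above every value: no cell scores positive.
    fpr, tpr = [0.0], [0.0]
    for threshold in sorted(set(signal), reverse=True):
        tp = sum(1 for s, p in zip(signal, is_positive) if p and s >= threshold)
        fp = sum(1 for s, p in zip(signal, is_positive) if not p and s >= threshold)
        tpr.append(tp / n_pos)  # TP / (TP + FN)
        fpr.append(fp / n_neg)  # FP / (FP + TN)
    return fpr, tpr


# Toy data (made up): three B cells and three T cells stained with a B cell tag.
signal = [9, 7, 3, 4, 2, 1]
is_b_cell = [True, True, True, False, False, False]
fpr, tpr = roc_points(signal, is_b_cell)
```

Recounting TP and FP at every threshold is quadratic in the number of cells, which keeps the sketch readable; a real implementation would more likely sort once and use cumulative sums, or hand the work to an existing ROC package.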

plot_roc() would plot TPR vs FPR for each of the ranked detection values, and roc_auc() would provide the AUC value from the data.
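One common way to get an AUC value from a set of ROC points is the trapezoid rule. The sketch below assumes that approach (the issue does not specify one), and the function name auc_trapezoid and the example points are hypothetical; it is plain Python meant to show the calculation, not the proposed roc_auc() itself.

```python
def auc_trapezoid(fpr, tpr):
    """Area under a ROC curve by the trapezoid rule.

    fpr and tpr must have the same length and be ordered by increasing fpr,
    starting at (0, 0) and ending at (1, 1).
    """
    if len(fpr) != len(tpr):
        raise ValueError("fpr and tpr must have the same length")
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]
        mean_height = (tpr[i] + tpr[i - 1]) / 2
        area += width * mean_height
    return area


# Made-up ROC points for a single reagent.
example_fpr = [0.0, 0.0, 0.5, 1.0]
example_tpr = [0.0, 0.5, 1.0, 1.0]
auc = auc_trapezoid(example_fpr, example_tpr)
```

Under this definition a reagent that separates the two classes perfectly gives an AUC of 1, and one that carries no information about the classes gives about 0.5, which is what makes the value comparable across reagents and batches.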

cc @catherinenicholas

jayhesselberth, Dec 10 '21 11:12

https://rviews.rstudio.com/2019/03/01/some-r-packages-for-roc-curves/

Maybe build off of pROC or ROCR. Base R plots make me sad

jayhesselberth, Dec 10 '21 11:12

https://github.com/dariyasydykova/tidyroc/

jayhesselberth, Dec 10 '21 13:12