djvdj
ROC analysis of CITE-seq / AVID-seq reagents
Thoughts on ROC analysis of protein-DNA tags as classifiers.
The question is how well a given reagent performs as a classifier relative to gene expression classifications (i.e., treating these as the "gold standard"). AUC values could provide information about reagent quality and can be compared across reagents, batches, etc.
For a function roc_analysis(), input data would be a Seurat object (so) or SingleCellExperiment (sce) with:
- Cell type classifications based on gene expression (e.g. based on clustifyr)
- Raw or normalized counts of protein-DNA tags (CITE-seq antibodies, AVID-tags, antigen-DNA tags, etc)
For a comparison, assume two possible states (e.g., B vs T cell, or B cell vs all other cells). Then step through the range of recovered protein-DNA tag signal and calculate:
- True positive rate (TP / (TP + FN)). TP = number of B cells scoring positive, FN = number of B cells scoring negative.
- False positive rate (FP / (FP + TN)). FP = number of T cells scoring positive, TN = number of T cells scoring negative.
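To make the threshold sweep concrete, here is a minimal sketch in plain Python (chosen only for illustration; an R version would follow the same logic). The function name roc_points(), its arguments, and the toy counts are all hypothetical and not part of any existing package. It assumes higher tag signal means "more likely positive" and treats a cell as scoring positive when its signal is at or above the current threshold.

```python
def roc_points(scores, is_positive):
    """Return (FPR, TPR) pairs, one per distinct tag-signal threshold.

    scores      -- tag signal per cell (raw or normalized counts)
    is_positive -- True where the expression-based label is the positive class
    """
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one cell of each class")

    # Rank cells from highest to lowest signal
    ranked = sorted(zip(scores, is_positive), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]  # threshold above every score: no cell scores positive
    tp = fp = 0
    for i, (score, positive) in enumerate(ranked):
        if positive:
            tp += 1  # a positive-class cell now scores positive
        else:
            fp += 1  # a negative-class cell now scores positive
        # Emit a point only after the last cell sharing this score (handles ties)
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


# Toy data (made up): tag counts for three B cells followed by three T cells
counts = [120, 85, 40, 60, 10, 5]
is_b_cell = [True, True, True, False, False, False]
points = roc_points(counts, is_b_cell)
```

Each returned pair is one point on the ROC curve, starting at (0, 0) and ending at (1, 1). Grouping tied scores matters for raw counts, where many cells share the same low integer value.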
plot_roc() would plot TPR vs FPR for each of the ranked detection values, and roc_auc() would provide the AUC value from the data.
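As a rough illustration of what the AUC step could compute, the sketch below applies the trapezoidal rule to a list of (FPR, TPR) points. It is written in plain Python for illustration only; the function here is a hypothetical stand-in that borrows the proposed roc_auc() name, not an existing implementation, and the two example curves are made up.

```python
def roc_auc(points):
    """Area under a ROC curve given (FPR, TPR) points, via the trapezoidal rule."""
    pts = sorted(points)  # order by FPR, then TPR
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # Trapezoid between consecutive points: width times mean height
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical curves: a reagent that separates the classes perfectly,
# and one that performs no better than chance
perfect = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
chance = [(0.0, 0.0), (1.0, 1.0)]

auc_perfect = roc_auc(perfect)
auc_chance = roc_auc(chance)
```

An AUC near 1 would indicate that the tag signal recovers the expression-based labels almost perfectly, while a value near 0.5 would indicate little discriminating power.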
cc @catherinenicholas
https://rviews.rstudio.com/2019/03/01/some-r-packages-for-roc-curves/
Maybe build off of pROC or ROCR. Base R plots make me sad
https://github.com/dariyasydykova/tidyroc/