
T-cell recognized as tumor cells

Open oandrefonseca opened this issue 1 year ago • 4 comments

I am testing SCEVAN to separate malignant from normal cells in a CD45-sorted dataset, so I am reasonably confident about each cell's status (malignant versus normal). However, I noticed that CD45+ cells are being classified as tumor cells.

After further investigation, I found that these are mostly immune cells (NK cells and T cells). I suspect the calls may be affected by V(D)J gene recombination. Have you encountered this issue? Would it be reasonable to remove those genes from my count matrix? What would be the best protocol or workaround?
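A minimal sketch of the gene-removal idea raised above, assuming human HGNC-style gene symbols as row names. The regular expressions and the toy matrix are illustrative assumptions, not part of SCEVAN, and nothing in this thread confirms that dropping these genes fixes the misclassification.

```r
# Toy genes-by-cells count matrix (hypothetical); real input would be a per-sample count matrix.
genes <- c("TRAV1-1", "TRBC1", "TRAC", "TRADD", "IGHV3-23", "IGKC", "IGHMBP2", "CD3E")
count_mtx <- matrix(1:16, nrow = length(genes),
                    dimnames = list(genes, c("cell1", "cell2")))

# Assumed patterns for TCR and immunoglobulin V/D/J/C segment genes.
# The digit and end-of-string anchors keep unrelated genes such as TRADD or IGHMBP2.
vdj_pattern <- paste(c(
  "^TR[ABGD][VDJ][0-9]",   # TCR variable, diversity, joining segments
  "^TR[ABGD]C[0-9]?$",     # TCR constant genes (e.g. TRAC, TRBC1)
  "^IG[HKL][VDJ][0-9]",    # Ig variable, diversity, joining segments
  "^IGH[MDEGA][0-9]?$",    # Ig heavy-chain constant genes
  "^IG[KL]C[0-9]?$"        # Ig light-chain constant genes
), collapse = "|")

is_vdj <- grepl(vdj_pattern, rownames(count_mtx))
filtered_mtx <- count_mtx[!is_vdj, , drop = FALSE]
```

Checking which symbols the pattern matches in the real data (for example with `rownames(count_mtx)[is_vdj]`) before filtering helps catch unintended matches.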

oandrefonseca avatar Jun 16 '23 16:06 oandrefonseca

I have a similar question, and it is not only T cells.

cell_type         filtered  normal  tumor
B cell                 624    1268   3869
T cell                9657   11313  30878
dendritic cell        1541    2679    688
endothelial cell       598    2151    329
epithelial cell       6707    6857  50738
erythrocyte             46       1    nan
fibroblast            1244    3076    584
macrophage            2047    5672   1532
mast cell              154     462    651
neutrophil            1047      41     17
plasma cell             63     489    114
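To make the table easier to compare across cell types, the counts can be turned into the fraction of classified (non-filtered) cells called tumor. This is a rough sketch using the numbers posted above; treating the `nan` entry as zero is an assumption.

```r
# Counts copied from the table above; `nan` is entered as NA.
counts <- data.frame(
  cell_type = c("B cell", "T cell", "dendritic cell", "endothelial cell",
                "epithelial cell", "erythrocyte", "fibroblast", "macrophage",
                "mast cell", "neutrophil", "plasma cell"),
  filtered  = c(624, 9657, 1541, 598, 6707, 46, 1244, 2047, 154, 1047, 63),
  normal    = c(1268, 11313, 2679, 2151, 6857, 1, 3076, 5672, 462, 41, 489),
  tumor     = c(3869, 30878, 688, 329, 50738, NA, 584, 1532, 651, 17, 114),
  stringsAsFactors = FALSE
)

# Assumption: a missing tumor count means no cells of that type were called tumor.
tumor_n <- ifelse(is.na(counts$tumor), 0, counts$tumor)

# Fraction of classified cells (normal + tumor) that were called tumor.
counts$tumor_frac <- tumor_n / (counts$normal + tumor_n)

# Sort so the cell types most often called tumor come first.
counts <- counts[order(-counts$tumor_frac), ]
print(counts[, c("cell_type", "tumor_frac")], row.names = FALSE)
```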

ATPs avatar Jun 20 '23 05:06 ATPs

I have data from this paper: https://www.cell.com/cancer-cell/fulltext/S1535-6108(21)00497-9 Cells from normal tissues were annotated as tumor cells.

ATPs avatar Jun 20 '23 06:06 ATPs

@oandrefonseca Thank you for the report. Unfortunately, no, we have not addressed this issue. To better understand it: if you are analysing 10x samples, some misclassification may be due to the noisiness of these data or to samples with very low tumor purity. Can you share the heatmap generated by SCEVAN? Thank you.

@ATPs Thanks for the tip. Since your table appears to aggregate data from many samples, I wanted to ask: did you analyse each sample individually with SCEVAN, or did you analyse the entire matrix containing all samples? Thank you.

AntonioDeFalco avatar Jun 20 '23 22:06 AntonioDeFalco

@AntonioDeFalco Thank you for your reply. I believe each sample was analysed individually, since I used the function from this example:

http://htmlpreview.github.io/?https://github.com/AntonioDeFalco/SCEVAN/blob/main/vignettes/multiSamples.html

library(SCEVAN)
results <- SCEVAN::multiSampleComparisonClonalCN(listCountMtx, analysisName = "all", organism = "human" , par_cores = 20)

My code looks like:

library(SCEVAN)
library(Seurat)

# Load the combined Seurat object (all donors in one object)
alldata <- qs::qread("Combined_samples.qs")

# Split by donor and collect one count matrix per donor
alldata.list <- SplitObject(alldata, split.by = "donor_id")
listCountMtx <- list()
for (donor_id in names(alldata.list)) {
  listCountMtx[[donor_id]] <- alldata.list[[donor_id]]@assays$RNA@counts
}

# Free memory before the SCEVAN run
rm(alldata, alldata.list)
gc()

results <- SCEVAN::multiSampleComparisonClonalCN(listCountMtx, analysisName = "all", organism = "human", par_cores = 20)

# Combine results (first element of the returned list) into one table and save it
xdf.predict <- do.call(rbind, results[[1]])
write.csv(xdf.predict, "Combined_samples.20230619SCEVAN.tumor_cell_prediction.csv")

The cell types were annotated in the original file.
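To see whether the misclassification is concentrated in particular samples, the predictions can be cross-tabulated against the original annotation per donor. The sketch below uses a toy data frame; the column names `donor_id`, `cell_type`, and `class` are assumptions about how the combined prediction table and the annotation would be joined, not confirmed SCEVAN output.

```r
# Hypothetical merged table: one row per cell, with donor, annotated type, and predicted class.
pred <- data.frame(
  donor_id  = c("D1", "D1", "D1", "D2", "D2", "D2"),
  cell_type = c("T cell", "T cell", "epithelial cell",
                "T cell", "epithelial cell", "epithelial cell"),
  class     = c("tumor", "normal", "tumor", "normal", "tumor", "tumor"),
  stringsAsFactors = FALSE
)

# One cell_type-by-class contingency table per donor.
by_donor <- lapply(split(pred, pred$donor_id),
                   function(d) table(d$cell_type, d$class))

# Fraction of annotated T cells called tumor, per donor.
tcell_tumor_frac <- sapply(split(pred, pred$donor_id), function(d) {
  t_cells <- d[d$cell_type == "T cell", ]
  mean(t_cells$class == "tumor")
})
```

Donors where most annotated immune cells are called tumor would be the natural candidates for sharing a heatmap.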

ATPs avatar Jun 21 '23 01:06 ATPs