CytoTalk
Some implementation problems
Hello! Thank you for this wonderful package, and thank you for revising it! :) I noticed a few mistakes and unclear instructions, and I wanted to ask your opinion on them, in case I am doing something wrong.
- Taking `lst_scrna <- CytoTalk::from_single_cell_experiment(sce)` as written cannot work in principle, since a step is needed to create the dense matrix.
- You say in your README file:

      fpath_mat <- "~/Tan-Lab/scRNAseq-data-cpdb/sample_counts.txt"
      fpath_meta <- "~/Tan-Lab/scRNAseq-data-cpdb/sample_meta.txt"
      lst_scrna <- CytoTalk::read_matrix_with_meta(fpath_mat, fpath_meta)

  But in fact your function `read_matrix_with_meta` works through `vroom_sparse_with_rownames` and `vroom_with_rownames`, so the input should be in TSV format (or the delimiter should be specified), since otherwise it cannot detect the delimiter (sadly deprecated).
- Moreover, the documentation line `@param auto_transform Should count data be transformed if detected?` is a bit scary (in the same function). In fact, in `check_count_data`, the default is `auto_transform=TRUE`.
- You could also add a line showing users how to start from a Seurat object, something like `matrix_data <- Matrix(as.matrix(seurat_obj[["RNA"]]$counts), sparse = TRUE)`. More generally, for these inputs, I don't really understand why you worry so much about specifying different formats: you just expect a matrix table with barcodes as rownames and a cell-type table with barcodes as rownames, is that right?
- In `normalize_sparse <- function(mat, scale.factor=10000) { log1p(Matrix::t(Matrix::t(mat) / Matrix::colSums(mat) * scale.factor)) }`, I know that 10,000 is a very common scale factor among single-cell users. However, shouldn't it be a parameter that the user can set? Indeed, scale-factor log-normalization is known not to preserve mutual information (although in theory it should preserve ranks, I agree), so it may introduce some bias into the CytoTalk process. Have you checked information consistency?
- A really sad point about this cool package is that you can't compare multiple conditions. The network has to be created condition by condition, so afterwards you are left applying network analysis tools to different skeletons, built with different paths, different costs, and so on. I agree that this is computationally correct sample-wise, but since there are always controls in single-cell matrices, isn't comparing against them the whole point? I think I'm obviously missing something here. How do you handle multiple conditions?
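To make the scale-factor point above concrete, here is a minimal sketch of the check I have in mind. It assumes the `normalize_sparse` body quoted above is what the package runs; the toy count matrix and the variable names are made up for illustration. It only shows that within-cell gene ranks do not depend on `scale.factor` while the normalized values do; it does not measure mutual information.

```r
library(Matrix)

# Same body as the normalize_sparse snippet quoted above (assumed to match the package).
normalize_sparse <- function(mat, scale.factor = 10000) {
  log1p(Matrix::t(Matrix::t(mat) / Matrix::colSums(mat) * scale.factor))
}

# Toy genes-x-cells count matrix with made-up values; every cell has a non-zero total.
counts <- Matrix::Matrix(
  matrix(c(0, 3, 1, 10,
           5, 0, 2, 1,
           2, 2, 0, 7),
         nrow = 4,
         dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3))),
  sparse = TRUE
)

norm_1e4 <- as.matrix(normalize_sparse(counts, scale.factor = 1e4))
norm_1e6 <- as.matrix(normalize_sparse(counts, scale.factor = 1e6))

# Within each cell the transform is monotone, so gene ranks are unchanged.
ranks_1e4 <- apply(norm_1e4, 2, rank)
ranks_1e6 <- apply(norm_1e6, 2, rank)
same_ranks <- identical(ranks_1e4, ranks_1e6)

# The normalized values themselves do depend on the scale factor.
max_abs_diff <- max(abs(norm_1e4 - norm_1e6))
```

If `same_ranks` is `TRUE` but `max_abs_diff` is large, any downstream step that uses the values rather than the ranks could be sensitive to the choice of scale factor, which is why I would like to be able to set it.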
Sorry about this naive comment. Best
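P.S. On the delimiter point above, this is roughly how I ended up preparing my input: a minimal base-R sketch that writes a matrix as an explicitly tab-delimited file and reads it back. The toy matrix, its orientation, and the file path are placeholders of mine; I am not sure which exact layout `read_matrix_with_meta` expects, so please treat that part as an assumption.

```r
# Toy matrix with placeholder row and column names.
mat <- matrix(1:6, nrow = 3,
              dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2")))

# Write with an explicit tab delimiter; col.names = NA leaves an empty
# header cell above the row names so the columns stay aligned.
fpath_mat <- tempfile(fileext = ".tsv")
write.table(mat, file = fpath_mat, sep = "\t", quote = FALSE, col.names = NA)

# Read back with the delimiter stated explicitly (read.delim uses tab) rather than guessed.
mat_back <- as.matrix(read.delim(fpath_mat, row.names = 1, check.names = FALSE))
```

With the delimiter fixed on both sides, the round trip keeps the row and column names intact, which is all I needed for the two input tables.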