gene names of reference and mixture do not match!
Hello,
First, I would like to thank you for developing BayesPrism v2.2. It allows me to use large atlases as references for deconvolution.
However, when I tried to use it, I got an error for some deconvolution tasks:
Error in new.prism(reference = dfRef, mixture = dfBulk, input.type = "count.matrix", :
Error: gene names of reference and mixture do not match!
The error arose when I tried to deconvolve 2 bulk RNA-Seq datasets and 2 pseudo-bulk RNA-Seq datasets generated by aggregating the reference single-cell signal.
While the deconvolution algorithm ran for the bulk RNA-Seq data, I can't deconvolve the pseudo-bulk data.
My bulk and reference arguments are data frames with exactly the same genes as columns, in the same order, so I do not understand what could be going wrong.
Do you have any idea?
Thanks a lot
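One way to narrow this down before sharing the data is to compare the column names of the two inputs directly. The sketch below is only an illustration on toy data frames: `compare_gene_names` is a hypothetical helper, not a BayesPrism function, and it assumes that the gene names being compared are the column names of both `reference` and `mixture`.

```r
# Hypothetical helper (not part of BayesPrism): summarise how the
# column names (assumed to be gene names) of two inputs differ.
compare_gene_names <- function(reference, mixture) {
  ref_genes <- colnames(reference)
  mix_genes <- colnames(mixture)
  list(
    identical       = identical(ref_genes, mix_genes),
    same_set        = setequal(ref_genes, mix_genes),
    only_in_ref     = setdiff(ref_genes, mix_genes),
    only_in_mixture = setdiff(mix_genes, ref_genes)
  )
}

# Toy data frames standing in for dfRef and dfBulk; the second gene name
# of the bulk data carries a trailing space on purpose.
dfRef  <- data.frame(GeneA = c(1, 0), GeneB = c(3, 2), check.names = FALSE)
dfBulk <- data.frame(GeneA = c(5, 4), `GeneB ` = c(7, 6), check.names = FALSE)

res <- compare_gene_names(dfRef, dfBulk)
print(res)
```

If `identical` is FALSE while `same_set` is TRUE, the genes are the same but ordered differently. Non-empty `only_in_ref` or `only_in_mixture` entries point to names that differ, for example through stray whitespace.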
Dear user,
Thank you for your interest in our method.
Could you kindly dump your input into an rdata file and share it with me ([email protected]) for troubleshooting?
Best,
Tinyi
I guess you forgot to transpose bk.dat. bk.dat should be sample by gene, not gene by sample.
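To illustrate the point about orientation: a check such as `all(colnames(ref) == rownames(bk.dat))` can pass while the column names of the two inputs still differ. This is a minimal sketch with made-up toy matrices, assuming the comparison that matters is between the column names of both inputs. The `ref` name and all values are illustrative only.

```r
# Toy inputs; gene, cell and sample names are made up for illustration.
genes <- c("NOC2L", "KLHL17", "HES4")
ref <- matrix(1:6, nrow = 2,
              dimnames = list(c("cell1", "cell2"), genes))   # cells x genes
bk.dat <- matrix(1:6, nrow = 3,
                 dimnames = list(genes, c("s1", "s2")))      # genes x samples

# This comparison passes even though bk.dat has genes in its rows.
rows_vs_cols <- all(colnames(ref) == rownames(bk.dat))

# Comparing the column names of both inputs exposes the orientation problem.
before <- identical(colnames(ref), colnames(bk.dat))

# Transpose so that bk.dat is sample by gene.
bk.dat <- t(bk.dat)
after <- identical(colnames(ref), colnames(bk.dat))

print(c(rows_vs_cols = rows_vs_cols, before = before, after = after))
```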
# Align scRNA and bulk
sc_mat_sub <- sc_mat[common_genes, ]            # genes × cells
bulk_mat_sub <- bulk_mat_annot[common_genes, ]  # genes × samples
# 2. Transpose scRNA: cells × genes
sc_mat_sub_t <- t(sc_mat_sub)
# 3. Check column names and row names
all(colnames(sc_mat_sub_t) == rownames(bulk_mat_sub))  # must be TRUE
[1] TRUE
cell_state_labels <- celltype_labels  # one state per cell
bp <- new.prism(
  reference = sc_mat_sub_t,  # cells × genes
  mixture = bulk_mat_sub,    # genes × samples
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5        66
Ery_Erythrocyte 4        90
HSPC                    113
Ery_Erythrocyte 3       115
Ery_Erythrocyte 2       212
Ery_Erythrocyte 1       416
Dendritic cell         1074
Neutrophil             1424
FCGR3A+ Monocyte       2188
Plasma cell            2990
Platelet               3030
NK cell                5502
CD8+ T cell            6655
B cell                 9314
CD4+ T cell           11483
Monocyte              12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t, mixture = bulk_mat_sub, cell.type.labels = celltype_labels, : Error: gene names of reference and mixture do not match!
# 1. Align scRNA and bulk
# scRNA column names
colnames(sc_mat_sub_t) <- toupper(gsub("\\s+", "", colnames(sc_mat_sub_t)))
# bulk row names
rownames(bulk_mat_sub) <- toupper(gsub("\\s+", "", rownames(bulk_mat_sub)))
# Check again
all(colnames(sc_mat_sub_t) == rownames(bulk_mat_sub))
[1] TRUE
colnames(sc_mat_sub_t) <- make.unique(colnames(sc_mat_sub_t))
rownames(bulk_mat_sub) <- make.unique(rownames(bulk_mat_sub))
common_genes <- intersect(colnames(sc_mat_sub_t), rownames(bulk_mat_sub))
length(common_genes)
[1] 9595
# Take the intersection
sc_mat_sub_t <- sc_mat_sub_t[, common_genes]
bulk_mat_sub <- bulk_mat_sub[common_genes, ]
stopifnot(all(colnames(sc_mat_sub_t) == rownames(bulk_mat_sub)))
bp <- new.prism(
  reference = sc_mat_sub_t,
  mixture = bulk_mat_sub,
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5        66
Ery_Erythrocyte 4        90
HSPC                    113
Ery_Erythrocyte 3       115
Ery_Erythrocyte 2       212
Ery_Erythrocyte 1       416
Dendritic cell         1074
Neutrophil             1424
FCGR3A+ Monocyte       2188
Plasma cell            2990
Platelet               3030
NK cell                5502
CD8+ T cell            6655
B cell                 9314
CD4+ T cell           11483
Monocyte              12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t, mixture = bulk_mat_sub, cell.type.labels = celltype_labels, : Error: gene names of reference and mixture do not match!
# Create the BayesPrism object
# Create the BayesPrism object (new interface)
is.matrix(sc_mat_sub_t)    # TRUE
[1] TRUE
is.numeric(sc_mat_sub_t)   # may be TRUE or FALSE; needs conversion if FALSE
[1] TRUE
is.matrix(bulk_mat_sub)    # TRUE
[1] TRUE
is.numeric(bulk_mat_sub)   # may be TRUE or FALSE; needs conversion if FALSE
[1] TRUE
sc_mat_sub_t <- apply(sc_mat_sub_t, 2, as.numeric)
bulk_mat_sub <- apply(bulk_mat_sub, 2, as.numeric)
colnames(sc_mat_sub_t) <- common_genes
rownames(bulk_mat_sub) <- common_genes
stopifnot(all(colnames(sc_mat_sub_t) == rownames(bulk_mat_sub)))
head(colnames(sc_mat_sub_t))
[1] "NOC2L" "KLHL17" "PLEKHN1" "HES4" "ISG15" "AGRN"
head(rownames(bulk_mat_sub))
[1] "NOC2L" "KLHL17" "PLEKHN1" "HES4" "ISG15" "AGRN"
all(colnames(sc_mat_sub_t) == rownames(bulk_mat_sub))
[1] TRUE
bp <- new.prism(
  reference = sc_mat_sub_t,
  mixture = bulk_mat_sub,
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5        66
Ery_Erythrocyte 4        90
HSPC                    113
Ery_Erythrocyte 3       115
Ery_Erythrocyte 2       212
Ery_Erythrocyte 1       416
Dendritic cell         1074
Neutrophil             1424
FCGR3A+ Monocyte       2188
Plasma cell            2990
Platelet               3030
NK cell                5502
CD8+ T cell            6655
B cell                 9314
CD4+ T cell           11483
Monocyte              12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t, mixture = bulk_mat_sub, cell.type.labels = celltype_labels, : Error: gene names of reference and mixture do not match!
sc_mat_sub_t2 <- as.matrix(as.data.frame(sc_mat_sub_t))
bulk_mat_sub2 <- as.matrix(as.data.frame(bulk_mat_sub))
colnames(sc_mat_sub_t2) <- common_genes
rownames(bulk_mat_sub2) <- common_genes
stopifnot(all(colnames(sc_mat_sub_t2) == rownames(bulk_mat_sub2)))
bp <- new.prism(
  reference = sc_mat_sub_t2,
  mixture = bulk_mat_sub2,
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5        66
Ery_Erythrocyte 4        90
HSPC                    113
Ery_Erythrocyte 3       115
Ery_Erythrocyte 2       212
Ery_Erythrocyte 1       416
Dendritic cell         1074
Neutrophil             1424
FCGR3A+ Monocyte       2188
Plasma cell            2990
Platelet               3030
NK cell                5502
CD8+ T cell            6655
B cell                 9314
CD4+ T cell           11483
Monocyte              12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t2, mixture = bulk_mat_sub2, : Error: gene names of reference and mixture do not match!
# Convert to character
colnames(sc_mat_sub_t2) <- as.character(colnames(sc_mat_sub_t2))
rownames(bulk_mat_sub2) <- as.character(rownames(bulk_mat_sub2))
# Take the intersection to ensure the same order
common_genes <- intersect(colnames(sc_mat_sub_t2), rownames(bulk_mat_sub2))
sc_mat_sub_t2 <- sc_mat_sub_t2[, common_genes, drop = FALSE]
bulk_mat_sub2 <- bulk_mat_sub2[common_genes, , drop = FALSE]
# Check again
stopifnot(all(colnames(sc_mat_sub_t2) == rownames(bulk_mat_sub2)))
clean_genes <- function(x) {
  x <- gsub("[[:space:]]", "", x)     # remove all whitespace
  x <- gsub("[^[:alnum:]_.]", "", x)  # drop everything except letters, digits, underscores and dots
  return(x)
}
colnames(sc_mat_sub_t2) <- clean_genes(colnames(sc_mat_sub_t2))
rownames(bulk_mat_sub2) <- clean_genes(rownames(bulk_mat_sub2))
common_genes <- intersect(colnames(sc_mat_sub_t2), rownames(bulk_mat_sub2))
sc_mat_sub_t2 <- sc_mat_sub_t2[, common_genes, drop = FALSE]
bulk_mat_sub2 <- bulk_mat_sub2[common_genes, , drop = FALSE]
stopifnot(all(colnames(sc_mat_sub_t2) == rownames(bulk_mat_sub2)))
sc_mat_sub_t2 <- apply(sc_mat_sub_t2, 2, as.numeric)
bulk_mat_sub2 <- apply(bulk_mat_sub2, 2, as.numeric)
colnames(sc_mat_sub_t2) <- common_genes
rownames(bulk_mat_sub2) <- common_genes
cell_state_labels <- celltype_labels
bp <- new.prism(
  reference = sc_mat_sub_t2,  # cells × genes
  mixture = bulk_mat_sub2,    # genes × samples
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5        66
Ery_Erythrocyte 4        90
HSPC                    113
Ery_Erythrocyte 3       115
Ery_Erythrocyte 2       212
Ery_Erythrocyte 1       416
Dendritic cell         1074
Neutrophil             1424
FCGR3A+ Monocyte       2188
Plasma cell            2990
Platelet               3030
NK cell                5502
CD8+ T cell            6655
B cell                 9314
CD4+ T cell           11483
Monocyte              12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t2, mixture = bulk_mat_sub2, : Error: gene names of reference and mixture do not match!
# scRNA: cells × genes
sc_mat_sub_t2 <- as.matrix(sc_mat_sub_t2)
storage.mode(sc_mat_sub_t2) <- "double"
colnames(sc_mat_sub_t2) <- as.character(colnames(sc_mat_sub_t2))
# bulk: genes × samples
bulk_mat_sub2 <- as.matrix(bulk_mat_sub2)
storage.mode(bulk_mat_sub2) <- "double"
rownames(bulk_mat_sub2) <- as.character(rownames(bulk_mat_sub2))
# Take the intersection again to ensure the same order
common_genes <- intersect(colnames(sc_mat_sub_t2), rownames(bulk_mat_sub2))
sc_mat_sub_t2 <- sc_mat_sub_t2[, common_genes, drop = FALSE]
bulk_mat_sub2 <- bulk_mat_sub2[common_genes, , drop = FALSE]
# Final check
stopifnot(all(colnames(sc_mat_sub_t2) == rownames(bulk_mat_sub2)))
stopifnot(length(celltype_labels) == nrow(sc_mat_sub_t2))
cell_state_labels <- celltype_labels  # one state per cell
bp <- new.prism(
  reference = sc_mat_sub_t2,  # cells × genes
  mixture = bulk_mat_sub2,    # genes × samples
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5        66
Ery_Erythrocyte 4        90
HSPC                    113
Ery_Erythrocyte 3       115
Ery_Erythrocyte 2       212
Ery_Erythrocyte 1       416
Dendritic cell         1074
Neutrophil             1424
FCGR3A+ Monocyte       2188
Plasma cell            2990
Platelet               3030
NK cell                5502
CD8+ T cell            6655
B cell                 9314
CD4+ T cell           11483
Monocyte              12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t2, mixture = bulk_mat_sub2, : Error: gene names of reference and mixture do not match!
# 1. Check for duplicates
sum(duplicated(colnames(sc_mat_sub_t2)))  # scRNA column names
[1] 0
sum(duplicated(rownames(bulk_mat_sub2)))  # bulk row names
[1] 0
# 2. Convert to plain character vectors to make sure there are no factors
colnames(sc_mat_sub_t2) <- as.character(colnames(sc_mat_sub_t2))
rownames(bulk_mat_sub2) <- as.character(rownames(bulk_mat_sub2))
# 3. Confirm lengths
length(colnames(sc_mat_sub_t2))  # should equal ncol(sc_mat_sub_t2)
[1] 9595
length(rownames(bulk_mat_sub2))  # should equal nrow(bulk_mat_sub2)
[1] 9595
# 4. Strict ordering
common_genes <- intersect(colnames(sc_mat_sub_t2), rownames(bulk_mat_sub2))
sc_mat_sub_t2 <- sc_mat_sub_t2[, common_genes, drop = FALSE]
bulk_mat_sub2 <- bulk_mat_sub2[common_genes, , drop = FALSE]
# Strictly align again
stopifnot(all(colnames(sc_mat_sub_t2) == rownames(bulk_mat_sub2)))
sc_mat_sub_t2 <- matrix(as.numeric(sc_mat_sub_t2), nrow = nrow(sc_mat_sub_t2), ncol = ncol(sc_mat_sub_t2))
rownames(sc_mat_sub_t2) <- rownames(sc_mat_sub_t2)
colnames(sc_mat_sub_t2) <- common_genes
bulk_mat_sub2 <- matrix(as.numeric(bulk_mat_sub2), nrow = nrow(bulk_mat_sub2), ncol = ncol(bulk_mat_sub2))
rownames(bulk_mat_sub2) <- common_genes
colnames(bulk_mat_sub2) <- colnames(bulk_mat_sub2)
bp <- new.prism(
  reference = sc_mat_sub_t2,
  mixture = bulk_mat_sub2,
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5        66
Ery_Erythrocyte 4        90
HSPC                    113
Ery_Erythrocyte 3       115
Ery_Erythrocyte 2       212
Ery_Erythrocyte 1       416
Dendritic cell         1074
Neutrophil             1424
FCGR3A+ Monocyte       2188
Plasma cell            2990
Platelet               3030
NK cell                5502
CD8+ T cell            6655
B cell                 9314
CD4+ T cell           11483
Monocyte              12222
Error in validate.input(mixture) : Error: please specify the colnames of mixture / reference using gene identifiers!
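The different error in this run is consistent with how base R handles dimnames: `matrix(as.numeric(x), ...)` rebuilds the matrix from a bare vector, so row and column names are lost, and `colnames(x) <- colnames(x)` afterwards only reassigns NULL. A small sketch on a toy matrix (values and names are made up), contrasting that with `storage.mode<-`, which keeps the dimnames:

```r
# Toy integer matrix with gene names in rows and sample names in columns.
m <- matrix(1:4, nrow = 2,
            dimnames = list(c("g1", "g2"), c("s1", "s2")))

# Rebuilding from a bare numeric vector discards the dimnames.
rebuilt <- matrix(as.numeric(m), nrow = nrow(m), ncol = ncol(m))
colnames(rebuilt) <- colnames(rebuilt)  # no-op: still NULL
rebuilt_has_no_colnames <- is.null(colnames(rebuilt))

# Changing the storage mode in place converts to double and keeps dimnames.
kept <- m
storage.mode(kept) <- "double"

print(rebuilt_has_no_colnames)
print(dimnames(kept))
```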
colnames(bulk_mat_sub2) <- colnames(bulk_mat_sub2)
# scRNA: cells × genes
colnames(sc_mat_sub_t2) <- common_genes
rownames(sc_mat_sub_t2) <- paste0("Cell", seq_len(nrow(sc_mat_sub_t2)))
# bulk: genes × samples
rownames(bulk_mat_sub2) <- common_genes
if (is.null(colnames(bulk_mat_sub2))) {
  colnames(bulk_mat_sub2) <- paste0("Sample", seq_len(ncol(bulk_mat_sub2)))
}
# Check again
stopifnot(!any(is.na(colnames(sc_mat_sub_t2))))
stopifnot(!any(is.na(rownames(bulk_mat_sub2))))
stopifnot(all(colnames(sc_mat_sub_t2) == rownames(bulk_mat_sub2)))
# Create the BayesPrism object
bp <- new.prism(
  reference = sc_mat_sub_t2,
  mixture = bulk_mat_sub2,
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5        66
Ery_Erythrocyte 4        90
HSPC                    113
Ery_Erythrocyte 3       115
Ery_Erythrocyte 2       212
Ery_Erythrocyte 1       416
Dendritic cell         1074
Neutrophil             1424
FCGR3A+ Monocyte       2188
Plasma cell            2990
Platelet               3030
NK cell                5502
CD8+ T cell            6655
B cell                 9314
CD4+ T cell           11483
Monocyte              12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t2, mixture = bulk_mat_sub2, : Error: gene names of reference and mixture do not match!
clean_genes <- function(x) {
  x <- toupper(x)                  # convert to uppercase
  x <- gsub("\\s+", "", x)         # remove whitespace
  x <- gsub("[^A-Z0-9_]", "", x)   # keep only letters, digits and underscores
  x <- make.unique(x)              # ensure uniqueness
  return(x)
}
colnames(sc_mat_sub_t2) <- clean_genes(colnames(sc_mat_sub_t2))
rownames(bulk_mat_sub2) <- clean_genes(rownames(bulk_mat_sub2))
common_genes <- intersect(colnames(sc_mat_sub_t2), rownames(bulk_mat_sub2))
sc_mat_sub_t2 <- sc_mat_sub_t2[, common_genes, drop = FALSE]
bulk_mat_sub2 <- bulk_mat_sub2[common_genes, , drop = FALSE]
stopifnot(all(colnames(sc_mat_sub_t2) == rownames(bulk_mat_sub2)))
sc_mat_sub_t2 <- matrix(as.numeric(sc_mat_sub_t2), nrow = nrow(sc_mat_sub_t2), ncol = ncol(sc_mat_sub_t2))
colnames(sc_mat_sub_t2) <- common_genes
rownames(sc_mat_sub_t2) <- paste0("Cell", seq_len(nrow(sc_mat_sub_t2)))
bulk_mat_sub2 <- matrix(as.numeric(bulk_mat_sub2), nrow = nrow(bulk_mat_sub2), ncol = ncol(bulk_mat_sub2))
rownames(bulk_mat_sub2) <- common_genes
if (is.null(colnames(bulk_mat_sub2))) colnames(bulk_mat_sub2) <- paste0("Sample", seq_len(ncol(bulk_mat_sub2)))
stopifnot(length(celltype_labels) == nrow(sc_mat_sub_t2))
cell_state_labels <- celltype_labels
bp <- new.prism(
  reference = sc_mat_sub_t2,
  mixture = bulk_mat_sub2,
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5        66
Ery_Erythrocyte 4        90
HSPC                    113
Ery_Erythrocyte 3       115
Ery_Erythrocyte 2       212
Ery_Erythrocyte 1       416
Dendritic cell         1074
Neutrophil             1424
FCGR3A+ Monocyte       2188
Plasma cell            2990
Platelet               3030
NK cell                5502
CD8+ T cell            6655
B cell                 9314
CD4+ T cell           11483
Monocyte              12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t2, mixture = bulk_mat_sub2, : Error: gene names of reference and mixture do not match!
colnames(reference) == rownames(mixture)
Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'colnames': object 'reference' not found
all(colnames(sc_mat_sub_t2) == rownames(bulk_mat_sub2))
[1] TRUE
library(BayesPrism)
# One state per cell (simple approach)
cell_state_labels <- celltype_labels
# Create the BayesPrism object
bp <- new.prism(
  reference = sc_mat_sub_t2,  # scRNA: cells × genes
  mixture = bulk_mat_sub2,    # bulk: genes × samples
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5        66
Ery_Erythrocyte 4        90
HSPC                    113
Ery_Erythrocyte 3       115
Ery_Erythrocyte 2       212
Ery_Erythrocyte 1       416
Dendritic cell         1074
Neutrophil             1424
FCGR3A+ Monocyte       2188
Plasma cell            2990
Platelet               3030
NK cell                5502
CD8+ T cell            6655
B cell                 9314
CD4+ T cell           11483
Monocyte              12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t2, mixture = bulk_mat_sub2, : Error: gene names of reference and mixture do not match!
library(BayesPrism)
# 1. Force scRNA to a numeric matrix, cells × genes
sc_mat_sub_t2 <- as.matrix(sc_mat_sub_t2)
storage.mode(sc_mat_sub_t2) <- "double"
# 2. Force bulk to a numeric matrix, genes × samples
bulk_mat_sub2 <- as.matrix(bulk_mat_sub2)
storage.mode(bulk_mat_sub2) <- "double"
# 3. Clean gene names: keep only letters, digits and underscores, uppercase everything, ensure uniqueness
clean_genes <- function(x) {
  x <- toupper(x)
  x <- gsub("\\s+", "", x)
  x <- gsub("[^A-Z0-9_]", "", x)
  make.unique(x)
}
colnames(sc_mat_sub_t2) <- clean_genes(colnames(sc_mat_sub_t2))
rownames(bulk_mat_sub2) <- clean_genes(rownames(bulk_mat_sub2))
# 4. Take the intersection and enforce the same order
common_genes <- intersect(colnames(sc_mat_sub_t2), rownames(bulk_mat_sub2))
sc_mat_sub_t2 <- sc_mat_sub_t2[, common_genes, drop = FALSE]
bulk_mat_sub2 <- bulk_mat_sub2[common_genes, , drop = FALSE]
# 5. Check the order
stopifnot(all(colnames(sc_mat_sub_t2) == rownames(bulk_mat_sub2)))
# 6. Ensure one state per cell
cell_state_labels <- celltype_labels
# 7. scRNA row names must be unique
rownames(sc_mat_sub_t2) <- paste0("Cell", seq_len(nrow(sc_mat_sub_t2)))
# 8. bulk column names must also exist
if (is.null(colnames(bulk_mat_sub2))) {
  colnames(bulk_mat_sub2) <- paste0("Sample", seq_len(ncol(bulk_mat_sub2)))
}
# 9. Create the BayesPrism object
bp <- new.prism(
  reference = sc_mat_sub_t2,
  mixture = bulk_mat_sub2,
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5        66
Ery_Erythrocyte 4        90
HSPC                    113
Ery_Erythrocyte 3       115
Ery_Erythrocyte 2       212
Ery_Erythrocyte 1       416
Dendritic cell         1074
Neutrophil             1424
FCGR3A+ Monocyte       2188
Plasma cell            2990
Platelet               3030
NK cell                5502
CD8+ T cell            6655
B cell                 9314
CD4+ T cell           11483
Monocyte              12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t2, mixture = bulk_mat_sub2, : Error: gene names of reference and mixture do not match!
# Final strict processing
colnames(sc_mat_sub_t2) <- as.character(colnames(sc_mat_sub_t2))
rownames(bulk_mat_sub2) <- as.character(rownames(bulk_mat_sub2))
# Force sorting
common_genes <- intersect(colnames(sc_mat_sub_t2), rownames(bulk_mat_sub2))
common_genes <- sort(common_genes)  # sort so the order is consistent
sc_mat_sub_t2 <- sc_mat_sub_t2[, common_genes, drop = FALSE]
bulk_mat_sub2 <- bulk_mat_sub2[common_genes, , drop = FALSE]
stopifnot(identical(colnames(sc_mat_sub_t2), rownames(bulk_mat_sub2)))
# Check again that there are no duplicates
stopifnot(!any(duplicated(colnames(sc_mat_sub_t2))))
stopifnot(!any(duplicated(rownames(bulk_mat_sub2))))
# scRNA row names and bulk column names
rownames(sc_mat_sub_t2) <- paste0("Cell", seq_len(nrow(sc_mat_sub_t2)))
if (is.null(colnames(bulk_mat_sub2))) colnames(bulk_mat_sub2) <- paste0("Sample", seq_len(ncol(bulk_mat_sub2)))
# Confirm numeric again
sc_mat_sub_t2 <- matrix(as.numeric(sc_mat_sub_t2), nrow = nrow(sc_mat_sub_t2))
bulk_mat_sub2 <- matrix(as.numeric(bulk_mat_sub2), nrow = nrow(bulk_mat_sub2))
colnames(sc_mat_sub_t2) <- common_genes
rownames(bulk_mat_sub2) <- common_genes
# Create the object
bp <- new.prism(
  reference = sc_mat_sub_t2,
  mixture = bulk_mat_sub2,
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5        66
Ery_Erythrocyte 4        90
HSPC                    113
Ery_Erythrocyte 3       115
Ery_Erythrocyte 2       212
Ery_Erythrocyte 1       416
Dendritic cell         1074
Neutrophil             1424
FCGR3A+ Monocyte       2188
Plasma cell            2990
Platelet               3030
NK cell                5502
CD8+ T cell            6655
B cell                 9314
CD4+ T cell           11483
Monocyte              12222
Error in validate.input(mixture) : Error: please specify the colnames of mixture / reference using gene identifiers!
library(BayesPrism)
# 1. scRNA: cells × genes
sc_mat_sub_t2 <- as.matrix(sc_mat_sub_t2)
storage.mode(sc_mat_sub_t2) <- "double"
colnames(sc_mat_sub_t2) <- toupper(gsub("[^A-Z0-9_]", "", colnames(sc_mat_sub_t2)))  # clean
colnames(sc_mat_sub_t2) <- make.unique(colnames(sc_mat_sub_t2))
rownames(sc_mat_sub_t2) <- paste0("Cell", seq_len(nrow(sc_mat_sub_t2)))
# 2. bulk: genes × samples
bulk_mat_sub2 <- as.matrix(bulk_mat_sub2)
storage.mode(bulk_mat_sub2) <- "double"
rownames(bulk_mat_sub2) <- toupper(gsub("[^A-Z0-9_]", "", rownames(bulk_mat_sub2)))  # clean
rownames(bulk_mat_sub2) <- make.unique(rownames(bulk_mat_sub2))
if (is.null(colnames(bulk_mat_sub2))) colnames(bulk_mat_sub2) <- paste0("Sample", seq_len(ncol(bulk_mat_sub2)))
# 3. Take the intersection and sort
common_genes <- intersect(colnames(sc_mat_sub_t2), rownames(bulk_mat_sub2))
common_genes <- sort(common_genes)
sc_mat_sub_t2 <- sc_mat_sub_t2[, common_genes, drop = FALSE]
bulk_mat_sub2 <- bulk_mat_sub2[common_genes, , drop = FALSE]
# 4. Check again
stopifnot(identical(colnames(sc_mat_sub_t2), rownames(bulk_mat_sub2)))
stopifnot(!any(duplicated(colnames(sc_mat_sub_t2))))
stopifnot(!any(duplicated(rownames(bulk_mat_sub2))))
# 5. Ensure one state per cell
cell_state_labels <- celltype_labels
stopifnot(length(cell_state_labels) == nrow(sc_mat_sub_t2))
# 6. Create the BayesPrism object
bp <- new.prism(
  reference = sc_mat_sub_t2,  # cells × genes
  mixture = bulk_mat_sub2,    # genes × samples
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5        66
Ery_Erythrocyte 4        90
HSPC                    113
Ery_Erythrocyte 3       115
Ery_Erythrocyte 2       212
Ery_Erythrocyte 1       416
Dendritic cell         1074
Neutrophil             1424
FCGR3A+ Monocyte       2188
Plasma cell            2990
Platelet               3030
NK cell                5502
CD8+ T cell            6655
B cell                 9314
CD4+ T cell           11483
Monocyte              12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t2, mixture = bulk_mat_sub2, : Error: gene names of reference and mixture do not match!
# scRNA
stopifnot(is.matrix(sc_mat_sub_t2))
stopifnot(is.character(colnames(sc_mat_sub_t2)))
stopifnot(is.character(rownames(sc_mat_sub_t2)))
# bulk
stopifnot(is.matrix(bulk_mat_sub2))
stopifnot(is.character(rownames(bulk_mat_sub2)))
stopifnot(is.character(colnames(bulk_mat_sub2)))
# Strictly align again
stopifnot(all(colnames(sc_mat_sub_t2) == rownames(bulk_mat_sub2)))
# scRNA
sc_mat_sub_t2 <- as.matrix(sc_mat_sub_t2)
storage.mode(sc_mat_sub_t2) <- "double"
colnames(sc_mat_sub_t2) <- as.character(colnames(sc_mat_sub_t2))
rownames(sc_mat_sub_t2) <- as.character(rownames(sc_mat_sub_t2))
# bulk
bulk_mat_sub2 <- as.matrix(bulk_mat_sub2)
storage.mode(bulk_mat_sub2) <- "double"
rownames(bulk_mat_sub2) <- as.character(rownames(bulk_mat_sub2))
colnames(bulk_mat_sub2) <- as.character(colnames(bulk_mat_sub2))
# Align
common_genes <- intersect(colnames(sc_mat_sub_t2), rownames(bulk_mat_sub2))
common_genes <- sort(common_genes)
sc_mat_sub_t2 <- sc_mat_sub_t2[, common_genes, drop = FALSE]
bulk_mat_sub2 <- bulk_mat_sub2[common_genes, , drop = FALSE]
# Check again
stopifnot(all(colnames(sc_mat_sub_t2) == rownames(bulk_mat_sub2)))
bp <- new.prism(
  reference = sc_mat_sub_t2,
  mixture = bulk_mat_sub2,
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5        66
Ery_Erythrocyte 4        90
HSPC                    113
Ery_Erythrocyte 3       115
Ery_Erythrocyte 2       212
Ery_Erythrocyte 1       416
Dendritic cell         1074
Neutrophil             1424
FCGR3A+ Monocyte       2188
Plasma cell            2990
Platelet               3030
NK cell                5502
CD8+ T cell            6655
B cell                 9314
CD4+ T cell           11483
Monocyte              12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t2, mixture = bulk_mat_sub2, : Error: gene names of reference and mixture do not match!
# Take the intersection, without sorting
common_genes <- intersect(colnames(sc_mat_sub_t2), rownames(bulk_mat_sub2))
# Strict alignment
sc_mat_sub_t2 <- sc_mat_sub_t2[, common_genes, drop = FALSE]
bulk_mat_sub2 <- bulk_mat_sub2[common_genes, , drop = FALSE]
# scRNA
rownames(sc_mat_sub_t2) <- paste0("Cell", seq_len(nrow(sc_mat_sub_t2)))
colnames(sc_mat_sub_t2) <- common_genes
# bulk
rownames(bulk_mat_sub2) <- common_genes
if (is.null(colnames(bulk_mat_sub2))) colnames(bulk_mat_sub2) <- paste0("Sample", seq_len(ncol(bulk_mat_sub2)))
# Create the object
bp <- new.prism(
  reference = sc_mat_sub_t2,
  mixture = bulk_mat_sub2,
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5        66
Ery_Erythrocyte 4        90
HSPC                    113
Ery_Erythrocyte 3       115
Ery_Erythrocyte 2       212
Ery_Erythrocyte 1       416
Dendritic cell         1074
Neutrophil             1424
FCGR3A+ Monocyte       2188
Plasma cell            2990
Platelet               3030
NK cell                5502
CD8+ T cell            6655
B cell                 9314
CD4+ T cell           11483
Monocyte              12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t2, mixture = bulk_mat_sub2, : Error: gene names of reference and mixture do not match!
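Following the transposition advice earlier in the thread, a possible cause here is that `mixture` is passed as genes × samples, so its column names are sample IDs rather than genes. The sketch below shows one way to bring both inputs to a genes-in-columns orientation, using toy data. `align_genes_in_columns` is a hypothetical helper, not part of BayesPrism, and the assumption that both inputs need gene names as column names comes from that earlier reply rather than from anything verified against this dataset.

```r
# Hypothetical helper (not part of BayesPrism): take a genes x cells
# single-cell matrix and a genes x samples bulk matrix, keep the shared
# genes, and return both with genes in the columns.
align_genes_in_columns <- function(sc_genes_by_cells, bulk_genes_by_samples) {
  common_genes <- intersect(rownames(sc_genes_by_cells),
                            rownames(bulk_genes_by_samples))
  stopifnot(length(common_genes) > 0)
  # Subset with drop = FALSE so single-row results stay matrices, then transpose.
  reference <- t(sc_genes_by_cells[common_genes, , drop = FALSE])      # cells x genes
  mixture   <- t(bulk_genes_by_samples[common_genes, , drop = FALSE])  # samples x genes
  stopifnot(identical(colnames(reference), colnames(mixture)))
  list(reference = reference, mixture = mixture)
}

# Toy stand-ins for sc_mat (genes x cells) and bulk_mat_annot (genes x samples).
sc_mat <- matrix(1:6, nrow = 3,
                 dimnames = list(c("NOC2L", "HES4", "ISG15"), c("cell1", "cell2")))
bulk_mat_annot <- matrix(1:4, nrow = 2,
                         dimnames = list(c("ISG15", "NOC2L"), c("sample1", "sample2")))

aligned <- align_genes_in_columns(sc_mat, bulk_mat_annot)
print(dim(aligned$reference))
print(dim(aligned$mixture))
print(colnames(aligned$mixture))
```

With the real objects this would correspond to passing `sc_mat` and `bulk_mat_annot`, then using the returned `reference` and `mixture` in the `new.prism` call.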