gene names of reference and mixture do not match!

Open rmontagn opened this issue 1 year ago • 2 comments

Hello,

First, I would like to thank you for developing BayesPrism v2.2. It allows me to use large atlases as references for deconvolution.

However, when I tried to use it, I got an error for some deconvolution tasks:

Error in new.prism(reference = dfRef, mixture = dfBulk, input.type = "count.matrix",  : 
  Error: gene names of reference and mixture do not match!

The error arose when I tried to deconvolve two bulk RNA-Seq datasets and two pseudo-bulk RNA-Seq datasets generated by aggregating the reference single-cell signal.

While the deconvolution ran for the bulk RNA-Seq data, I can't deconvolve the pseudo-bulk data.

My bulk and reference arguments are data frames with exactly the same genes as columns, in the same order, so I do not understand what could go wrong.

Do you have any idea?

Thanks a lot

rmontagn avatar May 31 '24 12:05 rmontagn

Dear user,

Thank you for your interest in our method.

Could you kindly dump your input into an rdata file and share it with me ([email protected]) for troubleshooting?

Best,

Tinyi

tinyi avatar May 31 '24 23:05 tinyi

I guess you forgot to transpose bk.dat. bk.dat should be sample-by-gene, not gene-by-sample.
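A minimal sketch of that fix with toy data, assuming (as this comment says) that new.prism expects gene IDs in the column names of both inputs. Every object name below is made up for illustration, and no BayesPrism function is called.

```r
# Toy data; all object names here are illustrative.
genes <- c("NOC2L", "KLHL17", "HES4")

# Reference: cells x genes (gene IDs in the columns).
reference <- matrix(1:12, nrow = 4,
                    dimnames = list(paste0("cell", 1:4), genes))

# Bulk stored the "usual" way: genes x samples (gene IDs in the rows).
bulk_gxs <- matrix(1:6, nrow = 3,
                   dimnames = list(genes, c("S1", "S2")))

# Gene IDs shared between the column names of the two inputs.
shared_before <- intersect(colnames(reference), colnames(bulk_gxs))

# Transposing the bulk matrix moves the gene IDs into the columns.
mixture <- t(bulk_gxs)   # samples x genes
shared_after <- intersect(colnames(reference), colnames(mixture))

length(shared_before)
length(shared_after)
```

Before the transpose the two inputs share no column names, which would be consistent with the "do not match" error; after it they share all three.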

Pentayouth avatar Oct 22 '24 08:10 Pentayouth

# 1. Align scRNA and bulk
sc_mat_sub <- sc_mat[common_genes, ]            # genes × cells
bulk_mat_sub <- bulk_mat_annot[common_genes, ]  # genes × samples

# 2. Transpose scRNA: cells × genes
sc_mat_sub_t <- t(sc_mat_sub)

# 3. Check column names and row names
all(colnames(sc_mat_sub_t) == rownames(bulk_mat_sub))  # must be TRUE
[1] TRUE
cell_state_labels <- celltype_labels  # one state per cell
bp <- new.prism(
  reference = sc_mat_sub_t,  # cells × genes
  mixture = bulk_mat_sub,    # genes × samples
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5  Ery_Erythrocyte 4  HSPC  Ery_Erythrocyte 3  Ery_Erythrocyte 2  Ery_Erythrocyte 1
               66                 90   113                115                212                416
Dendritic cell  Neutrophil  FCGR3A+ Monocyte  Plasma cell  Platelet  NK cell
          1074        1424              2188         2990      3030     5502
CD8+ T cell  B cell  CD4+ T cell  Monocyte
       6655    9314        11483     12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t, mixture = bulk_mat_sub, cell.type.labels = celltype_labels,  :
  Error: gene names of reference and mixture do not match!
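A note on the check above: all(colnames(sc_mat_sub_t) == rownames(bulk_mat_sub)) compares reference columns with mixture rows, so it can be TRUE while the two inputs share no column names at all. The sketch below is a hypothetical pre-flight helper (check_orientation is not part of BayesPrism) that assumes genes are expected in the columns of both inputs, as the earlier comment about bk.dat suggests.

```r
# Hypothetical helper, not a BayesPrism function.
# Assumption: gene IDs are expected in colnames() of both inputs.
check_orientation <- function(reference, mixture) {
  ref_genes <- colnames(reference)
  if (is.null(ref_genes) || is.null(colnames(mixture))) {
    return("missing colnames")
  }
  if (length(intersect(ref_genes, colnames(mixture))) > 0) {
    return("ok")
  }
  if (length(intersect(ref_genes, rownames(mixture))) > 0) {
    return("mixture looks transposed")
  }
  "no shared gene IDs"
}

# Toy inputs (illustrative names only).
genes <- c("NOC2L", "KLHL17", "HES4")
ref <- matrix(0, nrow = 2, ncol = 3,
              dimnames = list(c("cell1", "cell2"), genes))
mix_gxs <- matrix(0, nrow = 3, ncol = 2,
                  dimnames = list(genes, c("S1", "S2")))

status_gxs <- check_orientation(ref, mix_gxs)     # genes x samples
status_sxg <- check_orientation(ref, t(mix_gxs))  # samples x genes
```

With these toy inputs the genes × samples matrix is flagged as transposed, and its transpose passes.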

# 1. Align scRNA and bulk
# scRNA column names
colnames(sc_mat_sub_t) <- toupper(gsub("\\s+", "", colnames(sc_mat_sub_t)))
# bulk row names
rownames(bulk_mat_sub) <- toupper(gsub("\\s+", "", rownames(bulk_mat_sub)))

# Check again
all(colnames(sc_mat_sub_t) == rownames(bulk_mat_sub))
[1] TRUE
colnames(sc_mat_sub_t) <- make.unique(colnames(sc_mat_sub_t))
rownames(bulk_mat_sub) <- make.unique(rownames(bulk_mat_sub))
common_genes <- intersect(colnames(sc_mat_sub_t), rownames(bulk_mat_sub))
length(common_genes)
[1] 9595

# Take the intersection
sc_mat_sub_t <- sc_mat_sub_t[, common_genes]
bulk_mat_sub <- bulk_mat_sub[common_genes, ]
stopifnot(all(colnames(sc_mat_sub_t) == rownames(bulk_mat_sub)))
bp <- new.prism(
  reference = sc_mat_sub_t,
  mixture = bulk_mat_sub,
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5  Ery_Erythrocyte 4  HSPC  Ery_Erythrocyte 3  Ery_Erythrocyte 2  Ery_Erythrocyte 1
               66                 90   113                115                212                416
Dendritic cell  Neutrophil  FCGR3A+ Monocyte  Plasma cell  Platelet  NK cell
          1074        1424              2188         2990      3030     5502
CD8+ T cell  B cell  CD4+ T cell  Monocyte
       6655    9314        11483     12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t, mixture = bulk_mat_sub, cell.type.labels = celltype_labels,  :
  Error: gene names of reference and mixture do not match!

# Create the BayesPrism object (new interface)
is.matrix(sc_mat_sub_t)     # expect TRUE
[1] TRUE
is.numeric(sc_mat_sub_t)    # if FALSE, the values may be logical (TRUE/FALSE) and need converting
[1] TRUE
is.matrix(bulk_mat_sub)     # expect TRUE
[1] TRUE
is.numeric(bulk_mat_sub)    # if FALSE, the values may be logical (TRUE/FALSE) and need converting
[1] TRUE
sc_mat_sub_t <- apply(sc_mat_sub_t, 2, as.numeric)
bulk_mat_sub <- apply(bulk_mat_sub, 2, as.numeric)
colnames(sc_mat_sub_t) <- common_genes
rownames(bulk_mat_sub) <- common_genes
stopifnot(all(colnames(sc_mat_sub_t) == rownames(bulk_mat_sub)))
head(colnames(sc_mat_sub_t))
[1] "NOC2L" "KLHL17" "PLEKHN1" "HES4" "ISG15" "AGRN"
head(rownames(bulk_mat_sub))
[1] "NOC2L" "KLHL17" "PLEKHN1" "HES4" "ISG15" "AGRN"
all(colnames(sc_mat_sub_t) == rownames(bulk_mat_sub))
[1] TRUE
bp <- new.prism(
  reference = sc_mat_sub_t,
  mixture = bulk_mat_sub,
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5  Ery_Erythrocyte 4  HSPC  Ery_Erythrocyte 3  Ery_Erythrocyte 2  Ery_Erythrocyte 1
               66                 90   113                115                212                416
Dendritic cell  Neutrophil  FCGR3A+ Monocyte  Plasma cell  Platelet  NK cell
          1074        1424              2188         2990      3030     5502
CD8+ T cell  B cell  CD4+ T cell  Monocyte
       6655    9314        11483     12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t, mixture = bulk_mat_sub, cell.type.labels = celltype_labels,  :
  Error: gene names of reference and mixture do not match!

sc_mat_sub_t2 <- as.matrix(as.data.frame(sc_mat_sub_t))
bulk_mat_sub2 <- as.matrix(as.data.frame(bulk_mat_sub))
colnames(sc_mat_sub_t2) <- common_genes
rownames(bulk_mat_sub2) <- common_genes
stopifnot(all(colnames(sc_mat_sub_t2) == rownames(bulk_mat_sub2)))
bp <- new.prism(
  reference = sc_mat_sub_t2,
  mixture = bulk_mat_sub2,
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5  Ery_Erythrocyte 4  HSPC  Ery_Erythrocyte 3  Ery_Erythrocyte 2  Ery_Erythrocyte 1
               66                 90   113                115                212                416
Dendritic cell  Neutrophil  FCGR3A+ Monocyte  Plasma cell  Platelet  NK cell
          1074        1424              2188         2990      3030     5502
CD8+ T cell  B cell  CD4+ T cell  Monocyte
       6655    9314        11483     12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t2, mixture = bulk_mat_sub2,  :
  Error: gene names of reference and mixture do not match!

# Convert to character
colnames(sc_mat_sub_t2) <- as.character(colnames(sc_mat_sub_t2))
rownames(bulk_mat_sub2) <- as.character(rownames(bulk_mat_sub2))

# Take the intersection and make sure the order is consistent
common_genes <- intersect(colnames(sc_mat_sub_t2), rownames(bulk_mat_sub2))
sc_mat_sub_t2 <- sc_mat_sub_t2[, common_genes, drop = FALSE]
bulk_mat_sub2 <- bulk_mat_sub2[common_genes, , drop = FALSE]

# Check again
stopifnot(all(colnames(sc_mat_sub_t2) == rownames(bulk_mat_sub2)))
clean_genes <- function(x) {
  x <- gsub("[[:space:]]", "", x)      # remove all whitespace
  x <- gsub("[^[:alnum:]_.]", "", x)   # remove everything except letters, digits, underscores and dots
  return(x)
}

colnames(sc_mat_sub_t2) <- clean_genes(colnames(sc_mat_sub_t2))
rownames(bulk_mat_sub2) <- clean_genes(rownames(bulk_mat_sub2))
common_genes <- intersect(colnames(sc_mat_sub_t2), rownames(bulk_mat_sub2))
sc_mat_sub_t2 <- sc_mat_sub_t2[, common_genes, drop = FALSE]
bulk_mat_sub2 <- bulk_mat_sub2[common_genes, , drop = FALSE]
stopifnot(all(colnames(sc_mat_sub_t2) == rownames(bulk_mat_sub2)))
sc_mat_sub_t2 <- apply(sc_mat_sub_t2, 2, as.numeric)
bulk_mat_sub2 <- apply(bulk_mat_sub2, 2, as.numeric)
colnames(sc_mat_sub_t2) <- common_genes
rownames(bulk_mat_sub2) <- common_genes
cell_state_labels <- celltype_labels
bp <- new.prism(
  reference = sc_mat_sub_t2,  # cells × genes
  mixture = bulk_mat_sub2,    # genes × samples
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5  Ery_Erythrocyte 4  HSPC  Ery_Erythrocyte 3  Ery_Erythrocyte 2  Ery_Erythrocyte 1
               66                 90   113                115                212                416
Dendritic cell  Neutrophil  FCGR3A+ Monocyte  Plasma cell  Platelet  NK cell
          1074        1424              2188         2990      3030     5502
CD8+ T cell  B cell  CD4+ T cell  Monocyte
       6655    9314        11483     12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t2, mixture = bulk_mat_sub2,  :
  Error: gene names of reference and mixture do not match!

# scRNA: cells × genes
sc_mat_sub_t2 <- as.matrix(sc_mat_sub_t2)
storage.mode(sc_mat_sub_t2) <- "double"
colnames(sc_mat_sub_t2) <- as.character(colnames(sc_mat_sub_t2))

# bulk: genes × samples
bulk_mat_sub2 <- as.matrix(bulk_mat_sub2)
storage.mode(bulk_mat_sub2) <- "double"
rownames(bulk_mat_sub2) <- as.character(rownames(bulk_mat_sub2))

# Take the intersection again and make sure the order is consistent
common_genes <- intersect(colnames(sc_mat_sub_t2), rownames(bulk_mat_sub2))
sc_mat_sub_t2 <- sc_mat_sub_t2[, common_genes, drop = FALSE]
bulk_mat_sub2 <- bulk_mat_sub2[common_genes, , drop = FALSE]

# Final check
stopifnot(all(colnames(sc_mat_sub_t2) == rownames(bulk_mat_sub2)))
stopifnot(length(celltype_labels) == nrow(sc_mat_sub_t2))
cell_state_labels <- celltype_labels  # one state per cell
bp <- new.prism(
  reference = sc_mat_sub_t2,  # cells × genes
  mixture = bulk_mat_sub2,    # genes × samples
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5  Ery_Erythrocyte 4  HSPC  Ery_Erythrocyte 3  Ery_Erythrocyte 2  Ery_Erythrocyte 1
               66                 90   113                115                212                416
Dendritic cell  Neutrophil  FCGR3A+ Monocyte  Plasma cell  Platelet  NK cell
          1074        1424              2188         2990      3030     5502
CD8+ T cell  B cell  CD4+ T cell  Monocyte
       6655    9314        11483     12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t2, mixture = bulk_mat_sub2,  :
  Error: gene names of reference and mixture do not match!

# 1. Check for duplicates
sum(duplicated(colnames(sc_mat_sub_t2)))   # scRNA column names
[1] 0
sum(duplicated(rownames(bulk_mat_sub2)))   # bulk row names
[1] 0

# 2. Convert to plain character vectors to make sure there are no factors
colnames(sc_mat_sub_t2) <- as.character(colnames(sc_mat_sub_t2))
rownames(bulk_mat_sub2) <- as.character(rownames(bulk_mat_sub2))

# 3. Confirm the lengths
length(colnames(sc_mat_sub_t2))   # should equal ncol(sc_mat_sub_t2)
[1] 9595
length(rownames(bulk_mat_sub2))   # should equal nrow(bulk_mat_sub2)
[1] 9595

# 4. Strict ordering
common_genes <- intersect(colnames(sc_mat_sub_t2), rownames(bulk_mat_sub2))
sc_mat_sub_t2 <- sc_mat_sub_t2[, common_genes, drop = FALSE]
bulk_mat_sub2 <- bulk_mat_sub2[common_genes, , drop = FALSE]

# Strictly align again
stopifnot(all(colnames(sc_mat_sub_t2) == rownames(bulk_mat_sub2)))
sc_mat_sub_t2 <- matrix(as.numeric(sc_mat_sub_t2), nrow = nrow(sc_mat_sub_t2), ncol = ncol(sc_mat_sub_t2))
rownames(sc_mat_sub_t2) <- rownames(sc_mat_sub_t2)
colnames(sc_mat_sub_t2) <- common_genes

bulk_mat_sub2 <- matrix(as.numeric(bulk_mat_sub2), nrow = nrow(bulk_mat_sub2), ncol = ncol(bulk_mat_sub2))
rownames(bulk_mat_sub2) <- common_genes
colnames(bulk_mat_sub2) <- colnames(bulk_mat_sub2)
bp <- new.prism(
  reference = sc_mat_sub_t2,
  mixture = bulk_mat_sub2,
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5  Ery_Erythrocyte 4  HSPC  Ery_Erythrocyte 3  Ery_Erythrocyte 2  Ery_Erythrocyte 1
               66                 90   113                115                212                416
Dendritic cell  Neutrophil  FCGR3A+ Monocyte  Plasma cell  Platelet  NK cell
          1074        1424              2188         2990      3030     5502
CD8+ T cell  B cell  CD4+ T cell  Monocyte
       6655    9314        11483     12222
Error in validate.input(mixture) :
  Error: please specify the colnames of mixture / reference using gene identifiers!
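The different error in this attempt is consistent with base R behaviour: rebuilding a matrix with matrix(as.numeric(x), ...) discards its dimnames, and a later colnames(x) <- colnames(x) just assigns NULL back. A small sketch with a toy matrix (all names illustrative); storage.mode<- is shown as one alternative that converts the type while keeping the names.

```r
# Toy integer matrix with sample and gene names.
m <- matrix(1:4, nrow = 2,
            dimnames = list(c("S1", "S2"), c("NOC2L", "HES4")))

# Rebuilding through as.numeric() flattens to a plain vector,
# so the new matrix has no row or column names.
rebuilt <- matrix(as.numeric(m), nrow = nrow(m), ncol = ncol(m))
colnames(rebuilt) <- colnames(rebuilt)   # still NULL: nothing to copy back
is.null(colnames(rebuilt))

# Changing the storage mode converts the values and keeps dimnames.
kept <- m
storage.mode(kept) <- "double"
colnames(kept)
```

Whether missing column names are what validate.input objects to here is an inference from the error message, not something confirmed against the package source.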

colnames(bulk_mat_sub2) <- colnames(bulk_mat_sub2)

# scRNA: cells × genes
colnames(sc_mat_sub_t2) <- common_genes
rownames(sc_mat_sub_t2) <- paste0("Cell", seq_len(nrow(sc_mat_sub_t2)))

# bulk: genes × samples
rownames(bulk_mat_sub2) <- common_genes
if (is.null(colnames(bulk_mat_sub2))) {
  colnames(bulk_mat_sub2) <- paste0("Sample", seq_len(ncol(bulk_mat_sub2)))
}

# Check again
stopifnot(!any(is.na(colnames(sc_mat_sub_t2))))
stopifnot(!any(is.na(rownames(bulk_mat_sub2))))
stopifnot(all(colnames(sc_mat_sub_t2) == rownames(bulk_mat_sub2)))

# Create the BayesPrism object
bp <- new.prism(
  reference = sc_mat_sub_t2,
  mixture = bulk_mat_sub2,
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5  Ery_Erythrocyte 4  HSPC  Ery_Erythrocyte 3  Ery_Erythrocyte 2  Ery_Erythrocyte 1
               66                 90   113                115                212                416
Dendritic cell  Neutrophil  FCGR3A+ Monocyte  Plasma cell  Platelet  NK cell
          1074        1424              2188         2990      3030     5502
CD8+ T cell  B cell  CD4+ T cell  Monocyte
       6655    9314        11483     12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t2, mixture = bulk_mat_sub2,  :
  Error: gene names of reference and mixture do not match!

clean_genes <- function(x) {
  x <- toupper(x)                   # all uppercase
  x <- gsub("\\s+", "", x)          # remove whitespace
  x <- gsub("[^A-Z0-9_]", "", x)    # keep only letters, digits and underscores
  x <- make.unique(x)               # ensure uniqueness
  return(x)
}

colnames(sc_mat_sub_t2) <- clean_genes(colnames(sc_mat_sub_t2))
rownames(bulk_mat_sub2) <- clean_genes(rownames(bulk_mat_sub2))
common_genes <- intersect(colnames(sc_mat_sub_t2), rownames(bulk_mat_sub2))
sc_mat_sub_t2 <- sc_mat_sub_t2[, common_genes, drop = FALSE]
bulk_mat_sub2 <- bulk_mat_sub2[common_genes, , drop = FALSE]
stopifnot(all(colnames(sc_mat_sub_t2) == rownames(bulk_mat_sub2)))
sc_mat_sub_t2 <- matrix(as.numeric(sc_mat_sub_t2), nrow = nrow(sc_mat_sub_t2), ncol = ncol(sc_mat_sub_t2))
colnames(sc_mat_sub_t2) <- common_genes
rownames(sc_mat_sub_t2) <- paste0("Cell", seq_len(nrow(sc_mat_sub_t2)))

bulk_mat_sub2 <- matrix(as.numeric(bulk_mat_sub2), nrow = nrow(bulk_mat_sub2), ncol = ncol(bulk_mat_sub2))
rownames(bulk_mat_sub2) <- common_genes
if (is.null(colnames(bulk_mat_sub2))) colnames(bulk_mat_sub2) <- paste0("Sample", seq_len(ncol(bulk_mat_sub2)))
stopifnot(length(celltype_labels) == nrow(sc_mat_sub_t2))
cell_state_labels <- celltype_labels
bp <- new.prism(
  reference = sc_mat_sub_t2,
  mixture = bulk_mat_sub2,
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5  Ery_Erythrocyte 4  HSPC  Ery_Erythrocyte 3  Ery_Erythrocyte 2  Ery_Erythrocyte 1
               66                 90   113                115                212                416
Dendritic cell  Neutrophil  FCGR3A+ Monocyte  Plasma cell  Platelet  NK cell
          1074        1424              2188         2990      3030     5502
CD8+ T cell  B cell  CD4+ T cell  Monocyte
       6655    9314        11483     12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t2, mixture = bulk_mat_sub2,  :
  Error: gene names of reference and mixture do not match!

colnames(reference) == rownames(mixture)
Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'x' in selecting a method for function 'colnames': object 'reference' not found
all(colnames(sc_mat_sub_t2) == rownames(bulk_mat_sub2))
[1] TRUE
library(BayesPrism)

# One state per cell (simple approach)
cell_state_labels <- celltype_labels

# Create the BayesPrism object
bp <- new.prism(
  reference = sc_mat_sub_t2,  # scRNA: cells × genes
  mixture = bulk_mat_sub2,    # bulk: genes × samples
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5  Ery_Erythrocyte 4  HSPC  Ery_Erythrocyte 3  Ery_Erythrocyte 2  Ery_Erythrocyte 1
               66                 90   113                115                212                416
Dendritic cell  Neutrophil  FCGR3A+ Monocyte  Plasma cell  Platelet  NK cell
          1074        1424              2188         2990      3030     5502
CD8+ T cell  B cell  CD4+ T cell  Monocyte
       6655    9314        11483     12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t2, mixture = bulk_mat_sub2,  :
  Error: gene names of reference and mixture do not match!

library(BayesPrism)

# 1. Force scRNA to a numeric matrix, cells × genes
sc_mat_sub_t2 <- as.matrix(sc_mat_sub_t2)
storage.mode(sc_mat_sub_t2) <- "double"

# 2. Force bulk to a numeric matrix, genes × samples
bulk_mat_sub2 <- as.matrix(bulk_mat_sub2)
storage.mode(bulk_mat_sub2) <- "double"

# 3. Clean gene names: keep only letters, digits and underscores, all uppercase, unique
clean_genes <- function(x){
  x <- toupper(x)
  x <- gsub("\\s+", "", x)
  x <- gsub("[^A-Z0-9_]", "", x)
  make.unique(x)
}

colnames(sc_mat_sub_t2) <- clean_genes(colnames(sc_mat_sub_t2))
rownames(bulk_mat_sub2) <- clean_genes(rownames(bulk_mat_sub2))

# 4. Take the intersection and order strictly
common_genes <- intersect(colnames(sc_mat_sub_t2), rownames(bulk_mat_sub2))
sc_mat_sub_t2 <- sc_mat_sub_t2[, common_genes, drop = FALSE]
bulk_mat_sub2 <- bulk_mat_sub2[common_genes, , drop = FALSE]

# 5. Check the order
stopifnot(all(colnames(sc_mat_sub_t2) == rownames(bulk_mat_sub2)))

# 6. Make sure there is one state per cell
cell_state_labels <- celltype_labels

# 7. scRNA row names must be unique
rownames(sc_mat_sub_t2) <- paste0("Cell", seq_len(nrow(sc_mat_sub_t2)))

# 8. bulk column names must exist too
if (is.null(colnames(bulk_mat_sub2
library(BayesPrism)
Error: unexpected symbol in: "if (is.null(colnames(bulk_mat_sub2 library"

library(BayesPrism)

# 1. Force scRNA to a numeric matrix, cells × genes
sc_mat_sub_t2 <- as.matrix(sc_mat_sub_t2)
storage.mode(sc_mat_sub_t2) <- "double"

# 2. Force bulk to a numeric matrix, genes × samples
bulk_mat_sub2 <- as.matrix(bulk_mat_sub2)
storage.mode(bulk_mat_sub2) <- "double"

# 3. Clean gene names: keep only letters, digits and underscores, all uppercase, unique
clean_genes <- function(x){
  x <- toupper(x)
  x <- gsub("\\s+", "", x)
  x <- gsub("[^A-Z0-9_]", "", x)
  make.unique(x)
}

colnames(sc_mat_sub_t2) <- clean_genes(colnames(sc_mat_sub_t2))
rownames(bulk_mat_sub2) <- clean_genes(rownames(bulk_mat_sub2))

# 4. Take the intersection and order strictly
common_genes <- intersect(colnames(sc_mat_sub_t2), rownames(bulk_mat_sub2))
sc_mat_sub_t2 <- sc_mat_sub_t2[, common_genes, drop = FALSE]
bulk_mat_sub2 <- bulk_mat_sub2[common_genes, , drop = FALSE]

# 5. Check the order
stopifnot(all(colnames(sc_mat_sub_t2) == rownames(bulk_mat_sub2)))

# 6. Make sure there is one state per cell
cell_state_labels <- celltype_labels

# 7. scRNA row names must be unique
rownames(sc_mat_sub_t2) <- paste0("Cell", seq_len(nrow(sc_mat_sub_t2)))

# 8. bulk column names must exist too
if (is.null(colnames(bulk_mat_sub2))) {
  colnames(bulk_mat_sub2) <- paste0("Sample", seq_len(ncol(bulk_mat_sub2)))
}

# 9. Create the BayesPrism object
bp <- new.prism(
  reference = sc_mat_sub_t2,
  mixture = bulk_mat_sub2,
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5  Ery_Erythrocyte 4  HSPC  Ery_Erythrocyte 3  Ery_Erythrocyte 2  Ery_Erythrocyte 1
               66                 90   113                115                212                416
Dendritic cell  Neutrophil  FCGR3A+ Monocyte  Plasma cell  Platelet  NK cell
          1074        1424              2188         2990      3030     5502
CD8+ T cell  B cell  CD4+ T cell  Monocyte
       6655    9314        11483     12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t2, mixture = bulk_mat_sub2,  :
  Error: gene names of reference and mixture do not match!

# Final strict processing
colnames(sc_mat_sub_t2) <- as.character(colnames(sc_mat_sub_t2))
rownames(bulk_mat_sub2) <- as.character(rownames(bulk_mat_sub2))

# Force sorting
common_genes <- intersect(colnames(sc_mat_sub_t2), rownames(bulk_mat_sub2))
common_genes <- sort(common_genes)  # sort to guarantee a consistent order

sc_mat_sub_t2 <- sc_mat_sub_t2[, common_genes, drop = FALSE]
bulk_mat_sub2 <- bulk_mat_sub2[common_genes, , drop = FALSE]

stopifnot(identical(colnames(sc_mat_sub_t2), rownames(bulk_mat_sub2)))

# Check again that there are no duplicates
stopifnot(!any(duplicated(colnames(sc_mat_sub_t2))))
stopifnot(!any(duplicated(rownames(bulk_mat_sub2))))

# scRNA row names and bulk column names
rownames(sc_mat_sub_t2) <- paste0("Cell", seq_len(nrow(sc_mat_sub_t2)))
if(is.null(colnames(bulk_mat_sub2))) colnames(bulk_mat_sub2) <- paste0("Sample", seq_len(ncol(bulk_mat_sub2)))

# Confirm numeric again
sc_mat_sub_t2 <- matrix(as.numeric(sc_mat_sub_t2), nrow = nrow(sc_mat_sub_t2))
bulk_mat_sub2 <- matrix(as.numeric(bulk_mat_sub2), nrow = nrow(bulk_mat_sub2))
colnames(sc_mat_sub_t2) <- common_genes
rownames(bulk_mat_sub2) <- common_genes

# Create the object
bp <- new.prism(
  reference = sc_mat_sub_t2,
  mixture = bulk_mat_sub2,
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5  Ery_Erythrocyte 4  HSPC  Ery_Erythrocyte 3  Ery_Erythrocyte 2  Ery_Erythrocyte 1
               66                 90   113                115                212                416
Dendritic cell  Neutrophil  FCGR3A+ Monocyte  Plasma cell  Platelet  NK cell
          1074        1424              2188         2990      3030     5502
CD8+ T cell  B cell  CD4+ T cell  Monocyte
       6655    9314        11483     12222
Error in validate.input(mixture) :
  Error: please specify the colnames of mixture / reference using gene identifiers!

library(BayesPrism)

# 1. scRNA: cells × genes
sc_mat_sub_t2 <- as.matrix(sc_mat_sub_t2)
storage.mode(sc_mat_sub_t2) <- "double"
colnames(sc_mat_sub_t2) <- toupper(gsub("[^A-Z0-9_]", "", colnames(sc_mat_sub_t2)))  # clean
colnames(sc_mat_sub_t2) <- make.unique(colnames(sc_mat_sub_t2))
rownames(sc_mat_sub_t2) <- paste0("Cell", seq_len(nrow(sc_mat_sub_t2)))

# 2. bulk: genes × samples
bulk_mat_sub2 <- as.matrix(bulk_mat_sub2)
storage.mode(bulk_mat_sub2) <- "double"
rownames(bulk_mat_sub2) <- toupper(gsub("[^A-Z0-9_]", "", rownames(bulk_mat_sub2)))  # clean
rownames(bulk_mat_sub2) <- make.unique(rownames(bulk_mat_sub2))
if(is.null(colnames(bulk_mat_sub2))) colnames(bulk_mat_sub2) <- paste0("Sample", seq_len(ncol(bulk_mat_sub2)))

# 3. Take the intersection and sort
common_genes <- intersect(colnames(sc_mat_sub_t2), rownames(bulk_mat_sub2))
common_genes <- sort(common_genes)

sc_mat_sub_t2 <- sc_mat_sub_t2[, common_genes, drop = FALSE]
bulk_mat_sub2 <- bulk_mat_sub2[common_genes, , drop = FALSE]

# 4. Check again
stopifnot(identical(colnames(sc_mat_sub_t2), rownames(bulk_mat_sub2)))
stopifnot(!any(duplicated(colnames(sc_mat_sub_t2))))
stopifnot(!any(duplicated(rownames(bulk_mat_sub2))))

# 5. Make sure there is one state per cell
cell_state_labels <- celltype_labels
stopifnot(length(cell_state_labels) == nrow(sc_mat_sub_t2))

# 6. Create the BayesPrism object
bp <- new.prism(
  reference = sc_mat_sub_t2,  # cells × genes
  mixture = bulk_mat_sub2,    # genes × samples
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5  Ery_Erythrocyte 4  HSPC  Ery_Erythrocyte 3  Ery_Erythrocyte 2  Ery_Erythrocyte 1
               66                 90   113                115                212                416
Dendritic cell  Neutrophil  FCGR3A+ Monocyte  Plasma cell  Platelet  NK cell
          1074        1424              2188         2990      3030     5502
CD8+ T cell  B cell  CD4+ T cell  Monocyte
       6655    9314        11483     12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t2, mixture = bulk_mat_sub2,  :
  Error: gene names of reference and mixture do not match!
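A side note on the cleaning in steps 1 and 2 of this attempt: gsub("[^A-Z0-9_]", "", x) runs before toupper(), so lowercase letters are deleted rather than uppercased. This is separate from the orientation question, but it can silently merge or mangle gene IDs. A small sketch with toy gene names; perl = TRUE is my addition so that the A-Z range is not locale dependent.

```r
# Toy gene IDs (illustrative).
genes <- c("C1orf112", "HLA-A", "MT-CO1")

# Order used in this attempt: strip first, then uppercase.
# Lowercase letters are removed by the strip step.
strip_then_upper <- toupper(gsub("[^A-Z0-9_]", "", genes, perl = TRUE))

# Reverse order: uppercase first, then strip.
upper_then_strip <- gsub("[^A-Z0-9_]", "", toupper(genes), perl = TRUE)

strip_then_upper
upper_then_strip
```

Either order still drops the hyphen from names such as HLA-A, so cleaning of this kind is only safe when it is applied identically to both matrices.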

# scRNA
stopifnot(is.matrix(sc_mat_sub_t2))
stopifnot(is.character(colnames(sc_mat_sub_t2)))
stopifnot(is.character(rownames(sc_mat_sub_t2)))

# bulk
stopifnot(is.matrix(bulk_mat_sub2))
stopifnot(is.character(rownames(bulk_mat_sub2)))
stopifnot(is.character(colnames(bulk_mat_sub2)))

# Strictly align again
stopifnot(all(colnames(sc_mat_sub_t2) == rownames(bulk_mat_sub2)))

# scRNA
sc_mat_sub_t2 <- as.matrix(sc_mat_sub_t2)
storage.mode(sc_mat_sub_t2) <- "double"
colnames(sc_mat_sub_t2) <- as.character(colnames(sc_mat_sub_t2))
rownames(sc_mat_sub_t2) <- as.character(rownames(sc_mat_sub_t2))

# bulk
bulk_mat_sub2 <- as.matrix(bulk_mat_sub2)
storage.mode(bulk_mat_sub2) <- "double"
rownames(bulk_mat_sub2) <- as.character(rownames(bulk_mat_sub2))
colnames(bulk_mat_sub2) <- as.character(colnames(bulk_mat_sub2))

# Align
common_genes <- intersect(colnames(sc_mat_sub_t2), rownames(bulk_mat_sub2))
common_genes <- sort(common_genes)

sc_mat_sub_t2 <- sc_mat_sub_t2[, common_genes, drop = FALSE]
bulk_mat_sub2 <- bulk_mat_sub2[common_genes, , drop = FALSE]

# Check again
stopifnot(all(colnames(sc_mat_sub_t2) == rownames(bulk_mat_sub2)))
bp <- new.prism(
  reference = sc_mat_sub_t2,
  mixture = bulk_mat_sub2,
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5  Ery_Erythrocyte 4  HSPC  Ery_Erythrocyte 3  Ery_Erythrocyte 2  Ery_Erythrocyte 1
               66                 90   113                115                212                416
Dendritic cell  Neutrophil  FCGR3A+ Monocyte  Plasma cell  Platelet  NK cell
          1074        1424              2188         2990      3030     5502
CD8+ T cell  B cell  CD4+ T cell  Monocyte
       6655    9314        11483     12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t2, mixture = bulk_mat_sub2,  :
  Error: gene names of reference and mixture do not match!

# Take the intersection without sorting
common_genes <- intersect(colnames(sc_mat_sub_t2), rownames(bulk_mat_sub2))

# Strict alignment
sc_mat_sub_t2 <- sc_mat_sub_t2[, common_genes, drop = FALSE]
bulk_mat_sub2 <- bulk_mat_sub2[common_genes, , drop = FALSE]

# scRNA
rownames(sc_mat_sub_t2) <- paste0("Cell", seq_len(nrow(sc_mat_sub_t2)))
colnames(sc_mat_sub_t2) <- common_genes

# bulk
rownames(bulk_mat_sub2) <- common_genes
if(is.null(colnames(bulk_mat_sub2))) colnames(bulk_mat_sub2) <- paste0("Sample", seq_len(ncol(bulk_mat_sub2)))

# Create the object
bp <- new.prism(
  reference = sc_mat_sub_t2,
  mixture = bulk_mat_sub2,
  cell.type.labels = celltype_labels,
  cell.state.labels = cell_state_labels,
  key = "scRNA_bulk"
)
number of cells in each cell state
cell.state.labels
Ery_Erythrocyte 5  Ery_Erythrocyte 4  HSPC  Ery_Erythrocyte 3  Ery_Erythrocyte 2  Ery_Erythrocyte 1
               66                 90   113                115                212                416
Dendritic cell  Neutrophil  FCGR3A+ Monocyte  Plasma cell  Platelet  NK cell
          1074        1424              2188         2990      3030     5502
CD8+ T cell  B cell  CD4+ T cell  Monocyte
       6655    9314        11483     12222
Number of outlier genes filtered from mixture = 0
Error in new.prism(reference = sc_mat_sub_t2, mixture = bulk_mat_sub2,  :
  Error: gene names of reference and mixture do not match!

aizerfore-creator avatar Oct 10 '25 15:10 aizerfore-creator