BPCells icon indicating copy to clipboard operation
BPCells copied to clipboard

convert seurat object to BPCell

Open aarebeca opened this issue 10 months ago • 5 comments

Hi,

I have a seurat object v5 and would like to convert into BPCells matrix. I see in the seurat vignette (https://satijalab.org/seurat/articles/seurat5_bpcells_interaction_vignette) that it is possible to convert the count matrix from seurat to BPCells. Would be possible to convert the data matrix after SCT analysis?

Thank you,

Rebeca

aarebeca avatar Feb 24 '25 20:02 aarebeca

Hi Rebeca,

Overall, yes this is possible but it is a bit tricky since Seurat has not fully updated their SCTransform functions to work with BPCells objects. I'll put some code snippets below for either converting an existing SCT assay to be disk-backed or doing the SCT analysis to be done with BPCells objects in the first place. For really large datasets you'll probably want to take the second approach, but if you've already got a working SCT object the first would be fine.

Note that I've used tempfile() in some of these examples -- you'd probably want to replace that with a non-temporary folder where you'd like to save data.

Hopefully at some point Seurat updates SCTransform to support BPCells matrices, but my second example should be a pretty good workaround in the mean time.

-Ben

Converting an existing assay

library(BPCells)
library(Seurat)
library(SeuratData)

pbmc <- SeuratData::LoadData("pbmc3k.SeuratData") |>
 UpdateSeuratObject() 
pbmc <- CreateSeuratObject(pbmc[["RNA"]]$counts)
pbmc <- pbmc |> SCTransform()

data_dir <- tempfile()
converted_assay <- CreateAssay5Object(
  counts = write_matrix_dir(pbmc[["SCT"]]$counts, file.path(data_dir, "counts")),
  data = write_matrix_dir(pbmc[["SCT"]]$data, file.path(data_dir, "data"))
)
converted_assay$scale.data <- write_matrix_dir(
  as(pbmc[["SCT"]]$scale.data, "dgCMatrix"), 
  file.path(data_dir, "scale.data")
)
pbmc[["SCT"]] <- converted_assay

Calculating SCTransform from scratch

library(BPCells)
library(Seurat)
library(SeuratData)

# BPCells-enabled SCT
#
# Uses BPCells::sctransform_pearson to create a disk-backed SCT object
# Notes: only follows the default settings for SCTransform, does not implement
#   most options
# @param counts_mat A BPCells counts matrix
# @param vst Optionally, a pre-fit vst object produced by sctransform::vst()
# @return Seurat assay object, where the scale.data slot should be equivalent to the
#   standard SCT scale.data slot
make_bpcells_SCT <- function(counts_mat, vst=NULL) {
  if (is.null(vst)) {
    vst <- SCTransform(counts_mat, data.frame(row.names=colnames(counts_mat)))
  }
  var_feat <- rownames(counts_mat)[rownames(counts_mat) %in% vst$variable_features]
  params <- vst$model_pars_fit[var_feat,]
  
  m_bpcells <- sctransform_pearson(
    mat = counts_mat[var_feat,],
    gene_theta = params[,"theta"],
    gene_beta = exp(params[,"(Intercept)"]),
    cell_read_counts = colSums(counts_mat),
    min_var = 1/25,
    clip_range = c(-1,1) * sqrt(ncol(counts_mat)/30)
  )
  m_bpcells <- m_bpcells - rowMeans(m_bpcells)

  assay <- CreateAssay5Object(
    counts=counts_mat,
    data=NormalizeData(counts_mat)
  )
  assay$scale.data <- m_bpcells
  assay
}


pbmc <- SeuratData::LoadData("pbmc3k.SeuratData") |>
 UpdateSeuratObject() 
pbmc <- CreateSeuratObject(pbmc[["RNA"]]$counts)
pbmc2 <- pbmc
pbmc3 <- pbmc

# We make this to check our results, and to show how to use existing
# SCT normalization parameters
pbmc <- pbmc |> SCTransform()
vst <- list(
  variable_features = rownames(pbmc[["SCT"]]$scale.data),
  model_pars_fit = pbmc[["SCT"]]@SCTModel.list[[1]]@feature.attributes
)

# Convert the counts matrix to a BPCells matrix
bp_counts_mat <- pbmc[["RNA"]]$counts |> write_matrix_dir(tempfile())

# Re-calculating the SCT model from scratch gives < 1e-4 relative difference
pbmc2[["SCT"]] <- make_bpcells_SCT(bp_counts_mat)
all.equal(as.matrix(pbmc2[["SCT"]]$scale.data), pbmc[["SCT"]]$scale.data)

# Using the pre-calculated SCT model gives < 1e-7 relative difference
pbmc3[["SCT"]] <- make_bpcells_SCT(bp_counts_mat, vst=vst)
all.equal(as.matrix(pbmc3[["SCT"]]$scale.data), pbmc[["SCT"]]$scale.data)

bnprks avatar Feb 26 '25 04:02 bnprks

Is it possible to regress out variables using the bpcells implementation of sctransform?

yxsee avatar Mar 03 '25 07:03 yxsee

Hi @yxsee, currently BPCells can support regressing out variables via a linear regression model with BPCells::regress_out(), but it does not currently support regressing out variables via a negative binomial GLM as used by default in Seurat's SCTransform. It would be possible to run BPCells::sctransform_pearson() followed by BPCells::regress_out(), though this will not be fully equivalent to what sctransform does. My comment here outlines how to use regress_out to interoperate with Seurat's non-SCTransform normalization.

Have you previously found that regressing out variables with sctransform gives much better results? I didn't implement support for regressing out variables initially since I wasn't sure how useful it is in practice.

-Ben

(Also, a slight clarification for other readers -- BPCells can't fit an SCTransform model, but given a fit model with no extra variables regressed out it can perform the transformation. Since Seurat defaults to fitting a model on a small subset of the data, this works pretty well)

bnprks avatar Mar 03 '25 08:03 bnprks

Hi Ben, thanks for the quick reply! I usually regress out percent.mito and cell cycle effects when running SCT and remove the 'data' slot after dimension reductions to save memory. However, my recent datasets are taking up more memory so I was hoping there is some way to perform SCT with regression on disk.

yxsee avatar Mar 03 '25 08:03 yxsee

That sounds like a reasonable use-case. I'll see if we can put on our development to-do list to update our SCT implementation to support regressing out variables like this. I can't promise it will definitely get done, or on exactly what timeline but I will ping you back here if we add support. If you have a bit of C++ experience and would like to implement + contribute this to BPCells we're always open to that and I could provide pointers on how to get started via email or something (bparks [at] stanford.edu).

In the mean time, either the example here or the comment I linked about regress_out are probably the best available options for doing a similar analysis with BPCells + Seurat.

bnprks avatar Mar 04 '25 01:03 bnprks