Severe decrease in processing speed after CRAN v5 update

Open Dario-Rocha opened this issue 1 year ago • 20 comments

Hello dear Seurat team, I've been using Seurat v5 for over half a year now, and after the official release, some scripts that used to run in under half an hour now can't finish in a whole working day. Two of the steps that have gotten significantly slower are "Calculating Leverage Scores" during SketchData (which can now take over 10 hours) and NormalizeData (which used to be almost instantaneous and now takes almost a minute). I've noticed that during these and other processes no swap memory is being used, which suggests to me that either the parallelization isn't working anymore or, for some reason, R and RStudio are refusing to use swap.

library(future)
plan(multisession, workers = 14, gc = TRUE)
options(future.globals.maxSize = 3e+09)

R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.3.1
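
One way to test the suspicion that parallelization is not taking effect is to ask future directly what it is running on. The sketch below is a minimal check and is not taken from the thread: it assumes the future package is installed, uses 2 workers instead of 14 to keep it light, and all variable names are illustrative. It only shows whether the backend starts separate R processes; it does not show whether a given Seurat function actually dispatches work to them.

```r
library(future)

# Small worker count for the check; the post above uses workers = 14.
plan(multisession, workers = 2)

n_workers <- nbrOfWorkers()    # workers provided by the current plan
n_cores   <- availableCores()  # cores future believes it may use

# Each future returns the PID of the R process that evaluated it.
fs   <- lapply(1:4, function(i) future(Sys.getpid()))
pids <- vapply(fs, value, integer(1))

# PIDs other than the main session's PID belong to background workers.
worker_pids <- setdiff(unique(pids), Sys.getpid())

message("workers: ", n_workers, ", cores: ", n_cores,
        ", distinct worker PIDs: ", length(worker_pids))

plan(sequential)  # shut the background workers down again
```

If `worker_pids` comes back empty, every future ran in the main session, which would suggest the plan is not in effect.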

Dario-Rocha avatar Dec 04 '23 09:12 Dario-Rocha

I've been running into the same issue. I was wondering if it was just me, but "Calculating Leverage Scores" seems to take forever.

jcorn427 avatar Dec 04 '23 18:12 jcorn427

Thanks for pointing this out - are you observing this behavior also on any of our example datasets (either small or large)? If you're able to provide an example that we can debug, we will figure out what is going on here

rsatija avatar Dec 08 '23 21:12 rsatija

I thought I was the only one. Two days ago, I updated to v5.0.1 because of the highlighting issue in DimPlot and since then, things that used to run in <3 minutes run for hours. I also started running out of memory when I run PCAs, which had never been an issue before (if I remember correctly, I never needed that much memory). I work with ~100K cells, have 32 GB of RAM, and load counts from disk using BPCells. Here is part of my script (I don't know how helpful it's going to be, but just in case).

library(Seurat)
library(SeuratWrappers)
options(Seurat.object.assay.version = "v3")

main_dir <- "somedir"
setwd(main_dir)
sample_dirs <- list.dirs(main_dir, full.names = TRUE, recursive = FALSE)
ldat <- list()

for (sample_dir in sample_dirs) {
  if (grepl("^Pig", basename(sample_dir))) {
    loom_path <- file.path(sample_dir, "velocyto", paste0(basename(sample_dir), ".loom"))
    if (file.exists(loom_path)) {
      bm <- ReadVelocity(file = loom_path)
      # rownames(bm$spliced)=make.unique(rownames(bm$spliced), sep = "")
      # rownames(bm$unspliced)=make.unique(rownames(bm$unspliced), sep = "")
      # rownames(bm$ambiguous)=make.unique(rownames(bm$ambiguous), sep = "_")
      bm <- as.Seurat(bm)
      bm[["Sample"]] <- basename(sample_dir)
      ldat[[basename(sample_dir)]] <- bm
      rm(bm)
    }
  }
}

combined_spliced = merge(ldat[[1]], y = ldat[-1])

rm(ldat)

library(BPCells)
options(Seurat.object.assay.version = "v5")
options(future.globals.maxSize = 1e9)

combined_spliced = UpdateSeuratObject(combined_spliced)

Metadata = combined_spliced@meta.data

dir_spliced <- file.path(getwd(), "spliced_BP")
dir_unspliced <- file.path(getwd(), "unspliced_BP")
dir_ambiguous <- file.path(getwd(), "ambiguous_BP")

write_matrix_dir(mat = combined_spliced[["spliced"]]$counts, dir = dir_spliced)
write_matrix_dir(mat = combined_spliced[["unspliced"]]$counts, dir = dir_unspliced)
write_matrix_dir(mat = combined_spliced[["ambiguous"]]$counts, dir = dir_ambiguous)

rm(combined_spliced)

load('Metadata_Combined_BP.Rdata')
mat = open_matrix_dir(dir = dir_spliced)
Combined_BP = CreateSeuratObject(counts = mat, meta.data = Metadata, assay = "spliced")
Combined_BP[["unspliced"]] = CreateAssay5Object(counts = open_matrix_dir(dir = dir_unspliced))
Combined_BP[["ambiguous"]] = CreateAssay5Object(counts = open_matrix_dir(dir = dir_ambiguous))

######### INTEGRATE BASED ON SPLICED ASSAY ###################

Combined_BP[["spliced"]] <- split(Combined_BP[["spliced"]], f = Combined_BP$Sample)

Combined_BP <- NormalizeData(Combined_BP)

Combined_BP <- FindVariableFeatures(Combined_BP, nfeatures = 3000)

Combined_BP <- ScaleData(Combined_BP, features = rownames(Combined_BP))

Combined_BP <- RunPCA(Combined_BP, npcs = 30)  # <----- CRASH

[...] Continues ...

##############################################################################

NOTE: The session reported below doesn't include the first part of the script. I only loaded the matrices that I had generated on a previous iteration, created the seurat object, and processed it until the RunPCA line, when it crashed after a few minutes.

sessionInfo()
R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.utf8 LC_CTYPE=English_United Kingdom.utf8
[3] LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.utf8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] BPCells_0.1.0 Seurat_5.0.1 SeuratObject_5.0.1 sp_2.1-2

loaded via a namespace (and not attached):
[1] spam_2.10-0 plyr_1.8.9 igraph_1.6.0
[4] lazyeval_0.2.2 splines_4.2.3 RcppHNSW_0.5.0
[7] BiocParallel_1.32.6 listenv_0.9.0 scattermore_1.2
[10] usethis_2.2.2 GenomeInfoDb_1.34.9 ggplot2_3.4.4
[13] digest_0.6.33 htmltools_0.5.7 fansi_1.0.6
[16] magrittr_2.0.3 memoise_2.0.1 tensor_1.5
[19] cluster_2.1.4 ROCR_1.0-11 remotes_2.4.2.1
[22] globals_0.16.2 Biostrings_2.66.0 matrixStats_1.1.0
[25] spatstat.sparse_3.0-3 colorspace_2.1-0 ggrepel_0.9.4
[28] dplyr_1.1.4 crayon_1.5.2 RCurl_1.98-1.13
[31] jsonlite_1.8.8 spatstat.data_3.0-3 progressr_0.14.0
[34] survival_3.5-3 zoo_1.8-12 glue_1.6.2
[37] polyclip_1.10-6 gtable_0.3.4 zlibbioc_1.44.0
[40] XVector_0.38.0 leiden_0.4.3.1 DelayedArray_0.24.0
[43] pkgbuild_1.4.3 future.apply_1.11.0 BiocGenerics_0.44.0
[46] abind_1.4-5 scales_1.3.0 spatstat.random_3.2-2
[49] miniUI_0.1.1.1 Rcpp_1.0.11 viridisLite_0.4.2
[52] xtable_1.8-4 reticulate_1.34.0 dotCall64_1.1-1
[55] stats4_4.2.3 profvis_0.3.8 htmlwidgets_1.6.4
[58] httr_1.4.7 RColorBrewer_1.1-3 ellipsis_0.3.2
[61] ica_1.0-3 urlchecker_1.0.1 pkgconfig_2.0.3
[64] XML_3.99-0.16 uwot_0.1.16 deldir_2.0-2
[67] utf8_1.2.4 tidyselect_1.2.0 rlang_1.1.2
[70] reshape2_1.4.4 later_1.3.2 munsell_0.5.0
[73] tools_4.2.3 cachem_1.0.8 cli_3.6.2
[76] generics_0.1.3 devtools_2.4.5 ggridges_0.5.4
[79] stringr_1.5.1 fastmap_1.1.1 goftest_1.2-3
[82] yaml_2.3.8 fs_1.6.3 fitdistrplus_1.1-11
[85] purrr_1.0.2 RANN_2.6.1 nlme_3.1-162
[88] pbapply_1.7-2 future_1.33.0 mime_0.12
[91] compiler_4.2.3 rstudioapi_0.15.0 plotly_4.10.3
[94] png_0.1-8 spatstat.utils_3.0-4 tibble_3.2.1
[97] stringi_1.8.2 RSpectra_0.16-1 lattice_0.20-45
[100] Matrix_1.6-3 vctrs_0.6.5 pillar_1.9.0
[103] lifecycle_1.0.4 spatstat.geom_3.2-7 lmtest_0.9-40
[106] RcppAnnoy_0.0.21 data.table_1.14.10 cowplot_1.1.1
[109] bitops_1.0-7 irlba_2.3.5.1 httpuv_1.6.13
[112] patchwork_1.1.3 rtracklayer_1.58.0 GenomicRanges_1.50.2
[115] R6_2.5.1 BiocIO_1.8.0 promises_1.2.1
[118] KernSmooth_2.23-20 gridExtra_2.3 IRanges_2.32.0
[121] parallelly_1.36.0 sessioninfo_1.2.2 codetools_0.2-19
[124] fastDummies_1.7.3 MASS_7.3-58.2 pkgload_1.3.3
[127] SummarizedExperiment_1.28.0 rjson_0.2.21 GenomicAlignments_1.34.1
[130] sctransform_0.4.1 Rsamtools_2.14.0 S4Vectors_0.36.2
[133] GenomeInfoDbData_1.2.9 parallel_4.2.3 grid_4.2.3
[136] tidyr_1.3.0 MatrixGenerics_1.10.0 Rtsne_0.17
[139] spatstat.explore_3.2-5 Biobase_2.58.0 shiny_1.8.0
[142] restfulr_0.0.15

dango147 avatar Dec 11 '23 22:12 dango147

Would you be able to share the Seurat object where the RunPCA step crashes, or alternatively, share the loom file? You can send the link to Seurat Help [email protected], and we will certainly take a look.

rsatija avatar Dec 11 '23 23:12 rsatija

Done

dango147 avatar Dec 12 '23 13:12 dango147

Sorry, this could be helpful to someone. I just ran the same code on our server, which still had the beta v5 installed (v4.9.9.9060), and it completed the RunPCA step using ~8 GB of RAM, so I don't know what happened after I updated Seurat on my laptop. When I had the beta version, my laptop used to work even better than our server.

UPDATE: I know it's not the right way to do it, but it's the only way I could come up with. As I couldn't find a way to re-install the beta v5 version on my laptop, I compressed the Seurat and SeuratObject libraries I had in the server and installed them on my laptop. Then, I downgraded Matrix to v1.6.1, and now everything is working as it used to.

dango147 avatar Dec 12 '23 19:12 dango147

Well, I need to move forward with my project, so I need to downgrade for the moment. Could you be so kind, @dango147, as to share the older Seurat and SeuratObject libraries that are working fine for you? Also, any hints for successfully downgrading to Matrix v1.6.1?

Dario-Rocha avatar Dec 13 '23 14:12 Dario-Rocha

Sure. First, I removed the Seurat, SeuratObject and Matrix libraries from the Rstudio package menu. Then, I restarted R and ran "devtools::install_version("Matrix",version = "1.6.1")". Once Matrix was installed, I manually installed the two Seurat packages from these two zip files

SeuratObject.zip Seurat.zip

I don't know how helpful this will be as we could be using completely different environments, but I hope it helps! There is nothing else I can do
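
Before and after a downgrade like the one described above, it can help to confirm which Matrix version is actually installed. The sketch below is illustrative only: `needs_reinstall` is a hypothetical helper, and the target is written as "1.6-1" on the assumption that this is the CRAN spelling of the "1.6.1" mentioned above (R's `package_version()` treats the two as equal). Note that `install_version()` builds from a source archive, so it needs a working compiler toolchain on the machine.

```r
# Hypothetical helper: TRUE when the installed version differs from the target.
needs_reinstall <- function(installed, target) {
  package_version(installed) != package_version(target)
}

target <- "1.6-1"  # assumed CRAN spelling of "1.6.1"

needs_reinstall("1.6-3", target)  # Matrix version listed in the sessionInfo() above
needs_reinstall("1.6.1", target)  # same version, different separator

# To check the live installation (assumes Matrix is installed):
# needs_reinstall(as.character(packageVersion("Matrix")), target)
# If that is TRUE, one option is:
# remotes::install_version("Matrix", version = target)
```

Restarting R after the install, as described above, ensures the old version is no longer loaded in the session.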

dango147 avatar Dec 13 '23 14:12 dango147

Somehow I can't downgrade the Matrix package. Also, I am on a Mac and the binaries you kindly provided are for Windows. Thanks a lot for the effort anyway! Hopefully we can get some kind of quick, temporary official solution.

Dario-Rocha avatar Dec 14 '23 08:12 Dario-Rocha

Is there any way to access the beta releases still? I have a project with 1.36 million cells and the "Calculating Leverage Score" step hasn't finished after letting it run for multiple days. Any way to speed this up would be much appreciated

jcorn427 avatar Dec 19 '23 16:12 jcorn427

Hi all, after updating to Seurat v5 a few weeks ago I am experiencing the issue described here when running Seurat functions within RStudio on an M2 Ultra machine (a Mac Studio). I don't believe the future parallelization takes effect when I request a "multisession" or "multicore" plan on my 24 CPUs. Has this been resolved? Is there an update to Seurat (or a version of future or RStudio?) that I should look for? I might have to try downgrading.

sknaack avatar Jan 31 '24 02:01 sknaack

Curious also if the {future} parallelization applies to Seurat v5, and what sort of performance gains we should expect. The v4.3 documentation does not apply exactly--e.g. can enable multisession but not multiprocess--and I'm not clear it's actually implemented in v5.

Currently working with a modest dataset of ~10K features across ~180K cells, and iterating over the workflow testing different parameters is awfully slow.

jtourig avatar Feb 23 '24 21:02 jtourig

In case it's helpful to note for anyone: I've had much better luck running Seurat v5 processes outside of Rstudio, i.e., from command line scripts given as arguments to Rscript and generally using a multicore plan() in future. Pretty much same code and libraries, but much better processing time. I suspect this is down to hitches using future in Rstudio on OSX. For now I'm doing well via command-line. If anyone has advice for how to best set up future (multicore vs multisession vs multiprocessor) in R as run on M-class processors in OS X I'd be curious.
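
A possible explanation for the RStudio difference is that future treats forked processing as unstable inside RStudio and, by default, does not use multicore there. The following is a minimal sketch, not a recommendation from the Seurat team, of a script header that picks a backend based on what future reports as supported. It assumes the future package is installed; `choose_plan` is a hypothetical helper and 2 workers is an arbitrary choice.

```r
library(future)

# Hypothetical helper: prefer forked workers where future allows them,
# otherwise fall back to background R sessions.
choose_plan <- function(workers = 2) {
  if (supportsMulticore()) {
    plan(multicore, workers = workers)
    "multicore"
  } else {
    plan(multisession, workers = workers)
    "multisession"
  }
}

backend <- choose_plan(workers = 2)
message("Using backend: ", backend, " with ", nbrOfWorkers(), " workers")

# ... analysis code would go here ...

plan(sequential)  # release the workers when done
```

Saved as a script and run with `Rscript`, this would typically report "multicore" on macOS or Linux, while the same code inside RStudio would be expected to report "multisession".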

sknaack avatar Feb 23 '24 21:02 sknaack

Does anybody have any updates on this? I'm using Seurat 5.1.0 and R 4.3.2, and SketchData has now been running for 4 days straight. I have an extensive dataset of 250+ samples, but the cell number should be manageable (1.4 million). Running directly in R (not RStudio) with future's multicore implementation did not improve this, and it does not seem that SketchData even uses multiple cores, since I only see a single R process running. Any help would be appreciated!

philjurm avatar May 21 '24 18:05 philjurm

I've just tried again after updating to Seurat 5.1.0, and the issue is still the same on a dataset of 1.3 million cells and 92 samples.

Dario-Rocha avatar May 27 '24 13:05 Dario-Rocha

I have this same issue, and it seems like it's this line that causes the slowdown, about ~2 hours per layer in my data.

try(
  expr = VariableFeatures(object = sketched, method = "sketch", layer = lyr) <-
    VariableFeatures(object = object[[assay]], layer = lyr),
  silent = FALSE
)
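
To narrow down which layer or step dominates the run time, and to give the maintainers concrete numbers, each call can be wrapped in a timer. The following is a base-R sketch under stated assumptions: `time_step` and the layer names are hypothetical, and `Sys.sleep()` stands in for the real per-layer call.

```r
# Hypothetical helper: evaluate one expression and return elapsed seconds.
time_step <- function(label, expr) {
  elapsed <- system.time(expr)[["elapsed"]]  # expr is evaluated here
  message(sprintf("%-12s %8.2f s", label, elapsed))
  invisible(elapsed)
}

layers <- c("counts.A", "counts.B")  # placeholder layer names

# Replace Sys.sleep(0.1) with the per-layer call being investigated.
timings <- vapply(
  layers,
  function(lyr) time_step(lyr, Sys.sleep(0.1)),
  numeric(1)
)

timings  # named vector of elapsed seconds, one entry per layer
```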

jfwhalen avatar Jun 04 '24 14:06 jfwhalen

NormalizeData is slow, FindVariableFeatures is slow, ScaleData is slow, RunPCA is slow, and SketchData is slow (I started SketchData before going to sleep and found it still running when I woke up). Everything gets slow with BPCells and a large dataset.

Pentayouth avatar Aug 28 '24 01:08 Pentayouth