tidySingleCellExperiment
aggregate_cells takes too long
Dear Dr. Mangiola,
Thank you for the very nice package. I am working with large-scale single-cell RNA-seq data and want to use tidySingleCellExperiment.
I discovered that aggregate_cells takes much longer than aggregateAcrossCells.
As I am usually working on a server, I recreated the problem with a 225k cell dataset on my laptop: https://cellxgene.cziscience.com/e/dea717d4-7bc0-4e46-950f-fd7e1cc8df7d.cxg/
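library(Seurat)   # provides as.SingleCellExperiment()
library(scuttle)  # provides aggregateAcrossCells()
library(tidySingleCellExperiment)
library(tidySummarizedExperiment)
#setwd("/Users/maximiliannuber/Documents/CSAMA_2024")
# The downloaded file is a Seurat object; convert it to a SingleCellExperiment
sce <- readr::read_rds("Seurat_kidney.rds")
sce <- as.SingleCellExperiment(sce)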
aggregateAcrossCells runs fast:
system.time(pbulk <- aggregateAcrossCells(sce, ids = colData(sce)[, c("donor_id", "cell_type")]))
user system elapsed
11.690 2.481 16.056
The corresponding aggregate_cells call, grouping by both donor and cell type, ran for so long that I interrupted it after about 10 minutes:
system.time(
  pbulk <- sce %>%
    aggregate_cells(c(donor_id, cell_type), assays = "counts")
)
I looked at this with Michael Love, and we found that the slowdown may be tied to aggregating over the combination of donor and cell type. Aggregating by cell type alone took just a few seconds:
system.time(
  pbulk <- sce %>%
    aggregate_cells(cell_type, assays = "counts")
)
user system elapsed
10.164 2.333 13.953
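For reference, here is a minimal sketch of what summing counts over the donor and cell type combination amounts to, written with the Matrix package only. It is not how aggregate_cells or aggregateAcrossCells is implemented; the synthetic counts matrix and the donor_id and cell_type vectors are stand-ins for counts(sce) and colData(sce), and all sizes are arbitrary assumptions. It only illustrates that the two grouping columns can be collapsed into a single key and aggregated with one sparse matrix product.

```r
library(Matrix)

set.seed(1)
n_genes <- 50
n_cells <- 1000

# Synthetic sparse count matrix (genes x cells); a stand-in for counts(sce)
counts <- rsparsematrix(n_genes, n_cells, density = 0.1,
                        rand.x = function(n) rpois(n, 3) + 1)

# Hypothetical per-cell metadata; a stand-in for colData(sce)
donor_id  <- sample(paste0("donor", 1:5), n_cells, replace = TRUE)
cell_type <- sample(c("T", "B", "NK"), n_cells, replace = TRUE)

# Collapse the two grouping columns into one key per cell
group <- interaction(donor_id, cell_type, drop = TRUE)

# Sparse cells x groups indicator matrix: entry (i, j) is 1 if cell i is in group j
indicator <- sparseMatrix(i = seq_len(n_cells), j = as.integer(group), x = 1,
                          dims = c(n_cells, nlevels(group)),
                          dimnames = list(NULL, levels(group)))

# Pseudobulk sums (genes x groups) in a single sparse matrix product
pbulk <- counts %*% indicator
dim(pbulk)
```

On real data the same idea would apply to the counts assay and the two colData columns; whether this helps with the slowdown reported here is untested.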
Thank you for any help!
Output of sessionInfo():
R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.2.1
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/Rome
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] tidySummarizedExperiment_1.14.0 ttservice_0.4.1
[3] tidyr_1.3.1 tidySingleCellExperiment_1.14.0
[5] muscData_1.18.0 ExperimentHub_2.12.0
[7] AnnotationHub_3.12.0 BiocFileCache_2.12.0
[9] dbplyr_2.5.0 rpx_2.12.0
[11] edgeR_4.2.0 stringr_1.5.1
[13] pheatmap_1.0.12 celldex_1.14.0
[15] SingleR_2.6.0 igraph_2.0.3
[17] GGally_2.2.1 NewWave_1.14.0
[19] scry_1.16.0 scDblFinder_1.18.0
[21] scran_1.32.0 scater_1.32.0
[23] ggplot2_3.5.1 EnsDb.Hsapiens.v86_2.99.0
[25] ensembldb_2.28.0 AnnotationFilter_1.28.0
[27] GenomicFeatures_1.56.0 AnnotationDbi_1.66.0
[29] scuttle_1.14.0 DropletUtils_1.24.0
[31] SingleCellExperiment_1.26.0 SummarizedExperiment_1.34.0
[33] GenomicRanges_1.56.0 GenomeInfoDb_1.40.0
[35] IRanges_2.38.0 S4Vectors_0.42.0
[37] MatrixGenerics_1.16.0 matrixStats_1.3.0
[39] DropletTestFiles_1.14.0 dplyr_1.1.4
[41] limma_3.60.3 RcppSpdlog_0.0.17
[43] Seurat_5.0.3 cellxgene.census_1.14.1
[45] SeuratObject_5.0.1 sp_2.1-4
[47] GEOquery_2.72.0 Biobase_2.64.0
[49] BiocGenerics_0.50.0
loaded via a namespace (and not attached):
[1] R.methodsS3_1.8.2 vroom_1.6.5 RcppCCTZ_0.2.12
[4] spdl_0.0.5 goftest_1.2-3 Biostrings_2.72.1
[7] HDF5Array_1.32.0 vctrs_0.6.5 spatstat.random_3.2-3
[10] digest_0.6.35 png_0.1-8 aws.signature_0.6.0
[13] gypsum_1.0.1 tiledb_0.27.0 ggrepel_0.9.5
[16] deldir_2.0-4 parallelly_1.37.1 MASS_7.3-60.2
[19] reshape2_1.4.4 httpuv_1.6.15 withr_3.0.0
[22] xfun_0.43 aws.s3_0.3.21 ellipsis_0.3.2
[25] survival_3.5-8 memoise_2.0.1 ggbeeswarm_0.7.2
[28] zoo_1.8-12 pbapply_1.7-2 R.oo_1.26.0
[31] KEGGREST_1.44.1 promises_1.3.0 httr_1.4.7
[34] restfulr_0.0.15 globals_0.16.3 fitdistrplus_1.1-11
[37] rhdf5filters_1.16.0 ps_1.7.6 rhdf5_2.48.0
[40] rstudioapi_0.16.0 nanotime_0.3.7 UCSC.utils_1.0.0
[43] miniUI_0.1.1.1 generics_0.1.3 processx_3.8.4
[46] base64enc_0.1-3 curl_5.2.1 zlibbioc_1.50.0
[49] ScaledMatrix_1.12.0 polyclip_1.10-6 glmpca_0.2.0
[52] GenomeInfoDbData_1.2.12 SparseArray_1.4.3 desc_1.4.3
[55] xtable_1.8-4 evaluate_0.23 S4Arrays_1.4.0
[58] hms_1.1.3 irlba_2.3.5.1 colorspace_2.1-0
[61] filelock_1.0.3 ROCR_1.0-11 reticulate_1.36.1
[64] spatstat.data_3.0-4 magrittr_2.0.3 lmtest_0.9-40
[67] readr_2.1.5 nanoarrow_0.4.0.1 later_1.3.2
[70] viridis_0.6.5 lattice_0.22-6 spatstat.geom_3.2-9
[73] future.apply_1.11.2 scattermore_1.2 XML_3.99-0.16.1
[76] triebeard_0.4.1 cowplot_1.1.3 RcppAnnoy_0.0.22
[79] pillar_1.9.0 nlme_3.1-164 sna_2.7-2
[82] compiler_4.4.0 beachmat_2.20.0 RSpectra_0.16-1
[85] stringi_1.8.3 tensor_1.5 GenomicAlignments_1.40.0
[88] plyr_1.8.9 crayon_1.5.2 abind_1.4-5
[91] BiocIO_1.14.0 locfit_1.5-9.9 bit_4.0.5
[94] codetools_0.2-20 BiocSingular_1.20.0 alabaster.ranges_1.4.1
[97] plotly_4.10.4 mime_0.12 intergraph_2.0-4
[100] splines_4.4.0 Rcpp_1.0.12 fastDummies_1.7.3
[103] sparseMatrixStats_1.16.0 knitr_1.46 blob_1.2.4
[106] utf8_1.2.4 BiocVersion_3.19.1 fs_1.6.4
[109] listenv_0.9.1 DelayedMatrixStats_1.26.0 pkgbuild_1.4.4
[112] tibble_3.2.1 Matrix_1.7-0 callr_3.7.6
[115] statmod_1.5.0 tzdb_0.4.0 network_1.18.2
[118] pkgconfig_2.0.3 tools_4.4.0 cachem_1.0.8
[121] RSQLite_2.3.7 viridisLite_0.4.2 DBI_1.2.2
[124] fastmap_1.1.1 rmarkdown_2.26 scales_1.3.0
[127] grid_4.4.0 ica_1.0-3 Rsamtools_2.20.0
[130] coda_0.19-4.1 patchwork_1.2.0 ggstats_0.6.0
[133] BiocManager_1.30.23 dotCall64_1.1-1 alabaster.schemas_1.4.0
[136] RANN_2.6.1 farver_2.1.1 yaml_2.3.8
[139] rtracklayer_1.64.0 cli_3.6.2 purrr_1.0.2
[142] leiden_0.4.3.1 lifecycle_1.0.4 uwot_0.2.2
[145] arrow_16.1.0 bluster_1.14.0 BiocParallel_1.38.0
[148] gtable_0.3.5 rjson_0.2.21 ggridges_0.5.6
[151] progressr_0.14.0 parallel_4.4.0 jsonlite_1.8.8
[154] RcppHNSW_0.6.0 bitops_1.0-7 bit64_4.0.5
[157] assertthat_0.2.1 xgboost_1.7.7.1 Rtsne_0.17
[160] alabaster.matrix_1.4.1 spatstat.utils_3.0-4 BiocNeighbors_1.22.0
[163] urltools_1.7.3 alabaster.se_1.4.1 metapod_1.12.0
[166] dqrng_0.3.2 R.utils_2.12.3 alabaster.base_1.4.1
[169] lazyeval_0.2.2 shiny_1.8.1.1 htmltools_0.5.8.1
[172] sctransform_0.4.1 rappdirs_0.3.3 glue_1.7.0
[175] spam_2.10-0 httr2_1.0.1 XVector_0.44.0
[178] RCurl_1.98-1.14 gridExtra_2.3 tiledbsoma_1.11.1
[181] R6_2.5.1 DESeq2_1.44.0 labeling_0.4.3
[184] SharedObject_1.18.0 cluster_2.1.6 pkgload_1.3.4
[187] Rhdf5lib_1.26.0 statnet.common_4.9.0 DelayedArray_0.30.1
[190] tidyselect_1.2.1 vipor_0.4.7 ProtGenerics_1.36.0
[193] xml2_1.3.6 future_1.33.2 rsvd_1.0.5
[196] munsell_0.5.1 KernSmooth_2.23-22 data.table_1.15.4
[199] htmlwidgets_1.6.4 RColorBrewer_1.1-3 rlang_1.1.3
[202] spatstat.sparse_3.0-3 spatstat.explore_3.2-7 remotes_2.5.0
[205] fansi_1.0.6 beeswarm_0.4.0
Thanks!