bioc-rnaseq
Reordering SE columns seems to freeze R sessions in workshops
At today’s EuroBioC workshop (and also in Ethiopia last month), the line:
se <- se[, order(se$Group)]
(in https://github.com/carpentries-incubator/bioc-rnaseq/blob/main/episodes/03-import-annotate.Rmd#L254_)
caused several laptops to hang, and people had to restart their R sessions to continue.
cc @js2264
I believe this could stem from the fact that I ran:
se <- SummarizedExperiment(assays = list("counts" = counts), rowRanges = as(rowranges, "GRanges"), colData = coldata)
## Rather than (note the `as.matrix()`):
se <- SummarizedExperiment(assays = list("counts" = as.matrix(counts)), rowRanges = as(rowranges, "GRanges"), colData = coldata)
But Vasileios also reported a slowdown, even though his assay is a matrix, as it should be.
This is not reliably reproducible. The reordering with se[, order(se$Group)] was sometimes instantaneous and sometimes took 5 minutes (same R instance on a local M1 MacBook Pro, running the command twice). I tried to reproduce it, but it did not happen again after I terminated RStudio.
> sessionInfo()
R version 4.5.1 (2025-06-13)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.6.1
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/Madrid
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] SummarizedExperiment_1.39.1 GenomicRanges_1.61.2 Seqinfo_0.99.2 MatrixGenerics_1.21.0
[5] matrixStats_1.5.0 hgu95av2.db_3.13.0 org.Hs.eg.db_3.21.0 org.Mm.eg.db_3.21.0
[9] AnnotationDbi_1.71.1 IRanges_2.43.0 S4Vectors_0.47.0 Biobase_2.69.0
[13] BiocGenerics_0.55.1 generics_0.1.4
loaded via a namespace (and not attached):
[1] Matrix_1.7-4 bit_4.6.0 compiler_4.5.1 crayon_1.5.3 blob_1.2.4 Biostrings_2.77.2
[7] png_0.1-8 yaml_2.3.10 fastmap_1.2.0 lattice_0.22-7 R6_2.6.1 XVector_0.49.0
[13] S4Arrays_1.9.1 knitr_1.50 DelayedArray_0.35.2 DBI_1.2.3 rlang_1.1.6 KEGGREST_1.49.1
[19] cachem_1.1.0 xfun_0.53 bit64_4.6.0-1 SparseArray_1.9.1 RSQLite_2.4.3 memoise_2.0.1
[25] cli_3.6.5 grid_4.5.1 digest_0.6.37 rstudioapi_0.17.1 vctrs_0.6.5 evaluate_1.0.5
[31] abind_1.4-8 rmarkdown_2.29 httr_1.4.7 tools_4.5.1 pkgconfig_2.0.3 htmltools_0.5.8.1
Regardless of whether this is linked to matrix coercion, I think it would be nice to first convert counts into a matrix, THEN build our SE. This would help make clear that assays (generally) store quantitative experimental measurements (in numerical matrices), while colData/rowData store qualitative and/or quantitative data (in data frames).
counts <- as.matrix(read.csv("data/GSE96870_counts_cerebellum.csv", row.names = 1))
and then, when building the SE:
se <- SummarizedExperiment(
assays = list(counts = counts),
rowRanges = as(rowranges, "GRanges"),
colData = coldata
)
That is very odd, especially if it's non-reproducible but did happen in the last two workshops. The bioc-intro lesson also subsets a SummarizedExperiment object; has this ever happened there?
Not as far as I know.
I do tend to coerce the data.frame into a matrix right at the beginning with
counts <- read.csv("data/GSE96870_counts_cerebellum.csv", row.names = 1) |>
as.matrix()
which also serves as a reminder of the native pipe |>, so that learners don't compartmentalise pipes/tidyverse versus SEs.
I tested locally with the session below and don't see any problem:
> sessionInfo()
R version 4.5.0 (2025-04-11)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.3 LTS
Matrix products: default
BLAS: /opt/R-4.5/lib/R/lib/libRblas.so
LAPACK: /opt/R-4.5/lib/R/lib/libRlapack.so; LAPACK version 3.12.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Europe/Brussels
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] hgu95av2.db_3.13.0 org.Hs.eg.db_3.21.0 org.Mm.eg.db_3.21.0
[4] AnnotationDbi_1.70.0 scpdata_1.16.1 ExperimentHub_2.16.0
[7] AnnotationHub_3.16.0 BiocFileCache_2.16.0 dbplyr_2.5.0
[10] QFeatures_1.19.1 MultiAssayExperiment_1.34.0 SummarizedExperiment_1.38.1
[13] Biobase_2.68.0 GenomicRanges_1.60.0 GenomeInfoDb_1.44.0
[16] IRanges_2.42.0 S4Vectors_0.46.0 BiocGenerics_0.54.0
[19] generics_0.1.4 MatrixGenerics_1.20.0 matrixStats_1.5.0
loaded via a namespace (and not attached):
[1] KEGGREST_1.48.0 lattice_0.22-6 vctrs_0.6.5
[4] tools_4.5.0 curl_6.4.0 tibble_3.3.0
[7] RSQLite_2.3.11 cluster_2.1.8.1 blob_1.2.4
[10] pkgconfig_2.0.3 BiocBaseUtils_1.10.0 Matrix_1.7-3
[13] lifecycle_1.0.4 GenomeInfoDbData_1.2.14 compiler_4.5.0
[16] stringr_1.5.1 Biostrings_2.76.0 clue_0.3-66
[19] yaml_2.3.10 lazyeval_0.2.2 pillar_1.11.0
[22] crayon_1.5.3 tidyr_1.3.1 MASS_7.3-65
[25] SingleCellExperiment_1.30.1 DelayedArray_0.34.1 cachem_1.1.0
[28] abind_1.4-8 mime_0.13 tidyselect_1.2.1
[31] stringi_1.8.7 BiocVersion_3.21.1 dplyr_1.1.4
[34] reshape2_1.4.4 purrr_1.0.4 fastmap_1.2.0
[37] grid_4.5.0 cli_3.6.5 SparseArray_1.8.0
[40] magrittr_2.0.3 S4Arrays_1.8.0 withr_3.0.2
[43] rappdirs_0.3.3 filelock_1.0.3 UCSC.utils_1.4.0
[46] bit64_4.6.0-1 XVector_0.48.0 httr_1.4.7
[49] igraph_2.1.4 bit_4.6.0 png_0.1-8
[52] memoise_2.0.1 rlang_1.1.6 Rcpp_1.1.0
[55] glue_1.8.0 DBI_1.2.3 BiocManager_1.30.25
[58] jsonlite_2.0.0 AnnotationFilter_1.32.0 R6_2.6.1
[61] plyr_1.8.9 ProtGenerics_1.40.0 MsCoreUtils_1.20.0
Following up on a discussion with @ivanek: the issue isn't related to R/package versions, as he and others have observed it intermittently on the same computer, without any obvious pattern that would help reproduce it. The previously run commands don't seem to be the cause of the crash either.
If anyone has time, they could compare execution in RStudio and a bare R console.
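A minimal timing sketch that could be run both in RStudio and with `Rscript` from a bare terminal, to compare the two environments. It builds a toy, matrix-backed SummarizedExperiment from random counts and times the same `se[, order(se$Group)]` reordering with `system.time()`. The dimensions, sample names and `Group` labels are hypothetical placeholders, not the lesson's GSE96870 data, so this only checks the subsetting step itself.

```r
# Timing sketch for the column reordering step, on hypothetical toy data.
suppressPackageStartupMessages(library(SummarizedExperiment))

set.seed(1)
n_genes <- 40000    # hypothetical size
n_samples <- 22     # hypothetical size
counts <- matrix(
  rpois(n_genes * n_samples, lambda = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)

# Alternate the (hypothetical) groups so that ordering really permutes columns.
coldata <- DataFrame(
  Group = rep(c("Infected", "Control"), length.out = n_samples),
  row.names = colnames(counts)
)

# Build the SE from a plain matrix, as suggested in this thread.
se <- SummarizedExperiment(assays = list(counts = counts), colData = coldata)

# Time the reordering; compare the output in RStudio and in a bare R console.
timing <- system.time(se_sorted <- se[, order(se$Group)])
print(timing)

# Sanity checks: still a plain matrix, and counts followed their sample.
stopifnot(
  is.matrix(assay(se_sorted, "counts")),
  identical(assay(se_sorted, "counts")[, "sample2"], counts[, "sample2"])
)
```

Because the hang is intermittent, a single fast run does not rule anything out; repeating the timed line a few times in each environment would be more informative.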
Also, the issue happened for some of the trainees but not all. I'm not sure whether they were all on Macs (mine and Vasilis's were).
On my side, I had never had this issue before, using a bare R console or radian. It happened to me for the first time during this workshop, which was run entirely in RStudio.
I don't have precise timings, but the machine would lag for a good 2 minutes (without any increase in RAM/CPU usage), then eventually release the prompt.
Similar things have happened to us recently (in RStudio, not with this particular example data), and if I remember correctly it was solved by turning off automatic code completion (Tools > Global Options > Code > Completion, then set "Show code completions" to "Manually").
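For workshops, the same setting could perhaps be applied from the R console rather than through the dialog. This is a sketch under assumptions: it uses `rstudioapi::writeRStudioPreference()` and `readRStudioPreference()`, and assumes that `"code_completion"` is the preference name and `"manual"` the value matching "Manually" in Global Options. The helper `set_manual_completion()` is hypothetical. Outside RStudio it does nothing except print a message.

```r
# Hypothetical helper: switch RStudio code completion to manual (Tab only).
# Assumption: "code_completion" / "manual" are the preference name and value
# behind "Show code completions: Manually" in Tools > Global Options.
set_manual_completion <- function() {
  in_rstudio <- requireNamespace("rstudioapi", quietly = TRUE) &&
    rstudioapi::isAvailable()
  if (!in_rstudio) {
    message("Not running inside RStudio; use Tools > Global Options instead.")
    return(invisible(NA_character_))
  }
  rstudioapi::writeRStudioPreference("code_completion", "manual")
  # Read the value back to confirm it was stored.
  invisible(rstudioapi::readRStudioPreference("code_completion", NA_character_))
}

pref_status <- set_manual_completion()
```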
FYI, this happened again during the Physalia scRNAseq workshop with @almeidasilvaf.
We were using an RStudio Server instance running on an AWS machine. @almeidasilvaf suggested that the issue might be specific to RStudio Server, since it never happened when doing local analyses.