Reordering SE columns seems to freeze R sessions in workshops

Open mblue9 opened this issue 2 months ago • 9 comments

At today’s EuroBioC workshop (and also in Ethiopia last month), the line:

se <- se[, order(se$Group)]

(in https://github.com/carpentries-incubator/bioc-rnaseq/blob/main/episodes/03-import-annotate.Rmd#L254)

caused a bunch of laptops to hang. People had to restart R sessions to continue.

cc @js2264

mblue9 • Sep 15 '25 12:09

I believe this could originate from the fact that I ran:

se <- SummarizedExperiment(assays = list("counts" = counts), rowRanges = as(rowranges, "GRanges"), colData = coldata)

## Rather than (note the `as.matrix()`): 

se <- SummarizedExperiment(assays = list("counts" = as.matrix(counts)), rowRanges = as(rowranges, "GRanges"), colData = coldata)

But Vasileios also reported a slow-down even though his assay is a matrix as it should be.

This is hard to reproduce. The reordering with `se[, order(se$Group)]` was sometimes instantaneous and sometimes took 5 min (same R instance on a local M1 MacBook Pro, repeating the command twice). I tried to reproduce it, but it did not happen again after I terminated RStudio.

> sessionInfo()
R version 4.5.1 (2025-06-13)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.6.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Madrid
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] SummarizedExperiment_1.39.1 GenomicRanges_1.61.2        Seqinfo_0.99.2              MatrixGenerics_1.21.0      
 [5] matrixStats_1.5.0           hgu95av2.db_3.13.0          org.Hs.eg.db_3.21.0         org.Mm.eg.db_3.21.0        
 [9] AnnotationDbi_1.71.1        IRanges_2.43.0              S4Vectors_0.47.0            Biobase_2.69.0             
[13] BiocGenerics_0.55.1         generics_0.1.4             

loaded via a namespace (and not attached):
 [1] Matrix_1.7-4        bit_4.6.0           compiler_4.5.1      crayon_1.5.3        blob_1.2.4          Biostrings_2.77.2  
 [7] png_0.1-8           yaml_2.3.10         fastmap_1.2.0       lattice_0.22-7      R6_2.6.1            XVector_0.49.0     
[13] S4Arrays_1.9.1      knitr_1.50          DelayedArray_0.35.2 DBI_1.2.3           rlang_1.1.6         KEGGREST_1.49.1    
[19] cachem_1.1.0        xfun_0.53           bit64_4.6.0-1       SparseArray_1.9.1   RSQLite_2.4.3       memoise_2.0.1      
[25] cli_3.6.5           grid_4.5.1          digest_0.6.37       rstudioapi_0.17.1   vctrs_0.6.5         evaluate_1.0.5     
[31] abind_1.4-8         rmarkdown_2.29      httr_1.4.7          tools_4.5.1         pkgconfig_2.0.3     htmltools_0.5.8.1  

js2264 • Sep 15 '25 14:09

Regardless of whether this is linked to matrix coercion, I think it would be nice to first convert counts into a matrix, THEN build our SE. This would help make clear that assays (generally) store quantitative experimental measurements (in numeric matrices), while colData/rowData store qualitative AND/OR quantitative data (in data frames).

counts <- as.matrix(read.csv("data/GSE96870_counts_cerebellum.csv", row.names = 1))

and then, when building the SE:

se <- SummarizedExperiment(
    assays = list(counts = counts),
    rowRanges = as(rowranges, "GRanges"),
    colData = coldata
)
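
To check which of the two cases an object falls into before building the SE, here is a minimal base R sketch. The toy table (gene names, sample names, values) is made up for illustration and is not taken from the GSE96870 file; only `read.csv()` and `as.matrix()` come from the discussion above.

```r
# Toy stand-in for the counts CSV; names and values are made up.
csv_text <- "gene,sample1,sample2,sample3
geneA,10,0,5
geneB,3,8,1"

# read.csv() always returns a data.frame, even for all-numeric columns.
counts_df <- read.csv(text = csv_text, row.names = 1)
print(class(counts_df))

# Coerce to a numeric matrix before handing it to SummarizedExperiment().
counts <- as.matrix(counts_df)
print(class(counts))

# A cheap guard that could go right before building the SE.
stopifnot(is.matrix(counts), is.numeric(counts))
```

Row and column names survive the coercion, so the matrix can still be matched against `coldata` and `rowranges` by name.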

js2264 • Sep 15 '25 14:09

That is very odd, especially if it's non-reproducible but did happen in the last two workshops. The bioc-intro also has some subsetting of a SummarizedExperiment object - has it ever happened in that one?

jdrnevich • Sep 15 '25 18:09

Not as far as I know.

I do tend to coerce the data.frame into a matrix right at the beginning with

counts <- read.csv("data/GSE96870_counts_cerebellum.csv", row.names = 1) |>
     as.matrix()

which also serves as a reminder of the `|>` pipe, so that learners don't come to associate pipes only with the tidyverse and not with SEs.

lgatto • Sep 15 '25 20:09

I tested locally with the session below and don't see any problem:

> sessionInfo()
R version 4.5.0 (2025-04-11)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.3 LTS

Matrix products: default
BLAS:   /opt/R-4.5/lib/R/lib/libRblas.so 
LAPACK: /opt/R-4.5/lib/R/lib/libRlapack.so;  LAPACK version 3.12.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Brussels
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] hgu95av2.db_3.13.0          org.Hs.eg.db_3.21.0         org.Mm.eg.db_3.21.0        
 [4] AnnotationDbi_1.70.0        scpdata_1.16.1              ExperimentHub_2.16.0       
 [7] AnnotationHub_3.16.0        BiocFileCache_2.16.0        dbplyr_2.5.0               
[10] QFeatures_1.19.1            MultiAssayExperiment_1.34.0 SummarizedExperiment_1.38.1
[13] Biobase_2.68.0              GenomicRanges_1.60.0        GenomeInfoDb_1.44.0        
[16] IRanges_2.42.0              S4Vectors_0.46.0            BiocGenerics_0.54.0        
[19] generics_0.1.4              MatrixGenerics_1.20.0       matrixStats_1.5.0          

loaded via a namespace (and not attached):
 [1] KEGGREST_1.48.0             lattice_0.22-6              vctrs_0.6.5                
 [4] tools_4.5.0                 curl_6.4.0                  tibble_3.3.0               
 [7] RSQLite_2.3.11              cluster_2.1.8.1             blob_1.2.4                 
[10] pkgconfig_2.0.3             BiocBaseUtils_1.10.0        Matrix_1.7-3               
[13] lifecycle_1.0.4             GenomeInfoDbData_1.2.14     compiler_4.5.0             
[16] stringr_1.5.1               Biostrings_2.76.0           clue_0.3-66                
[19] yaml_2.3.10                 lazyeval_0.2.2              pillar_1.11.0              
[22] crayon_1.5.3                tidyr_1.3.1                 MASS_7.3-65                
[25] SingleCellExperiment_1.30.1 DelayedArray_0.34.1         cachem_1.1.0               
[28] abind_1.4-8                 mime_0.13                   tidyselect_1.2.1           
[31] stringi_1.8.7               BiocVersion_3.21.1          dplyr_1.1.4                
[34] reshape2_1.4.4              purrr_1.0.4                 fastmap_1.2.0              
[37] grid_4.5.0                  cli_3.6.5                   SparseArray_1.8.0          
[40] magrittr_2.0.3              S4Arrays_1.8.0              withr_3.0.2                
[43] rappdirs_0.3.3              filelock_1.0.3              UCSC.utils_1.4.0           
[46] bit64_4.6.0-1               XVector_0.48.0              httr_1.4.7                 
[49] igraph_2.1.4                bit_4.6.0                   png_0.1-8                  
[52] memoise_2.0.1               rlang_1.1.6                 Rcpp_1.1.0                 
[55] glue_1.8.0                  DBI_1.2.3                   BiocManager_1.30.25        
[58] jsonlite_2.0.0              AnnotationFilter_1.32.0     R6_2.6.1                   
[61] plyr_1.8.9                  ProtGenerics_1.40.0         MsCoreUtils_1.20.0         

lgatto • Sep 16 '25 13:09

Following up from a discussion with @ivanek: the issue isn't related to R/package versions, as he and others have observed it intermittently on the same computer, without any obvious pattern that would help reproduce it. The previously run commands don't seem to be the cause of the crash either.

If anyone has time, they could compare execution in RStudio and a bare R console.
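
One possible way to run that comparison is to source the same small script in both front ends and compare the timings. This is only a sketch under assumptions: the toy data, `reorder_once()` and `time_reordering()` are hypothetical names introduced here, and the script falls back to a plain matrix when SummarizedExperiment is not installed. Note that `system.time()` only measures the evaluation itself, so a delay that occurs in the IDE after the call returns would not show up and would need to be timed by hand.

```r
# Which front end is this? RStudio sets RSTUDIO=1 in its R sessions.
front_end <- if (identical(Sys.getenv("RSTUDIO"), "1")) "RStudio" else "not RStudio"

# Toy data (made up): 1000 genes x 12 samples in three groups.
set.seed(42)
counts <- matrix(
    rpois(1000 * 12, lambda = 10),
    nrow = 1000,
    dimnames = list(paste0("gene", 1:1000), paste0("sample", 1:12))
)
group <- rep(c("B", "A", "C"), times = 4)

# Use a real SummarizedExperiment when the package is installed;
# otherwise fall back to the plain matrix so the script still runs.
if (requireNamespace("SummarizedExperiment", quietly = TRUE)) {
    se <- SummarizedExperiment::SummarizedExperiment(
        assays = list(counts = counts),
        colData = S4Vectors::DataFrame(Group = group, row.names = colnames(counts))
    )
    reorder_once <- function() se[, order(se$Group)]
} else {
    reorder_once <- function() counts[, order(group)]
}

# Elapsed (wall-clock) seconds for each of n_rep repetitions.
time_reordering <- function(f, n_rep = 5) {
    vapply(seq_len(n_rep), function(i) system.time(f())[["elapsed"]], numeric(1))
}

timings <- time_reordering(reorder_once)
cat("Front end:", front_end, "\n")
print(timings)
```

Since the slowdown is intermittent, a single fast run in either front end does not prove much; repeating the script several times in each would give a fairer comparison.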

lgatto • Sep 17 '25 09:09

Also, the issue has happened for some of the trainees but not all. Not sure whether it only affects Mac machines (mine and Vasilis's were Macs).

On my side, I've never had this issue in the past, using a bare R console or radian. It happened to me for the first time during this workshop, which was all in RStudio.

I don't have a precise timing, but the machine would lag (without increasing RAM/CPU usage though) for a solid 2 minutes, then eventually successfully release the prompt.

js2264 • Sep 17 '25 10:09

Similar things have happened to us recently (in RStudio, not with this particular example data), and if I remember correctly it was solved by turning off automatic code completion (Tools -> Global Options -> Code -> Completion -> set "Show code completions" to "Manually").

csoneson • Sep 17 '25 23:09

FYI, this happened again during the Physalia scRNAseq workshop with @almeidasilvaf.

We were using an RStudio Server instance running on an AWS machine. @almeidasilvaf suggested that the issue may be specific to RStudio Server, since it never happened when doing local analyses.

js2264 • Nov 19 '25 12:11