
degPatterns crashes because of stack overloading

Open NastiaSkuba opened this issue 1 year ago • 7 comments

Hello!

I am doing an RNA-seq data analysis. When I run degPatterns with differentially expressed genes, the plot with all clusters is displayed, but afterwards I get an error:

C stack usage 7969268 is too close to the limit

and the object was not saved.

There are 3295 DEGs in my dataset. I tried the function with a smaller number of genes and it worked up to 2000 genes. But for my project all 3295 genes are necessary, and I am already using p < 0.01. I did the same analysis with the same number of genes 1 month ago, and everything worked.

Below are my code, its output, and the session info:

matrix <- assay(dds)[genes_significant$gene_ID, ]
matrix <- varianceStabilizingTransformation(matrix)
gene_clusters <- degPatterns(matrix, metadata = metadata_rna, time = "stage")
A large number of genes was given-- please, make sure this is not an error. Normally, only DE genes will be useful for this function.
Working with 3295 genes.
Working with 3281 genes after filtering: minc > 15
Joining with `by = join_by(merge)`
Joining with `by = join_by(merge)`
Error: C stack usage  7969348 is too close to the limit

R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Berlin
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DEGreport_1.38.5            lubridate_1.9.3             forcats_1.0.0               stringr_1.5.1              
 [5] dplyr_1.1.4                 purrr_1.0.2                 readr_2.1.4                 tidyr_1.3.0                
 [9] tibble_3.2.1                ggplot2_3.4.4               tidyverse_2.0.0             DESeq2_1.42.0              
[13] SummarizedExperiment_1.32.0 Biobase_2.62.0              MatrixGenerics_1.14.0       matrixStats_1.2.0          
[17] GenomicRanges_1.54.1        GenomeInfoDb_1.38.1         IRanges_2.36.0              S4Vectors_0.40.2           
[21] BiocGenerics_0.48.1        

loaded via a namespace (and not attached):
 [1] mnormt_2.1.1                bitops_1.0-7                gridExtra_2.3               rlang_1.1.2                
 [5] magrittr_2.0.3              clue_0.3-65                 GetoptLong_1.0.5            compiler_4.3.1             
 [9] mgcv_1.9-0                  png_0.1-8                   vctrs_0.6.5                 pkgconfig_2.0.3            
[13] shape_1.4.6                 crayon_1.5.2                fastmap_1.1.1               backports_1.4.1            
[17] XVector_0.42.0              labeling_0.4.3              utf8_1.2.4                  rmarkdown_2.25             
[21] tzdb_0.4.0                  xfun_0.41                   zlibbioc_1.48.0             jsonlite_1.8.8             
[25] reshape_0.8.9               DelayedArray_0.28.0         BiocParallel_1.36.0         psych_2.3.9                
[29] broom_1.0.5                 parallel_4.3.1              cluster_2.1.4               R6_2.5.1                   
[33] stringi_1.8.3               RColorBrewer_1.1-3          limma_3.58.1                Rcpp_1.0.11                
[37] iterators_1.0.14            knitr_1.45                  splines_4.3.1               Matrix_1.5-4.1             
[41] timechange_0.2.0            tidyselect_1.2.0            rstudioapi_0.15.0           abind_1.4-5                
[45] yaml_2.3.8                  viridis_0.6.4               doParallel_1.0.17           codetools_0.2-19           
[49] plyr_1.8.9                  lattice_0.21-8              withr_2.5.2                 evaluate_0.23              
[53] ConsensusClusterPlus_1.66.0 circlize_0.4.15             BiocManager_1.30.22         pillar_1.9.0               
[57] foreach_1.5.2               plotly_4.10.3               generics_0.1.3              RCurl_1.98-1.13            
[61] hms_1.1.3                   munsell_0.5.0               scales_1.3.0                glue_1.6.2                 
[65] lazyeval_0.2.2              tools_4.3.1                 dendextend_1.17.1           data.table_1.14.10         
[69] locfit_1.5-9.8              cowplot_1.1.1               grid_4.3.1                  edgeR_4.0.3                
[73] colorspace_2.1-0            nlme_3.1-162                GenomeInfoDbData_1.2.11     cli_3.6.2                  
[77] fansi_1.0.6                 S4Arrays_1.2.0              viridisLite_0.4.2           ggdendro_0.1.23            
[81] ComplexHeatmap_2.18.0       gtable_0.3.4                logging_0.10-108            digest_0.6.33              
[85] SparseArray_1.2.2           ggrepel_0.9.4               farver_2.1.1                rjson_0.2.21               
[89] htmlwidgets_1.6.4           htmltools_0.5.7             lifecycle_1.0.4             httr_1.4.7                 
[93] GlobalOptions_0.1.2         statmod_1.5.0               MASS_7.3-60     

Update: the error was caused by the dendrogram functionality. Once this option was removed, everything worked. I kindly ask the developers to add a parameter for disabling the dendrogram plot.

NastiaSkuba avatar Dec 14 '23 12:12 NastiaSkuba

Hi, thank you for the debugging. I will make a note and try to do this in the next two weeks. If you find time for a PR, I will merge it as well. Just to make sure, is this line what you found to be the issue?

lpantano avatar Jan 04 '24 17:01 lpantano

Hi, I am facing the same problem with degPatterns() when the number of input genes ranges from 1000 to 4000. I've used this function multiple times in the past without a glitch for more than 6000 genes, generating the object and the cluster image and exporting the group-wise gene list. As a temporary workaround, I run ulimit -s unlimited in bash before opening R from the same terminal. With this, the object gets created and I can export the group-wise gene lists, but I cannot save the cluster image since it gets replaced by the dendrogram. Has this issue been resolved yet, or is there a way to get around this? Thanks so much!
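
The ulimit workaround mentioned above can be sketched roughly as follows. This is a minimal sketch rather than a recommended fix. It raises the soft stack limit to the hard limit instead of passing unlimited directly, on the assumption that some systems refuse unlimited. The variable names soft_limit and hard_limit are made up for illustration. The usage reported in the errors in this thread (about 7.97 million bytes) is just under 8 MiB, which would be consistent with a default soft limit of 8192 KiB, though that default is an assumption about the machines involved.

```shell
#!/usr/bin/env bash
# Sketch of the stack-limit workaround; run it in the terminal that will start R.

# Current soft and hard stack limits for this shell (in KiB, or "unlimited").
soft_limit=$(ulimit -S -s)
hard_limit=$(ulimit -H -s)
echo "soft stack limit: ${soft_limit}"
echo "hard stack limit: ${hard_limit}"

# Raise the soft limit as far as the hard limit allows. The change applies
# only to this shell and to processes started from it.
if ulimit -S -s "${hard_limit}" 2>/dev/null; then
    echo "soft stack limit now: $(ulimit -S -s)"
else
    echo "could not raise the stack limit" >&2
fi

# Start R from this same shell so that it inherits the new limit, e.g.:
#   R
# Inside R, Cstack_info() reports the stack size that R actually sees.
```

Because the limit is inherited from the parent process, an R session started from a different terminal or from a desktop launcher would not pick up the change.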

pdecodes avatar Jan 11 '24 08:01 pdecodes

Hi, I am still trying to figure out the issue. It is hard because I cannot reproduce it on macOS. I will try to remove the dendrogram code and see if that helps.

lpantano avatar Jan 11 '24 15:01 lpantano

I have pushed a temporary fix to avoid using dendextend in the main branch of this repo. Could you test loading the code from this branch and see if the errors go away?

lpantano avatar Jan 11 '24 19:01 lpantano

Hello,

I am currently on R 4.3.2, DEGreport v1.38.5, and macOS 14.4, and I am still having this issue. As noted by previous commenters, the error appears when a large number of genes is submitted for clustering (in my case, I would get the error "C stack usage 7969268 is too close to the limit" when the number of genes was over 5000). I reduced the number of genes to 4300 by using a more stringent p-value and was then able to cluster them successfully. However, it would be great if we could cluster without having to alter our thresholds.

Thanks for looking into this!

juanb001 avatar Mar 13 '24 01:03 juanb001

Can you install version 1.39.6? Right now it is only on GitHub main, so you will need to install it with devtools. If that works, I will merge it into Bioconductor. Thanks!
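
One way to install from GitHub and confirm the version is sketched below. It assumes R with the devtools package is available, that the repository lives at lpantano/DEGreport (inferred from the maintainer's user name, not stated in this thread), and that sort supports -V. The helper names version_ge and install_and_check are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: install DEGreport from GitHub main and check the installed version.

wanted="1.39.6"  # version named in the comment above

# Succeeds when version $1 is at least version $2 (relies on sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Installs with devtools, then compares the installed version with $wanted.
# The repository path is an assumption; adjust it if the repo lives elsewhere.
install_and_check() {
    Rscript -e 'devtools::install_github("lpantano/DEGreport")' || return 1
    installed=$(Rscript -e 'cat(as.character(packageVersion("DEGreport")))')
    if version_ge "${installed}" "${wanted}"; then
        echo "DEGreport ${installed} is installed"
    else
        echo "DEGreport ${installed} is older than ${wanted}" >&2
        return 1
    fi
}

# Nothing is installed unless the script is called with the "install" argument.
if [ "${1:-}" = "install" ]; then
    install_and_check
fi
```

Restarting R after the install makes sure the new version is the one that gets loaded.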

lpantano avatar Mar 13 '24 14:03 lpantano

Hi!

I just tried out v 1.39.6 and had no issues clustering a large number of genes. Thank you for the quick fix!

juanb001 avatar Mar 17 '24 02:03 juanb001

Hi,

I face the same issue while using R (v4.4.0), Bioc (v3.19), and DEGreport (v1.40.0).

Working with 2776 genes.
Working with 2762 genes after filtering: minc > 15
Joining with `by = join_by(merge)`
Joining with `by = join_by(merge)`
Error: C stack usage  7969780 is too close to the limit

R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8    
 [5] LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C             
 [9] LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] grid      stats4    stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] lubridate_1.9.3             forcats_1.0.0               stringr_1.5.1              
 [4] dplyr_1.1.4                 purrr_1.0.2                 readr_2.1.5                
 [7] tidyr_1.3.1                 tibble_3.2.1                tidyverse_2.0.0            
[10] viridis_0.6.5               viridisLite_0.4.2           ComplexHeatmap_2.20.0      
[13] pheatmap_1.0.12             GenomicFeatures_1.56.0      AnnotationDbi_1.66.0       
[16] rtracklayer_1.64.0          openxlsx_4.2.5.2            ggrepel_0.9.5              
[19] ggplot2_3.5.1               gplots_3.1.3.1              RColorBrewer_1.1-3         
[22] DEGreport_1.40.0            DESeq2_1.44.0               SummarizedExperiment_1.34.0
[25] Biobase_2.64.0              MatrixGenerics_1.16.0       matrixStats_1.3.0          
[28] GenomicRanges_1.56.1        GenomeInfoDb_1.40.1         IRanges_2.38.0             
[31] S4Vectors_0.42.0            BiocGenerics_0.50.0        

loaded via a namespace (and not attached):
  [1] ggdendro_0.2.0              rstudioapi_0.16.0           jsonlite_1.8.8             
  [4] shape_1.4.6.1               magrittr_2.0.3              farver_2.1.2               
  [7] rmarkdown_2.27              GlobalOptions_0.1.2         BiocIO_1.14.0              
 [10] zlibbioc_1.50.0             vctrs_0.6.5                 memoise_2.0.1              
 [13] Rsamtools_2.20.0            RCurl_1.98-1.14             progress_1.2.3             
 [16] htmltools_0.5.8.1           S4Arrays_1.4.1              curl_5.2.1                 
 [19] broom_1.0.6                 SparseArray_1.4.8           KernSmooth_2.23-22         
 [22] httr2_1.0.1                 plyr_1.8.9                  cachem_1.1.0               
 [25] GenomicAlignments_1.40.0    lifecycle_1.0.4             iterators_1.0.14           
 [28] pkgconfig_2.0.3             Matrix_1.7-0                R6_2.5.1                   
 [31] fastmap_1.2.0               GenomeInfoDbData_1.2.12     clue_0.3-65                
 [34] digest_0.6.36               colorspace_2.1-0            reshape_0.8.9              
 [37] RSQLite_2.3.7               labeling_0.4.3              filelock_1.0.3             
 [40] timechange_0.3.0            fansi_1.0.6                 mgcv_1.9-1                 
 [43] httr_1.4.7                  abind_1.4-5                 compiler_4.4.0             
 [46] bit64_4.0.5                 withr_3.0.0                 doParallel_1.0.17          
 [49] ConsensusClusterPlus_1.68.0 backports_1.5.0             BiocParallel_1.38.0        
 [52] DBI_1.2.3                   psych_2.4.3                 dendextend_1.17.1          
 [55] biomaRt_2.60.0              MASS_7.3-60.2               rappdirs_0.3.3             
 [58] DelayedArray_0.30.1         rjson_0.2.21                gtools_3.9.5               
 [61] caTools_1.18.2              tools_4.4.0                 zip_2.3.1                  
 [64] glue_1.7.0                  restfulr_0.0.15             nlme_3.1-164               
 [67] cluster_2.1.6               generics_0.1.3              gtable_0.3.5               
 [70] tzdb_0.4.0                  hms_1.1.3                   xml2_1.3.6                 
 [73] utf8_1.2.4                  XVector_0.44.0              foreach_1.5.2              
 [76] pillar_1.9.0                limma_3.60.3                splines_4.4.0              
 [79] logging_0.10-108            circlize_0.4.16             BiocFileCache_2.12.0       
 [82] lattice_0.22-6              renv_0.17.0                 bit_4.0.5                  
 [85] tidyselect_1.2.1            locfit_1.5-9.10             Biostrings_2.72.1          
 [88] knitr_1.47                  gridExtra_2.3               edgeR_4.2.0                
 [91] xfun_0.45                   statmod_1.5.0               stringi_1.8.4              
 [94] UCSC.utils_1.0.0            yaml_2.3.8                  evaluate_0.24.0            
 [97] codetools_0.2-20            BiocManager_1.30.23         cli_3.6.3                  
[100] munsell_0.5.1               Rcpp_1.0.12                 dbplyr_2.5.0               
[103] png_0.1-8                   XML_3.99-0.17               parallel_4.4.0             
[106] blob_1.2.4                  prettyunits_1.2.0           bitops_1.0-7               
[109] txdbmaker_1.0.1             scales_1.3.0                crayon_1.5.3               
[112] GetoptLong_1.0.5            rlang_1.1.4                 cowplot_1.1.3              
[115] KEGGREST_1.44.1             mnormt_2.1.1

skarunan avatar Jun 28 '24 11:06 skarunan

Hi,

The fix didn't make it into the 3.19 release. I just pushed the fix; it will probably take a couple of days to get in. The package version should be 1.40.1. Thanks!

lpantano avatar Jun 28 '24 14:06 lpantano