genomation
scoreMatrixBin allocates 415 GB of RAM (1000 bins, 55,000 regions)
I am analysing some low-resolution data and need 1000 bins of 1 kb each (covering 1 Mb in total). My estimate for the size of such an object is roughly 1000 * 55000 * sizeof(double), which equals 419.6 MB and matches the size reported by R below.
However, during the ScoreMatrixBin call
sm = ScoreMatrixBin(target=track, windows=windows, bin.num=1000, strand.aware=TRUE, weight.col="score")
415 GB of memory is allocated, and it is not released after gc(). I am not sure whether this is a genomation issue or an R issue.
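The size estimate above can be checked with a few lines of base R. This is only a back-of-the-envelope sketch: it assumes 8-byte doubles and ignores attributes such as dimnames, which is presumably why R reports a slightly larger figure for the real object. The helper name `expected_matrix_mb` is made up for illustration and is not part of genomation.

```r
# Expected size of a dense numeric matrix: rows * cols * 8 bytes per double.
# `expected_matrix_mb` is a hypothetical helper, not a genomation function.
expected_matrix_mb <- function(n_rows, n_cols, bytes_per_cell = 8) {
  n_rows * n_cols * bytes_per_cell / 1024^2
}

est_mb <- expected_matrix_mb(55000, 1000)  # rough estimate from the post
obs_mb <- 441432328 / 1024^2               # size in bytes that R reports for `sm`

round(c(estimate = est_mb, reported = obs_mb), 1)
```

Either way, the result object is roughly a thousand times smaller than the 415 GB allocated during the call.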
      Type         Size       PrettySize  Rows    Columns
sm    ScoreMatrix  441432328  421 Mb      54741   1000
track GRanges      1712096    1.6 Mb      106205  NA
gc()
           used  (Mb) gc trigger     (Mb)  max used     (Mb)
Ncells  8096903 432.5  1.177e+07    628.4 1.177e+07    628.4
Vcells 71511568 545.6  7.377e+10 562797.2 5.571e+10 425044.0
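For reading the gc() table: R counts Vcells in 8-byte units, so the Vcells counts convert directly to bytes. A small sketch of that conversion, using the numbers printed above (the variable names are only illustrative, and the "max used" count is rounded by R's printing):

```r
bytes_per_vcell <- 8                 # one Vcell is 8 bytes

vcells_used     <- 71511568          # "used" Vcells after gc()
vcells_max_used <- 5.571e10          # "max used" Vcells (rounded in the printout)

used_mb <- vcells_used * bytes_per_vcell / 1024^2
peak_gb <- vcells_max_used * bytes_per_vcell / 1024^3

c(used_mb = used_mb, peak_gb = peak_gb)
```

So the live heap after gc() is about 546 MB, while the peak recorded by R is about 415 GB.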
sessionInfo()
Error in system(paste(which, shQuote(names[i])), intern = TRUE, ignore.stderr = TRUE) :
cannot popen '/usr/bin/which 'uname' 2>/dev/null', probable reason 'Cannot allocate memory'
In the shell:
 PID USER  PR NI   VIRT    RES   SHR S %CPU %MEM    TIME+ COMMAND
2570 user  20  0 416,3g 415,4g 59208 S  0,0 82,5 15:06.29 rsession
After R restart:
sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux buster/sid
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.3.5.so
locale:
[1] LC_CTYPE=en_DK.UTF-8 LC_NUMERIC=C LC_TIME=en_DK.UTF-8 LC_COLLATE=en_DK.UTF-8
[5] LC_MONETARY=en_DK.UTF-8 LC_MESSAGES=en_DK.UTF-8 LC_PAPER=en_DK.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] genomation_1.14.0 BiocParallel_1.16.6
loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 lattice_0.20-38 prettyunits_1.0.2 Rsamtools_1.34.1
[5] Biostrings_2.50.2 assertthat_0.2.1 digest_0.6.18 gridBase_0.4-7
[9] R6_2.4.0 GenomeInfoDb_1.18.2 plyr_1.8.4 stats4_3.5.2
[13] RSQLite_2.1.1 httr_1.4.0 ggplot2_3.1.1 pillar_1.4.0
[17] zlibbioc_1.28.0 rlang_0.3.4 GenomicFeatures_1.34.8 progress_1.2.2
[21] lazyeval_0.2.2 rstudioapi_0.10 data.table_1.12.2 blob_1.1.1
[25] S4Vectors_0.20.1 Matrix_1.2-17 readr_1.3.1 stringr_1.4.0
[29] RCurl_1.95-4.12 bit_1.1-14 biomaRt_2.38.0 munsell_0.5.0
[33] DelayedArray_0.8.0 compiler_3.5.2 rtracklayer_1.42.2 pkgconfig_2.0.2
[37] BiocGenerics_0.28.0 tidyselect_0.2.5 SummarizedExperiment_1.12.0 tibble_2.1.1
[41] GenomeInfoDbData_1.2.0 IRanges_2.16.0 matrixStats_0.54.0 XML_3.98-1.19
[45] crayon_1.3.4 dplyr_0.8.1 GenomicAlignments_1.18.1 bitops_1.0-6
[49] gtable_0.3.0 DBI_1.0.0 magrittr_1.5 scales_1.0.0
[53] KernSmooth_2.23-15 stringi_1.4.3 impute_1.56.0 reshape2_1.4.3
[57] XVector_0.22.0 tools_3.5.2 bit64_0.9-7 BSgenome_1.50.0
[61] Biobase_2.42.0 glue_1.3.1 seqPattern_1.7.0 purrr_0.3.2
[65] hms_0.4.2 plotrix_3.7-5 parallel_3.5.2 AnnotationDbi_1.44.0
[69] colorspace_1.4-1 GenomicRanges_1.34.0 memoise_1.1.0
Could you send a reproducible example? It doesn't have to be the full analysis; it just has to exemplify the memory problem. We will have to do some memory profiling to see where the problem is.
(In reply to the original post by Piotr Balwierz, Sat, May 18, 2019, 6:40 PM.)
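As a rough illustration of the kind of memory profiling mentioned above, peak usage around a single call can be read from gc() in base R. This is a minimal sketch, not the maintainers' procedure: `peak_mem_mb` is a hypothetical helper, and the `numeric(1e7)` allocation merely stands in for the ScoreMatrixBin call.

```r
# Hypothetical helper (not part of genomation): evaluate an expression and
# report the peak memory R recorded while doing so, in MB.
peak_mem_mb <- function(expr) {
  gc(reset = TRUE)       # reset the "max used" counters to current usage
  force(expr)            # evaluate the lazily passed expression
  g <- gc()
  sum(g[, ncol(g)])      # last column is "max used" in Mb (Ncells + Vcells)
}

# Stand-in workload: allocate 1e7 doubles (about 76 MB).
peak <- peak_mem_mb(numeric(1e7))
peak
```

As far as I know, R only updates "max used" at garbage collections, so short-lived peaks may be under-reported; `Rprof(memory.profiling = TRUE)` is another base R option for a finer-grained view.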
library(genomation)
library(BSgenome.Mmusculus.UCSC.mm9)
library("TxDb.Mmusculus.UCSC.mm9.knownGene")
# Tile the mm9 genome into 25 kb ranges and give each tile a random score
track = unlist(GenomicRanges::tileGenome(tilewidth=25000, seqlengths=seqlengths(Mmusculus)))
track$score = rnorm(length(track))
# 1 Mb windows around each TSS, to be split into 1000 bins of 1 kb
windows = promoters(TxDb.Mmusculus.UCSC.mm9.knownGene, upstream=500000, downstream=500000)
sm = ScoreMatrixBin(target=track, windows=windows, bin.num=1000, strand.aware=TRUE, weight.col="score")
You might want to scale the problem down if you are not running on a machine with 512 GB or more of RAM.