glmGamPoi icon indicating copy to clipboard operation
glmGamPoi copied to clipboard

test_de failing with on_disk models

Open NathanSiemers opened this issue 2 years ago • 4 comments

Hello Constantin,

Thank you for this code, I've been using and learning more about it for several months. I'm working with some large data sets and using glmGamPoi to perform modeling and hypothesis testing.

I've been able to create full models from HDF5Array input data sets and on_disk = TRUE. I did need to adjust the chunk size params to get this to occur in finite time. However, the on-disk version of the model output cannot be queried with test_de. Performing such test gives the error:

Error in combine_size_factors_and_offset(offset, size_factors, Y, verbose = verbose) : length(offset) == 1 || length(offset) == n_samples is not TRUE

(seems to be coming from estimate_size_factors.R)

If I run the same model with a realized input matrix and on_disk = FALSE, everything works as expected. I wonder if test_de can't determine the right dimensions or related parameters for on_disk models and is instead throwing this error?

It could be a general bug, it could be a problem that only shows up when one alters default H5 temp file paths, etc? But I've looked, and I think the on-disk model objects have hard-coded locations for data contained within them.... Hmm.

Attached is the log file (docx). You can get a full tgz of the code and input data at the link below. I've tried to make this an MRE, but even very truncated the singlecellexperiment file is a little big.

Thanks, Nathan Siemers mre.glmgampoi.bug.docx

tgz file: https://drive.google.com/file/d/1fYN4F7TtFpwgC6yqYZDP_rDHBiV4XU5H/view?usp=sharing

NathanSiemers avatar Mar 30 '22 20:03 NathanSiemers

btw, removing the offset parameter in the glm_gp call doesn't change anything.

NathanSiemers avatar Apr 01 '22 17:04 NathanSiemers

Hey, sorry for the delay. I was at a workshop this week. I did take a briefly this afternoon while waiting for my flight and tried the code below to reproduce the issue, but my version finished successfully.

library(glmGamPoi)
model_matrix <- cbind(1, rnorm(5))
true_Beta <- cbind(rnorm(n = 30), rnorm(n = 30, mean = 3))
sf <- exp(rnorm(n = 5, mean = 0.7))
model_matrix
#>      [,1]       [,2]
#> [1,]    1  1.1480341
#> [2,]    1 -0.6869605
#> [3,]    1 -2.1089572
#> [4,]    1  0.1170987
#> [5,]    1  1.0913436
Y <- matrix(rnbinom(n = 30 * 5, mu = sf * exp(true_Beta %*% t(model_matrix)), size = 1/2.4),
            nrow = 30, ncol = 5)
options(DelayedArray.block.size = 1e9)
DelayedArray::setAutoBlockSize(1e9)
#> automatic block size set to 1e+09 bytes (was 1e+08)
DelayedArray::setAutoBlockShape('last-dim-grows-first')
#> automatic block shape set to "last-dim-grows-first" (was "hypercube")
Y_h5 <- HDF5Array::writeHDF5Array(Y)

lmf <- glm_gp(
  Y_h5,
  model_matrix,
  on_disk = TRUE,
  verbose = TRUE,
  offset = 0
)
#> Calculate Size Factors (normed_sum)
#> Make initial dispersion estimate
#> Make initial beta estimate
#> Estimate beta
#> Estimate dispersion
#> Fit dispersion trend
#> Shrink dispersion estimates
#> Estimate beta again
compare.treat = test_de( lmf, c(0,1), verbose = TRUE )
#> Fit reduced model
#> Calculate quasi likelihood ratio
#> Prepare results

Created on 2022-04-01 by the reprex package (v2.0.1)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.1.1 (2021-08-10)
#>  os       macOS Big Sur 10.16         
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Europe/Berlin               
#>  date     2022-04-01                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package              * version  date       lib source        
#>  backports              1.2.1    2020-12-09 [1] CRAN (R 4.1.0)
#>  beachmat               2.8.1    2021-08-12 [1] Bioconductor  
#>  Biobase                2.52.0   2021-05-19 [1] Bioconductor  
#>  BiocGenerics           0.38.0   2021-05-19 [1] Bioconductor  
#>  bitops                 1.0-7    2021-04-24 [1] CRAN (R 4.1.0)
#>  cli                    3.0.1    2021-07-17 [1] CRAN (R 4.1.0)
#>  crayon                 1.4.1    2021-02-08 [1] CRAN (R 4.1.0)
#>  DelayedArray           0.18.0   2021-05-19 [1] Bioconductor  
#>  DelayedMatrixStats     1.14.3   2021-08-26 [1] Bioconductor  
#>  digest                 0.6.27   2020-10-24 [1] CRAN (R 4.1.0)
#>  ellipsis               0.3.2    2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate               0.14     2019-05-28 [1] CRAN (R 4.1.0)
#>  fansi                  0.5.0    2021-05-25 [1] CRAN (R 4.1.0)
#>  fastmap                1.1.0    2021-01-25 [1] CRAN (R 4.1.0)
#>  fs                     1.5.0    2020-07-31 [1] CRAN (R 4.1.0)
#>  GenomeInfoDb           1.28.4   2021-09-05 [1] Bioconductor  
#>  GenomeInfoDbData       1.2.6    2021-09-09 [1] Bioconductor  
#>  GenomicRanges          1.44.0   2021-05-19 [1] Bioconductor  
#>  glmGamPoi            * 1.4.0    2021-05-19 [1] Bioconductor  
#>  glue                   1.4.2    2020-08-27 [1] CRAN (R 4.1.0)
#>  HDF5Array              1.20.0   2021-05-19 [1] Bioconductor  
#>  highr                  0.9      2021-04-16 [1] CRAN (R 4.1.0)
#>  htmltools              0.5.2    2021-08-25 [1] CRAN (R 4.1.0)
#>  IRanges                2.26.0   2021-05-19 [1] Bioconductor  
#>  knitr                  1.34     2021-09-09 [1] CRAN (R 4.1.1)
#>  lattice                0.20-44  2021-05-02 [1] CRAN (R 4.1.1)
#>  lifecycle              1.0.0    2021-02-15 [1] CRAN (R 4.1.0)
#>  magrittr               2.0.1    2020-11-17 [1] CRAN (R 4.1.0)
#>  Matrix                 1.3-4    2021-06-01 [1] CRAN (R 4.1.1)
#>  MatrixGenerics         1.5.3    2021-09-09 [1] Bioconductor  
#>  matrixStats            0.60.1   2021-08-23 [1] CRAN (R 4.1.0)
#>  pillar                 1.6.2    2021-07-29 [1] CRAN (R 4.1.0)
#>  pkgconfig              2.0.3    2019-09-22 [1] CRAN (R 4.1.0)
#>  purrr                  0.3.4    2020-04-17 [1] CRAN (R 4.1.0)
#>  Rcpp                   1.0.7    2021-07-07 [1] CRAN (R 4.1.0)
#>  RCurl                  1.98-1.4 2021-08-17 [1] CRAN (R 4.1.0)
#>  reprex                 2.0.1    2021-08-05 [1] CRAN (R 4.1.0)
#>  rhdf5                  2.36.0   2021-05-19 [1] Bioconductor  
#>  rhdf5filters           1.4.0    2021-05-19 [1] Bioconductor  
#>  Rhdf5lib               1.14.2   2021-07-06 [1] Bioconductor  
#>  rlang                  0.4.11   2021-04-30 [1] CRAN (R 4.1.0)
#>  rmarkdown              2.11     2021-09-14 [1] CRAN (R 4.1.1)
#>  rstudioapi             0.13     2020-11-12 [1] CRAN (R 4.1.0)
#>  S4Vectors              0.30.0   2021-05-19 [1] Bioconductor  
#>  sessioninfo            1.1.1    2018-11-05 [1] CRAN (R 4.1.0)
#>  sparseMatrixStats      1.4.2    2021-08-08 [1] Bioconductor  
#>  stringi                1.7.4    2021-08-25 [1] CRAN (R 4.1.0)
#>  stringr                1.4.0    2019-02-10 [1] CRAN (R 4.1.0)
#>  styler                 1.5.1    2021-07-13 [1] CRAN (R 4.1.0)
#>  SummarizedExperiment   1.22.0   2021-05-19 [1] Bioconductor  
#>  tibble                 3.1.4    2021-08-25 [1] CRAN (R 4.1.0)
#>  utf8                   1.2.2    2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs                  0.3.8    2021-04-29 [1] CRAN (R 4.1.0)
#>  withr                  2.4.2    2021-04-18 [1] CRAN (R 4.1.0)
#>  xfun                   0.26     2021-09-14 [1] CRAN (R 4.1.1)
#>  XVector                0.32.0   2021-05-19 [1] Bioconductor  
#>  yaml                   2.2.1    2020-02-01 [1] CRAN (R 4.1.0)
#>  zlibbioc               1.38.0   2021-05-19 [1] Bioconductor  
#> 
#> [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library

I will have to take a more detailed look what's the difference between your code/data and my version next week. But thanks for making such a detailed bug report, I am sure we will able to figure out what is going wrong here :)

Best, Constantin

const-ae avatar Apr 01 '22 21:04 const-ae