lapack icon indicating copy to clipboard operation
lapack copied to clipboard

Possible bug in `dgesdd` routine

Open SzymonNowakowski opened this issue 2 years ago • 34 comments

I believe I found a bug in LAPACK in svd(.) / dgesdd function:

Problem description

R routine svd(.) from LAPACK crashes in dgesdd on a full rank matrix with dimensions 5344x263.

Problem replication

library(Rfssa)
load_github_data("https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd.RData")
svd(crashes_svd)
Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'

But crashes_svd is a full rank matrix, it should decompose in svd(.):

> qr(crashes_svd)$rank [1] 263 > dim(crashes_svd) [1] 5344 263 EDIT 07/04/2022: ~The above example (a matrix 5344x263) is minimal to reproduce the bug - If you remove a row or a column from that matrix, svd(.) successfully decomposes a resulting matrix into SVD components.~ I have verified, that (in general) removing random rows or columns of that special matrix makes it pass the SVD, but one can also remove them in a certain order, obtaining a sequence of matrices failing SVD computation:

crashes_svd_col <- crashes_svd[, -51]         #removing the 51st column
svd(crashes_svd_col)
# Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'
crashes_svd <- crashes_svd[-393,]             #removing a sequence of rows: 393, 962, 642
svd(crashes_svd)
# Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'
crashes_svd <- crashes_svd[-962,]
svd(crashes_svd)
# Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'
crashes_svd <- crashes_svd[-642,]
svd(crashes_svd)
# Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'

END OF EDIT 07/04/2022

Software versions

Windows 7, R 4.2.0, LAPACK 3.10.0:

> La_version() [1] "3.10.0" > print(sessionInfo())

R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Windows 7, R 4.1.0, LAPACK 3.9.0:

> La_version() [1] "3.9.0" > print(sessionInfo())

R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Ubuntu 20.04, R 4.1.2, LAPACK 3.9.0:

> La_version() [1] "3.9.0" > print(sessionInfo())

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Checklist

  • [x] I've included a minimal example to reproduce the issue
  • [ ] I'd be willing to make a PR to solve this issue

SzymonNowakowski avatar May 19 '22 09:05 SzymonNowakowski

It is worth mentioning that the data that crashes svd LAPACK routine is a part of Insurance data set, isolated for the purpose of reproducing the bug.

SzymonNowakowski avatar May 19 '22 12:05 SzymonNowakowski

I found and extracted another case for this possible bug while working with Insurance data set. It is a full rank 5344x255 matrix. To reproduce it, please run:

library(Rfssa)
load_github_data("https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd_1.RData")
svd(crashes_svd)
#Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'
dim(crashes_svd)
# [1] 5344  255
qr(crashes_svd)$rank
# [1] 255

SzymonNowakowski avatar Jul 03 '22 08:07 SzymonNowakowski

I found and extracted another case for this possible bug while working with Insurance data set some more. It is a full rank 5344x258 matrix this time. This is the last one I report here: I don't want to create a mess out of this thread, but I hope that 3 different (but no doubt: similar) cases may add something meaningful to the analysis of the underlying problem. The next ones would undoubtedly add very little value, if any.

The way they were all stumbled upon concerns a sample of a random 10% portion of the Insurance data set (hence equal number of rows in all 3 examples) and further analysing it (including attempting the SVD). So it seems that quite often the SVD would fail (out of 200 attempted random runs, I usually get this error in the first 20 runs, no matter the seed value at the beginning).

You can reproduce this final example with

library(Rfssa)
load_github_data("https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd_2.RData")
svd(crashes_svd)
#Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'
dim(crashes_svd)
# [1] 5344  258
qr(crashes_svd)$rank
# [1] 258

I have verified, that (in general) removing random rows or removing columns of that special matrix makes it pass the SVD, but one can also remove rows of that matrix in a certain order, obtaining a sequence of matrices failing SVD computation:

crashes_svd <- crashes_svd[-1874,]
svd(crashes_svd)
# Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'
crashes_svd <- crashes_svd[-129,]
svd(crashes_svd)
# Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'
crashes_svd <- crashes_svd[-194,]
svd(crashes_svd)
# Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'
crashes_svd <- crashes_svd[-373,]
svd(crashes_svd)
# Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'

SzymonNowakowski avatar Jul 04 '22 08:07 SzymonNowakowski

I couldn't reproduce the error on my Linux machine with LAPACK 3.9.0 through OpenBLAS. Does your R also use OpenBLAS?

My system:

> La_version()
[1] "3.9.0"
> print(sessionInfo())
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

I tried:

https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd.RData
https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd_1.RData
https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd_2.RData

The singular values:

  • First case:
> load_github_data("https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd.RData")
> dim(crashes_svd)
[1] 5344  263
> s <- svd(crashes_svd)
> s$d
  [1] 80.237581 78.035925 76.401927 75.156859 74.909977 74.682839 74.405535
  [8] 74.182772 73.978817 73.748076 73.715598 73.658436 73.596700 73.547603
 [15] 73.484632 73.414763 73.399336 73.384719 73.375879 73.354423 73.330749
 [22] 73.314610 73.292720 73.285536 73.278840 73.274262 73.266730 73.257655
 [29] 73.250795 73.244416 73.239847 73.235621 73.229620 73.223629 73.219221
 [36] 73.215167 73.209824 73.205479 73.201319 73.191746 73.191746 73.191746
 [43] 73.191746 73.191746 73.187600 73.184882 73.184882 73.184882 73.179223
 [50] 73.174548 73.171161 73.168689 73.164303 73.164303 73.164303 73.164303
 [57] 73.160428 73.157447 73.157447 73.157447 73.154329 73.150592 73.150592
 [64] 73.150592 73.150592 73.150592 73.150592 73.150592 73.145491 73.143740
 [71] 73.143740 73.140924 73.136890 73.136890 73.136890 73.136890 73.136890
 [78] 73.136890 73.136890 73.136890 73.136890 73.136890 73.132298 73.130042
 [85] 73.130042 73.130042 73.130042 73.130042 73.130042 73.126859 73.123195
 [92] 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195
 [99] 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195
[106] 73.120294 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[113] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[120] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[127] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[134] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[141] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[148] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[155] 73.116351 73.116351 73.111979 73.109508 73.109508 73.109508 73.109508
[162] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[169] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[176] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[183] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[190] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[197] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[204] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[211] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[218] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[225] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[232] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[239] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[246] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[253] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[260] 73.109508 73.109508 73.109508  2.081634
  • Second case:
> load_github_data("https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd_1.RData")
> dim(crashes_svd)
[1] 5344  255
> s <- svd(crashes_svd)
> s$d
  [1] 80.338635 78.125716 76.424973 75.044972 74.936751 74.684774 74.394142
  [8] 74.146391 73.951330 73.844023 73.755771 73.678289 73.598102 73.571031
 [15] 73.532607 73.463722 73.374985 73.361678 73.338794 73.326327 73.313453
 [22] 73.288042 73.285106 73.281151 73.273964 73.265097 73.260490 73.253863
 [29] 73.246726 73.242377 73.234456 73.224807 73.219221 73.219221 73.219221
 [36] 73.219221 73.208610 73.200970 73.191746 73.189781 73.184882 73.184882
 [43] 73.184882 73.181404 73.178021 73.178021 73.175230 73.171161 73.171161
 [50] 73.171161 73.171161 73.171161 73.165983 73.164303 73.160617 73.157447
 [57] 73.157447 73.154335 73.150592 73.150592 73.150592 73.148074 73.143740
 [64] 73.143740 73.143740 73.143740 73.143740 73.143740 73.143740 73.143740
 [71] 73.139859 73.136890 73.136890 73.136890 73.136890 73.136890 73.136890
 [78] 73.134036 73.130042 73.130042 73.130042 73.130042 73.130042 73.130042
 [85] 73.130042 73.130042 73.130042 73.130042 73.130042 73.130042 73.130042
 [92] 73.130042 73.126754 73.123195 73.123195 73.123195 73.123195 73.123195
 [99] 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195
[106] 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195
[113] 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195
[120] 73.119217 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[127] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[134] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[141] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[148] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[155] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[162] 73.116351 73.116351 73.116351 73.111642 73.109508 73.109508 73.109508
[169] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[176] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[183] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[190] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[197] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[204] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[211] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[218] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[225] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[232] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[239] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[246] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[253] 73.109508 73.109508  1.041707
  • Third case:
> load_github_data("https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd_2.RData")
> dim(crashes_svd)
[1] 5344  258
> s <- svd(crashes_svd)
> s$d
  [1] 80.027683 78.131810 76.515927 75.051548 75.004976 74.826565 74.446285
  [8] 74.117935 73.882036 73.822317 73.771688 73.720786 73.636142 73.559170
 [15] 73.533816 73.479562 73.420415 73.345030 73.314114 73.308726 73.305097
 [22] 73.298845 73.292099 73.285671 73.281151 73.276671 73.270169 73.261310
 [29] 73.253607 73.249163 73.240953 73.232969 73.229093 73.223585 73.219221
 [36] 73.215430 73.210622 73.205479 73.205479 73.205479 73.205479 73.195552
 [43] 73.191746 73.187204 73.182422 73.178021 73.178021 73.174248 73.171161
 [50] 73.171161 73.166166 73.161889 73.157447 73.155778 73.150592 73.150592
 [57] 73.150592 73.150592 73.150592 73.147917 73.143740 73.143740 73.143740
 [64] 73.143740 73.143740 73.143740 73.143740 73.143740 73.143740 73.139893
 [71] 73.136890 73.136890 73.136890 73.136890 73.136890 73.136890 73.136890
 [78] 73.136890 73.133319 73.130042 73.130042 73.130042 73.130042 73.130042
 [85] 73.130042 73.130042 73.130042 73.130042 73.130042 73.130042 73.130042
 [92] 73.126359 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195
 [99] 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195
[106] 73.123195 73.123195 73.123195 73.123195 73.123195 73.119713 73.116351
[113] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[120] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[127] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[134] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[141] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[148] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[155] 73.116351 73.111963 73.109508 73.109508 73.109508 73.109508 73.109508
[162] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[169] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[176] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[183] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[190] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[197] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[204] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[211] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[218] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[225] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[232] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[239] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[246] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[253] 73.109508 73.109508 73.109508 73.109508 73.109508  1.040376

weslleyspereira avatar Jul 05 '22 20:07 weslleyspereira

No, on the Windows machine I don't have OpenBLAS. The Ubuntu machine is currently down, I will check & let you know when it gets up.

> La_version()
[1] "3.10.0"
> print(sessionInfo())
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Polish_Poland.1250  LC_CTYPE=Polish_Poland.1250    LC_MONETARY=Polish_Poland.1250
[4] LC_NUMERIC=C                   LC_TIME=Polish_Poland.1250    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DMRnet_0.3.0.9000 DAAG_1.25.4       digest_0.6.29     vioplot_0.3.7     zoo_1.8-10        sm_2.2-5.7       
 [7] Rfssa_2.0.1       dplyr_1.0.9       devtools_2.4.3    usethis_2.1.6    

loaded via a namespace (and not attached):
  [1] colorspace_2.0-3    ellipsis_0.3.2      mclust_5.4.9        rprojroot_2.0.3     fs_1.5.2           
  [6] rstudioapi_0.13     remotes_2.4.2       RSpectra_0.16-1     fansi_1.0.3         mvtnorm_1.1-3      
 [11] codetools_0.2-18    splines_4.2.0       extrafont_0.18      cachem_1.0.6        knitr_1.39         
 [16] pkgload_1.2.4       jsonlite_1.8.0      Rttf2pt1_1.3.10     cluster_2.1.3       png_0.1-7          
 [21] shiny_1.7.1         compiler_4.2.0      httr_1.4.3          rainbow_3.6         lazyeval_0.2.2     
 [26] Matrix_1.4-1        fastmap_1.1.0       cli_3.3.0           later_1.3.0         hrbrthemes_0.8.0   
 [31] htmltools_0.5.2     prettyunits_1.1.1   tools_4.2.0         gtable_0.3.0        glue_1.6.2         
 [36] Rcpp_1.0.8.3        fracdiff_1.5-1      vctrs_0.4.1         urca_1.3-0          nlme_3.1-157       
 [41] extrafontdb_1.0     iterators_1.0.14    grpreg_3.4.0        lmtest_0.9-40       timeDate_3043.102  
 [46] xfun_0.31           rbibutils_2.2.8     ps_1.7.0            brio_1.1.3          testthat_3.1.4     
 [51] mime_0.12           lifecycle_1.0.1     MASS_7.3-56         scales_1.2.0        promises_1.2.0.1   
 [56] parallel_4.2.0      RColorBrewer_1.1-3  quantmod_0.4.20     curl_4.3.2          memoise_2.0.1      
 [61] ggplot2_3.3.6       gdtools_0.2.4       latticeExtra_0.6-29 tseries_0.10-51     desc_1.4.1         
 [66] pcaPP_2.0-1         foreach_1.5.2       TTR_0.24.3          pkgbuild_1.3.1      hdrcde_3.4         
 [71] shape_1.4.6         Rdpack_2.3.1        rlang_1.0.2         pkgconfig_2.0.3     systemfonts_1.0.4  
 [76] bitops_1.0-7        pracma_2.3.8        evaluate_0.15       fda_6.0.3           lattice_0.20-45    
 [81] purrr_0.3.4         htmlwidgets_1.5.4   ks_1.13.5           processx_3.5.3      tidyselect_1.1.2   
 [86] deSolve_1.32        magrittr_2.0.3      R6_2.5.1            generics_0.1.2      pillar_1.7.0       
 [91] withr_2.5.0         xts_0.12.1          survival_3.3-1      RCurl_1.98-1.6      nnet_7.3-17        
 [96] tibble_3.1.7        crayon_1.5.1        KernSmooth_2.23-20  utf8_1.2.2          plotly_4.10.0      
[101] rmarkdown_2.14      jpeg_0.1-9          fds_1.8             grid_4.2.0          data.table_1.14.2  
[106] callr_3.7.0         forecast_8.16       svd_0.5.1           xtable_1.8-4        Rssa_1.0.4         
[111] tidyr_1.2.0         httpuv_1.6.5        munsell_0.5.0       glmnet_4.1-4        viridisLite_0.4.0  
[116] sessioninfo_1.2.2   quadprog_1.5-8     

SzymonNowakowski avatar Jul 05 '22 21:07 SzymonNowakowski

But I had the same error on another machine with an older R version (3.6.3) in Ubuntu with OpenBLAS:

> svd(crashes_svd)
Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'
> La_version()
[1] "3.9.0"
> print(session
session_info   sessionInfo    sessioninfo::
> svd(crashes_svd)
Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'
> La_version()
[1] "3.9.0"
> print(sessionInfo())
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=pl_PL.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=pl_PL.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=pl_PL.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] DMRnet_0.3.0.9000 devtools_2.2.2    usethis_2.1.6

loaded via a namespace (and not attached):
 [1] compiler_3.6.3    prettyunits_1.1.1 remotes_2.4.2     iterators_1.0.13
 [5] tools_3.6.3       testthat_3.1.4    pkgbuild_1.3.1    pkgload_1.2.4
 [9] memoise_2.0.1     lifecycle_1.0.1   lattice_0.20-40   rlang_1.0.2
[13] Matrix_1.2-18     foreach_1.5.1     cli_3.3.0         rstudioapi_0.13
[17] fastmap_1.1.0     withr_2.5.0       desc_1.4.1        fs_1.5.2
[21] rprojroot_2.0.3   glmnet_4.1-2      grid_3.6.3        glue_1.6.2
[25] R6_2.5.1          grpreg_3.4.0      processx_3.6.1    survival_3.1-8
[29] sessioninfo_1.2.2 callr_3.7.0       purrr_0.3.4       magrittr_2.0.1
[33] ps_1.7.1          codetools_0.2-16  ellipsis_0.3.2    splines_3.6.3
[37] shape_1.4.6       cachem_1.0.6      crayon_1.5.1      brio_1.1.3

SzymonNowakowski avatar Jul 05 '22 21:07 SzymonNowakowski

Ok. I could reproduce the issue using https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd.RData in my machine:

> La_version()
[1] "3.9.0"
> print(sessionInfo())
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

So, no issue using OpenBLAS; and Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd' when using LAPACK 3.9.0.

weslleyspereira avatar Jul 05 '22 22:07 weslleyspereira

Thank you. I installed OpenBlas on the Ubuntu machine and I confirm that this workaround works for all 3 isolated cases. For me it is sufficient as I can push my research ahead. Great thanks!

SzymonNowakowski avatar Jul 06 '22 06:07 SzymonNowakowski

Unfortunately, OpenBLAS creates problems of its own it seems. I have isolated another matrix which passes SVD on a machine without OpenBLAS, but fails SVD with OpenBLAS. Please check this code:

library(Rfssa)
load_github_data("https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd_with_Open_BLAS.RData")
svd(crashes_svd)
# Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'

My system

> La_version()
[1] "3.9.0"
> print(sessionInfo())
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=pl_PL.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=pl_PL.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=pl_PL.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] DMRnet_0.3.0.9000 devtools_2.4.3    usethis_2.1.5

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8        compiler_4.1.2    prettyunits_1.1.1 remotes_2.4.2
 [5] iterators_1.0.14  tools_4.1.2       testthat_3.1.2    pkgbuild_1.3.1
 [9] pkgload_1.2.4     memoise_2.0.1     lifecycle_1.0.1   lattice_0.20-45
[13] rlang_1.0.1       Matrix_1.4-0      foreach_1.5.2     cli_3.1.1
[17] rstudioapi_0.13   fastmap_1.1.0     withr_2.4.3       desc_1.4.0
[21] fs_1.5.2          rprojroot_2.0.2   glmnet_4.1-3      grid_4.1.2
[25] glue_1.6.1        R6_2.5.1          grpreg_3.4.0      processx_3.5.2
[29] survival_3.2-13   sessioninfo_1.2.2 callr_3.7.0       purrr_0.3.4
[33] magrittr_2.0.2    splines_4.1.2     ps_1.6.0          codetools_0.2-18
[37] ellipsis_0.3.2    shape_1.4.6       cachem_1.0.6      crayon_1.4.2
[41] brio_1.1.3

SzymonNowakowski avatar Jul 06 '22 10:07 SzymonNowakowski

Unfortunately, OpenBLAS creates problems of its own it seems. I have isolated another matrix which passes SVD on a machine without OpenBLAS, but fails SVD with OpenBLAS.

Thanks for collecting this additional information! We are working to try to find the problem.

weslleyspereira avatar Jul 06 '22 13:07 weslleyspereira

No crash with current OpenBLAS (0.3.20) - Ubuntu 20.04 LTS apparently ships with outdated 0.3.8 (that probably hits some unrelated "historic" bug). DNRM2 may be a good candidate for different corner case behaviour in OpenBLAS vs netlib.

martin-frbg avatar Jul 06 '22 16:07 martin-frbg

All this is very interesting. Some thoughts: (1) we should keep these matrices in our torture test suite, it seems like they trigger some difficulties; (2) maybe getting the bidiagonal matrices is better than the full matrices; (3) one day, some days, in addition to always testing the current version of LAPACK, we should test older version, count the number of bugs we fixed, and the speedup we got, I think, we'll feel good about our work for the last x years; (4) let us keep on digging to try to identify the issue

langou avatar Jul 06 '22 16:07 langou

I managed to isolate the matrix from https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd.RData into a text file, and then run an example in using C.

Let me go step-by-step so we can use it to gather more information about this issue.

  1. The following code writes the transpose matrix which is more adequate for reading in column major formats:
> library(Rfssa)
> load_github_data("https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd.RData")
> write.table(format(t(crashes_svd), scientific=TRUE, digits=22), file="crashes_svd.txt")

The following command in R shows the undesirable output ``:

> print(sessionInfo())
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
(...)
> crashes_svd <- read.table(file="crashes_svd.txt")
> svd(t(crashes_svd))
Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'
  1. I worked on the file crashes_svd.txt to obtain: crashes_svd2.zip. This file will be used as input in my C code:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#ifndef lapack_int
    #define lapack_int int
#endif

lapack_int min(lapack_int a, lapack_int b)
{
    return (a > b) ? b : a;
}

void dgesdd_(
    char const* jobz,
    lapack_int const* m, lapack_int const* n,
    double* A, lapack_int const* lda,
    double* S,
    double* U, lapack_int const* ldu,
    double* VT, lapack_int const* ldvt,
    double* work, lapack_int const* lwork,
    lapack_int* iwork,
    lapack_int* info );

int main(int argc, char *argv[])
{
    // sizes
    lapack_int n, m;
    scanf( "%d", &m );
    scanf( "%d", &n );

    // allocate
    double *A = (double*) malloc( (m*n) * sizeof(double) );
    double *S = (double*) malloc( min(m,n) * sizeof(double) );
    double *U = (double*) malloc( (m*m) * sizeof(double) );
    double *V = (double*) malloc( (n*n) * sizeof(double) );

    // read col-major data
    for (int i = 0; i < m*n; i++)
    {
        long double aux;
        scanf( "%Lf", &aux );
        A[i] = aux;
    }

    // workspaces
    lapack_int *iwork = (lapack_int*) malloc( (8*min(m,n)) * sizeof(lapack_int) );
    lapack_int lwork = -1;

    // work query
    lapack_int info = 0;
    double work_size = 0;
    dgesdd_(
        "A",
        &m,&n,A,&m,
        S,
        U,&m,
        V,&n,
        &work_size,&lwork,iwork,
        &info
    );

    // allocate workspace
    double *work = (double*) malloc( work_size * sizeof(double) );
    lwork = work_size;

    // run
    dgesdd_(
        "A",
        &m,&n,A,&m,
        S,
        U,&m,
        V,&n,
        work,&lwork,iwork,
        &info
    );

    // print info
    printf( "%d\n", info );
    
    // print singular values
    printf( "\nSingular values:\n" );
    for (int i = 0; i < min(m,n); i++)
    {
        printf( "%lf ", S[i] );
    }

    return 0;
}
  1. However, I couldn't reproduce the issue using this code:
$ gcc test_dgesdd.c -o test_dgesdd /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
$ ./test_dgesdd < crashes_svd2.txt 
0

Singular values:
138.921459 135.819739 135.520150 134.976011 127.236185 126.719818 124.754208 123.587351 120.010983 116.994755 116.684704 114.853547 113.646145 112.657777 111.713019 111.507807 111.179755 111.150557 110.816460 110.293399 109.425710 109.330495 109.061578 108.966178 108.651045 107.629418 106.823530 106.648234 106.364113 106.315408 106.131774 105.611349 105.252831 105.161095 105.128230 104.798008 104.504731 104.276730 101.742746 101.102745 98.589703 97.881927 97.539067 97.362174 97.326156 97.171976 96.958740 96.323631 95.481025 95.164000 95.062074 94.834365 94.535047 94.206671 93.801462 93.242641 92.775750 92.380965 91.346124 90.589142 88.894442 88.663396 87.867799 87.500949 87.314072 87.046753 86.995633 86.766978 86.407967 86.138031 86.041301 85.889684 85.769190 85.707008 85.574748 85.186597 84.857056 84.773774 84.644520 84.027815 83.792789 83.721602 83.210783 83.134147 82.830668 82.645403 82.531031 82.074604 81.602327 80.878397 80.327302 79.982995 79.624131 79.472658 79.282894 78.884349 78.735949 78.506006 78.253994 78.044852 77.624866 77.304068 77.067689 77.004966 76.340971 76.181077 75.343331 74.474760 73.278096 72.919688 72.328948 72.000425 71.693862 71.459548 70.696953 70.590415 69.861243 68.660174 68.221902 68.131634 67.432133 67.371519 66.960725 66.421092 66.079473 65.643671 65.518510 65.461814 65.096926 64.643686 64.416828 64.087087 63.977570 63.404865 62.540327 62.146893 61.926394 61.695142 61.137044 60.897390 60.621789 60.281927 60.144524 59.992570 59.833020 59.278489 59.071738 59.015457 58.538994 58.401949 58.217545 58.081582 57.900982 57.808147 57.756721 56.721582 56.373733 56.089498 55.675927 55.536675 55.171722 55.131030 54.755947 54.663055 54.480378 54.158751 54.029876 53.859765 53.723131 53.544550 53.217714 52.860974 52.193011 51.649154 51.272390 51.188479 50.745404 50.594944 50.268060 50.214799 50.122710 49.857880 49.547364 49.275911 48.780430 48.446466 47.564610 47.367113 46.953331 46.691890 46.293246 46.279272 45.995560 45.647042 45.460969 45.201356 45.102905 45.047106 44.929637 44.629221 44.298290 44.044366 44.020477 43.704117 43.401700 42.621339 42.579407 42.416868 42.018328 41.691898 41.425460 40.498291 40.395045 40.088446 39.280906 38.805429 38.572036 37.916318 37.633313 37.285791 36.482753 36.067144 35.765946 35.458111 35.046091 34.473421 34.380670 34.248538 34.057086 33.664821 33.199721 32.948524 32.346531 32.150851 31.206320 30.873987 30.561778 30.127724 29.786624 29.356329 28.791116 28.582062 28.182265 28.084910 27.701852 27.397549 27.187937 26.449320 26.041613 25.685849 25.395753 25.214940 24.436499 23.799655 22.067128 20.960366 19.984624 19.936679 19.558175 18.505146 17.745319 16.961943 11.628948
  1. Notice that the singular values obtained by LAPACK 3.9.0 are different from the ones obtained by OpenBLAS 0.3.8 (this is the version installed in my machine). I shared those values in a previous comment: https://github.com/Reference-LAPACK/lapack/issues/672#issuecomment-1175475996.

weslleyspereira avatar Jul 06 '22 19:07 weslleyspereira

Hi @SzymonNowakowski. Could you try to use my input (https://github.com/Reference-LAPACK/lapack/files/9057573/crashes_svd2.zip) in your machines? Thanks

weslleyspereira avatar Jul 06 '22 21:07 weslleyspereira

Hi Weslley, on the two Ubuntu machines I got the exact same output as you did (0 followed by the line Singular values: and then singular values followed starting with 138.921459) after executing your two-liner

$ gcc test_dgesdd.c -o test_dgesdd /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
$ ./test_dgesdd < crashes_svd2.txt 

The two machines are the ones with very different R versions:

> print(sessionInfo())
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

and

> print(sessionInfo())
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

But obviously the LAPACK library was the same (3.9.0) and R was not used in this test.

If you want me to check something more, I gladly will, but in the morning (it is midnight in Poland).

Thank you for having a look into this issue.

SzymonNowakowski avatar Jul 06 '22 22:07 SzymonNowakowski

The matrix I provided before, crashes_svd2.zip, had an error. I switched m with n. Here follows the output using the correct pencil crashes_svd3.zip :

$ gcc test_dgesdd.c -o test_dgesdd /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
weslleyp@weslleyp-XPS-15-9510:~/storage/ballistic/lapack-issue-672$ ./test_dgesdd < crashes_svd3.txt 
1

Singular values:
0.956577 0.956577 0.956577 0.956577 0.956577 0.956577 0.956577 0.956577 0.956577 0.956577 0.956577 0.956577 0.956577 0.956666 0.956666 0.956666 0.956666 0.956666 0.956666 0.956666 0.956666 0.956666 0.956666 0.956666 0.956666 0.956756 0.956756 0.956756 0.956756 0.956756 0.956756 0.956756 0.956756 0.956756 0.956756 0.956845 0.956845 0.956845 0.956845 0.956845 0.956935 0.956935 0.956935 0.956935 0.956935 0.956935 0.956935 0.956935 0.956935 0.957025 0.957025 0.957114 0.957114 0.957114 0.957114 0.957114 0.957114 0.957204 0.957204 0.957204 0.957294 0.957294 0.957294 0.957294 0.957351 0.957383 0.957563 0.957563 0.957563 0.957653 0.957653 0.957653 0.957653 0.957833 0.958012 0.958282 0.958732 1.049842 1.021035 0.999655 0.983365 0.980134 0.977163 0.973534 0.970620 0.967951 0.964932 0.964507 0.963759 0.962951 0.962309 0.961485 0.960571 0.960369 0.960178 0.960062 0.959781 0.959472 0.959260 0.958974 0.958880 0.958792 0.958634 0.958515 0.958425 0.958342 0.958227 0.958148 0.958070 0.957959 0.957889 0.957778 0.957653 0.957599 0.957489 0.957428 0.957243 0.957163 0.957114 0.957048 0.956988 0.956935 0.956875 0.956845 0.956804 0.956756 0.956756 0.956718 0.956609 0.956577 0.027236 0.956666 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 1.000000 1.000000 1.000000 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.956577

INFO=1 means that DBDSDC did not converge.

weslleyspereira avatar Jul 06 '22 23:07 weslleyspereira

More tests:

  1. Using my C code compiled with gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1):

    • I couldn't reproduce the issue with LAPACK 3.10.0.
    • I could reproduce the issue in LAPACK 3.9.0 and LAPACK 3.9.1.
  2. I took the time to compare the reductions to bidiagonal using LAPACK 3.9.1 and LAPACK 3.10.0.

    • For that, I used the diff patch on LAPACK
    diff --git a/SRC/dgesdd.f b/SRC/dgesdd.f
    index a6770999c..3f39940b3 100644
    --- a/SRC/dgesdd.f
    +++ b/SRC/dgesdd.f
    @@ -864,6 +864,7 @@
                    CALL DGEBRD( N, N, A, LDA, S, WORK( IE ), WORK( ITAUQ ),
          $                      WORK( ITAUP ), WORK( NWORK ), LWORK-NWORK+1,
          $                      IERR )
    +               RETURN
     *
     *              Perform bidiagonal SVD, computing left singular vectors
     *              of bidiagonal matrix in WORK(IU) and computing right
    
    • The reductions were collected in my C code using:
     printf( "\n%.22e ", 0.0 ); 
     for (int i = 1; i < min(m,n); i++)
         printf( "%.22e ", A[(i-1)+i*m] );
     printf( "\n" );
     for (int i = 0; i < min(m,n); i++)
         printf( "%.22e ", A[i+i*m] );
    

    right after the call to dgesdd_.

    • Bidiagonal matrices: B_LAPACK391.txt: LAPACK 3.9.1 B_LAPACK391_scinotation.txt: LAPACK 3.9.1 using scientific notation B_LAPACK310.txt: LAPACK 3.10.0 B_LAPACK310_scinotation.txt: LAPACK 3.10.0 using scientific notation

    • The matrices are significantly different. The superdiagonal has values of different magnitude. For instance, there is a zero in the superdiagonal using LAPACK 3.9.1 which is actually -6.2172489379008766263723e-13 when using LAPACK 3.10.0.

weslleyspereira avatar Jul 07 '22 17:07 weslleyspereira

Thanks for those comparisons. You are working at the levels too deep for me to follow. Thus I just want to add one more piece of information from my side, since I am not sure if you have it already covered in those 4 txt files: I had the same error in LAPACK 3.10.0 but in Windows:

> print(sessionInfo())
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Polish_Poland.1250  LC_CTYPE=Polish_Poland.1250    LC_MONETARY=Polish_Poland.1250
[4] LC_NUMERIC=C                   LC_TIME=Polish_Poland.1250    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DMRnet_0.3.0.9000 DAAG_1.25.4       digest_0.6.29     vioplot_0.3.7     zoo_1.8-10        sm_2.2-5.7       
 [7] Rfssa_2.0.1       dplyr_1.0.9       devtools_2.4.3    usethis_2.1.6    

loaded via a namespace (and not attached):
  [1] colorspace_2.0-3    ellipsis_0.3.2      mclust_5.4.9        rprojroot_2.0.3     fs_1.5.2           
  [6] rstudioapi_0.13     remotes_2.4.2       RSpectra_0.16-1     fansi_1.0.3         mvtnorm_1.1-3      
 [11] codetools_0.2-18    splines_4.2.0       extrafont_0.18      cachem_1.0.6        knitr_1.39         
 [16] pkgload_1.2.4       jsonlite_1.8.0      Rttf2pt1_1.3.10     cluster_2.1.3       png_0.1-7          
 [21] shiny_1.7.1         compiler_4.2.0      httr_1.4.3          rainbow_3.6         lazyeval_0.2.2     
 [26] Matrix_1.4-1        fastmap_1.1.0       cli_3.3.0           later_1.3.0         hrbrthemes_0.8.0   
 [31] htmltools_0.5.2     prettyunits_1.1.1   tools_4.2.0         gtable_0.3.0        glue_1.6.2         
 [36] Rcpp_1.0.8.3        fracdiff_1.5-1      vctrs_0.4.1         urca_1.3-0          nlme_3.1-157       
 [41] extrafontdb_1.0     iterators_1.0.14    grpreg_3.4.0        lmtest_0.9-40       timeDate_3043.102  
 [46] xfun_0.31           rbibutils_2.2.8     ps_1.7.0            brio_1.1.3          testthat_3.1.4     
 [51] mime_0.12           lifecycle_1.0.1     MASS_7.3-56         scales_1.2.0        promises_1.2.0.1   
 [56] parallel_4.2.0      RColorBrewer_1.1-3  quantmod_0.4.20     curl_4.3.2          memoise_2.0.1      
 [61] ggplot2_3.3.6       gdtools_0.2.4       latticeExtra_0.6-29 tseries_0.10-51     desc_1.4.1         
 [66] pcaPP_2.0-1         foreach_1.5.2       TTR_0.24.3          pkgbuild_1.3.1      hdrcde_3.4         
 [71] shape_1.4.6         Rdpack_2.3.1        rlang_1.0.2         pkgconfig_2.0.3     systemfonts_1.0.4  
 [76] bitops_1.0-7        pracma_2.3.8        evaluate_0.15       fda_6.0.3           lattice_0.20-45    
 [81] purrr_0.3.4         htmlwidgets_1.5.4   ks_1.13.5           processx_3.5.3      tidyselect_1.1.2   
 [86] deSolve_1.32        magrittr_2.0.3      R6_2.5.1            generics_0.1.2      pillar_1.7.0       
 [91] withr_2.5.0         xts_0.12.1          survival_3.3-1      RCurl_1.98-1.6      nnet_7.3-17        
 [96] tibble_3.1.7        crayon_1.5.1        KernSmooth_2.23-20  utf8_1.2.2          plotly_4.10.0      
[101] rmarkdown_2.14      jpeg_0.1-9          fds_1.8             grid_4.2.0          data.table_1.14.2  
[106] callr_3.7.0         forecast_8.16       svd_0.5.1           xtable_1.8-4        Rssa_1.0.4         
[111] tidyr_1.2.0         httpuv_1.6.5        munsell_0.5.0       glmnet_4.1-4        viridisLite_0.4.0  
[116] sessioninfo_1.2.2   quadprog_1.5-8     
> La_version()
[1] "3.10.0"
> library(Rfssa)
> load_github_data("https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd.RData")
> svd(crashes_svd)
Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'

SzymonNowakowski avatar Jul 08 '22 14:07 SzymonNowakowski

More information:

I still couldn't reproduce this problem on Windows.

On my Linux machine, using your data https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd.RData or, equivalently, https://github.com/Reference-LAPACK/lapack/files/9059133/crashes_svd3.zip:

  • I can generate D and E (bidiagonal form) using LAPACK 3.9.1 and LAPACK 3.10.0
  • dbdsdc from LAPACK 3.9.1 does not converge using D and E from LAPACK 3.9.1
  • dbdsdc from LAPACK 3.9.1 converges using D and E from LAPACK 3.10.0
  • dbdsdc from LAPACK 3.10.0 converges using D and E from LAPACK 3.10.0
  • dbdsdc from LAPACK 3.10.0 converges using D and E from LAPACK 3.10.0

weslleyspereira avatar Aug 11 '22 23:08 weslleyspereira

Thank you so much for your work with this. So basically you are saying that (apart from my oldish Windows 7 machine) on LAPACK 3.10.0 it is a non-issue.

After I am back from my vacations in Sep I will upgrade one of the Linux machines and rerun full-scale tests but with LAPACK 3.10.0, we'll see if no new problematic matrices pop out. It was the case with OpenBLAS workaround earlier.

SzymonNowakowski avatar Aug 12 '22 17:08 SzymonNowakowski

Thank you so much for your work with this. So basically you are saying that (apart from my oldish Windows 7 machine) on LAPACK 3.10.0 it is a non-issue.

I just couldn't reproduce your problem in my machine, and had no time to try it on Windows.

After I am back from my vacations in Sep I will upgrade one of the Linux machines and rerun full-scale tests but with LAPACK 3.10.0, we'll see if no new problematic matrices pop out. It was the case with OpenBLAS workaround earlier.

That would be very helpful. Thanks! It would also be helpful if you could isolate in a file one matrix where your problem appear. Or if you could test this matrix out of R, using dgesdd directly from LAPACK 3.10.0 or more recent versions.

weslleyspereira avatar Aug 12 '22 21:08 weslleyspereira

Dear Weslley,

Previously identified matrices are OK in LAPACK 3.10.3

I upgraded a Linux Machine with Ubuntu 20.04 LTS to the newest LAPACK 3.10.3. I confirm that the problematic matrices pass successfully throughout an SVD() call. This I have confirmed for all previously considered matrices: crashes_svd.RData, crashes_svd_1.RData, crashes_svd_2.RData and crashes_svd_with_Open_BLAS.RData from my github data folder.

New problematic matrices - problem persists

However, as agreed, I've also rerun the full scale tests. Very quickly (within roughly a promile of all runs) a new problematic matrix was identified. I uploaded it to my github data folder as crashes_svd_with_LAPACK_3.10.3.RData as well.

> print(sessionInfo())
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=pl_PL.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=pl_PL.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=pl_PL.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Rfssa_2.1.0  dplyr_1.0.10

loaded via a namespace (and not attached):
 [1] tseries_0.10-51    svd_0.5.1          httr_1.4.4         tidyr_1.2.1
 [5] viridisLite_0.4.1  jsonlite_1.8.0     splines_4.2.1      rainbow_3.6
 [9] hdrcde_3.4         shiny_1.7.2        assertthat_0.2.1   TTR_0.24.3
[13] pillar_1.8.1       lattice_0.20-45    glue_1.6.2         quadprog_1.5-8
[17] digest_0.6.29      promises_1.2.0.1   colorspace_2.0-3   htmltools_0.5.3
[21] httpuv_1.6.6       Matrix_1.4-1       pcaPP_2.0-1        timeDate_4021.104
[25] pkgconfig_2.0.3    fda_6.0.5          purrr_0.3.4        xtable_1.8-4
[29] mvtnorm_1.1-3      scales_1.2.1       RSpectra_0.16-1    later_1.3.0
[33] pracma_2.3.8       tibble_3.1.8       generics_0.1.3     ggplot2_3.3.6
[37] ellipsis_0.3.2     urca_1.3-0         nnet_7.3-17        lazyeval_0.2.2
[41] cli_3.4.0          quantmod_0.4.20    magrittr_2.0.3     mime_0.12
[45] mclust_5.4.9       forecast_8.16      fds_1.8            ks_1.13.5
[49] fansi_1.0.3        nlme_3.1-159       MASS_7.3-58.1      xts_0.12.1
[53] tools_4.2.1        data.table_1.14.2  lifecycle_1.0.2    plotly_4.10.0
[57] munsell_0.5.0      cluster_2.1.4      compiler_4.2.1     rlang_1.0.5
[61] grid_4.2.1         RCurl_1.98-1.8     htmlwidgets_1.5.4  bitops_1.0-7
[65] gtable_0.3.1       fracdiff_1.5-1     deSolve_1.32       DBI_1.1.3
[69] curl_4.3.2         R6_2.5.1           zoo_1.8-10         fastmap_1.1.0
[73] utf8_1.2.2         Rssa_1.0.5         KernSmooth_2.23-20 parallel_4.2.1
[77] Rcpp_1.0.9         vctrs_0.4.1        tidyselect_1.1.2   lmtest_0.9-40
> library(Rfssa)
> load_github_data("https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd_with_LAPACK_3.10.3.RData")
> svd(crashes_svd)
Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'

Summary of my findings so far

A family of numerically unstable matrices was identified as far as SVD is concerned. Each individual version of the algorithm (Lapack+BLAS 3.9.0, OpenBLAS, Lapack+BLAS 3.10.3) handles those matrices a bit differently, it seems, so it is able to handle the matrices that are problematic to other versions.

However, a fundamental reason seems to have remained so far unsolved, and I have provided matrices causing problems for each of those 3 versions.

Unfortunately it is unclear for me how to isolate the matrix that gets passed directly into dgesdd within the svd() call.

SzymonNowakowski avatar Sep 12 '22 13:09 SzymonNowakowski

The core of my work lately is with large matrices being close to ill-defined. Thus I have been working with another dataset lately (airbnb dataset) and came upon a very similar behaviour in terms of SVD.

This may or may not be related to the problem discussed in this thread. The similarity of behaviour convinced me to report it here. This dataset is completely unrelated, so the below matrices would be completely independent from the previous matrices, sheding possibly some new light upon the problem considered in this issue.

I have isolated and uploaded two matrices:

  • https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd_airbnb_in_LAPACK_3.9.0.RData fails svd calculation in LAPACK 3.9.0, but is successful in LAPACK 3.10.3. It is of full rank:
       > load_github_data("https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd_airbnb_in_LAPACK_3.9.0.RData")
       > dim(crashes_svd)
       [1] 2498 2385
       > qr(crashes_svd)$rank
       [1] 2385
    
  • https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd_airbnb_in_LAPACK_3.10.3.RData fails svd calculation in LAPACK 3.10.3, but is successful in LAPACK 3.9.0. It is of full rank:
       > load_github_data("https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd_airbnb_in_LAPACK_3.10.3.RData")
       > dim(crashes_svd)
       [1] 2498 2406
       > qr(crashes_svd)$rank
       [1] 2406
    

Reproducing the issue

LAPACK 3.9.0

> print(sessionInfo())
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=pl_PL.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=pl_PL.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=pl_PL.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Rfssa_2.1.0  dplyr_1.0.10

loaded via a namespace (and not attached):
 [1] tseries_0.10-51    svd_0.5.2          httr_1.4.4         tidyr_1.2.1
 [5] jsonlite_1.8.0     viridisLite_0.4.1  splines_4.2.1      rainbow_3.6
 [9] hdrcde_3.4         shiny_1.7.2        TTR_0.24.3         pillar_1.8.1
[13] lattice_0.20-45    glue_1.6.2         quadprog_1.5-8     digest_0.6.29
[17] promises_1.2.0.1   colorspace_2.0-3   htmltools_0.5.3    httpuv_1.6.6
[21] Matrix_1.5-1       pcaPP_2.0-2        timeDate_4021.104  pkgconfig_2.0.3
[25] fda_6.0.5          purrr_0.3.4        xtable_1.8-4       mvtnorm_1.1-3
[29] scales_1.2.1       RSpectra_0.16-1    later_1.3.0        pracma_2.3.8
[33] tibble_3.1.8       generics_0.1.3     ggplot2_3.3.6      ellipsis_0.3.2
[37] urca_1.3-3         nnet_7.3-17        lazyeval_0.2.2     cli_3.4.0
[41] quantmod_0.4.20    magrittr_2.0.3     mime_0.12          mclust_5.4.10
[45] forecast_8.17.0    fds_1.8            ks_1.13.5          fansi_1.0.3
[49] nlme_3.1-159       MASS_7.3-58.1      xts_0.12.1         tools_4.2.1
[53] data.table_1.14.2  lifecycle_1.0.2    plotly_4.10.0      munsell_0.5.0
[57] cluster_2.1.4      compiler_4.2.1     rlang_1.0.5        grid_4.2.1
[61] RCurl_1.98-1.8     htmlwidgets_1.5.4  bitops_1.0-7       gtable_0.3.1
[65] fracdiff_1.5-1     deSolve_1.33       curl_4.3.2         R6_2.5.1
[69] zoo_1.8-10         fastmap_1.1.0      utf8_1.2.2         Rssa_1.0.5
[73] KernSmooth_2.23-20 parallel_4.2.1     Rcpp_1.0.9         vctrs_0.4.1
[77] tidyselect_1.1.2   lmtest_0.9-40
> library(Rfssa)
> load_github_data("https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd_airbnb_in_LAPACK_3.9.0.RData")
> svd(crashes_svd)
Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'
> load_github_data("https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd_airbnb_in_LAPACK_3.10.3.RData")
> res<-svd(crashes_svd)
> res$d[1:10]
 [1] 50.04002 50.03009 50.03009 50.03009 50.03009 50.03009 50.02962 50.02006
 [9] 50.01996 50.01004

LAPACK 3.10.3

> print(sessionInfo())
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=pl_PL.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=pl_PL.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=pl_PL.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.2.1
> library(Rfssa)
> load_github_data("https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd_airbnb_in_LAPACK_3.9.0.RData")
> res<-svd(crashes_svd)
> res$d[1:10]
 [1] 50.08997 50.03009 50.02994 50.02006 50.02006 50.02006 50.02006 50.01983
 [9] 50.01004 50.01004
> load_github_data("https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd_airbnb_in_LAPACK_3.10.3.RData")
> svd(crashes_svd)
Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'

SzymonNowakowski avatar Sep 15 '22 19:09 SzymonNowakowski

Thanks for the follow up! I will repeat my tests on those matrices and see if I can get any sense of what is happening.

weslleyspereira avatar Oct 04 '22 18:10 weslleyspereira

I came across a similar behaviour with a different matrix. In R, I am doing the following:

base::La.svd(my.mat, nu = 0)

and get:

Error in La.svd(x = my.mat, nu = 0) : error code 1 from Lapack routine ‚dgesdd'

This occurs on my machine, with R sessionInfo():

R version 4.3.0 (2023-04-21) Platform: aarch64-apple-darwin20 (64-bit) Running under: macOS Ventura 13.0.1

Matrix products: default BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0

Random number generation: RNG: Mersenne-Twister Normal: Inversion Sample: Rounding

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Zurich tzcode source: internal

attached base packages: [1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached): [1] compiler_4.3.0

However, the same code works on a different machine, with R sessionInfo():

R version 4.0.5 (2021-03-31) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 20.04.5 LTS

Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached): [1] compiler_4.0.5 tools_4.0.5

and Lapack Version 3.9.0

I attached my.mat as a .csv file for reference.

my.mat.csv

abdirizak-the-trickster avatar Jul 17 '23 13:07 abdirizak-the-trickster

For the sake of information, I have tried the test code using MKL, and it worked fine!

igorkf avatar Feb 19 '24 23:02 igorkf

For the sake of information, I have tried the test code using MKL, and it worked fine!

Thank you - "the test code" here being which of the several matrices that got posted in this issue thread over the past (almost) two years ? A lot here seems to depend on tiny numerical differences between code changes, making some versions more likely to handle a given matrix, but so far nobody has confidently identified the (or a) vulnerable part of the algorithm

martin-frbg avatar Feb 20 '24 10:02 martin-frbg

For the sake of information, I have tried the test code using MKL, and it worked fine!

Thank you - "the test code" here being which of the several matrices that got posted in this issue thread over the past (almost) two years ? A lot here seems to depend on tiny numerical differences between code changes, making some versions more likely to handle a given matrix, but so far nobody has confidently identified the (or a) vulnerable part of the algorithm

I was mentioning one of the first comments:

library(Rfssa)
load_github_data("https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd_1.RData")
svd(crashes_svd)
#Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'
dim(crashes_svd)
# [1] 5344  255
qr(crashes_svd)$rank
# [1] 255

igorkf avatar Feb 20 '24 15:02 igorkf

I need to mention that my matrix is probably ill-specified. It originated due to a bug in my code. It could well be that this bug led to a numerical instability in the La.svd().

abdirizak-the-trickster avatar Feb 25 '24 10:02 abdirizak-the-trickster

Thank you. I installed OpenBlas on the Ubuntu machine and I confirm that this workaround works for all 3 isolated cases. For me it is sufficient as I can push my research ahead. Great thanks!

@SzymonNowakowski I am having the same problems, in order to fix the crashing, do you therefore recommend trying to use the OpenBLAS library. Thanks.

DylanDijk avatar Apr 30 '24 13:04 DylanDijk