
failed regression check for 3.9.1 on macos sequoia

Open chenrui333 opened this issue 1 year ago • 7 comments

While trying a regression build of 3.9.1 for macOS Sequoia, I ran into the following test failure:

==> /opt/homebrew/bin/gfortran -o test /opt/homebrew/Cellar/arpack/3.9.1/share/arpack/dnsimp.f /opt/homebrew/Cellar/arpack/3.9.1/share/arpack/mmio.f -L/opt/homebrew/Cellar/arpack/3.9.1/lib -larpack -L/opt/homebrew/opt/openblas/lib -lopenblas
  ==> ./test
  Error: arpack: failed
  Error: arpack: failed
  An exception occurred within a child process:
    Minitest::Assertion: Expected /reached/ to match " ** On entry to DLASCL parameter number  4 had an illegal value\n ** On entry to DLASCL parameter number  4 had an illegal value\n  \n  Error with _naupd, info =        -9999\n  Check the documentation of _naupd\n  \n".
  /opt/homebrew/Library/Homebrew/vendor/bundle/ruby/3.3.0/gems/minitest-5.25.1/lib/minitest/assertions.rb:176:in `assert'

The full build log is here: https://github.com/Homebrew/homebrew-core/actions/runs/10792126279/job/29932784189

chenrui333 avatar Sep 16 '24 13:09 chenrui333

Not sure if I can help. What are the configure options? Has your LAPACK switched to lp64 or ilp64? Did your environment change? I can't tell from the report alone.

fghoussen avatar Sep 16 '24 15:09 fghoussen

In Homebrew, we only see this behavior on macOS 15 Sequoia.

I can also confirm that binaries built on macOS 14 Sonoma hit the same "On entry to DLASCL parameter number 4 had an illegal value" error when run on macOS 15.

The tests run on ARM64 / Apple Silicon, but we see the same behavior when testing x86_64 binaries via Rosetta.

We are running the test from https://github.com/opencollab/arpack-ng/blob/master/TESTS/dnsimp.f, and the formula checks that the output contains "reached" (I guess from the message about reaching the maximum number of iterations?).


For some of your questions:

What are the options of the configure?

https://github.com/Homebrew/homebrew-core/blob/master/Formula/a/arpack.rb#L30-L36, so with the variables expanded it becomes:

      --disable-dependency-tracking
      --prefix=/opt/homebrew/Cellar/arpack/3.9.1/libexec
      --with-blas="-L/opt/homebrew/opt/openblas/lib -lopenblas"
      F77=mpif77
      --enable-mpi
      --enable-icb
      --enable-eigen

Has your LAPACK switched to lp64 or ilp64?

Should still be the same one included with 32-bit integer OpenBLAS

Did your environment change?

The main difference is the macOS major version. Build variables, non-Apple libraries, etc. should be similar across runners.

cho-m avatar Sep 30 '24 13:09 cho-m

Should still be the same one included with 32-bit integer OpenBLAS

arpack depends on the BLAS / LAPACK implementation (netlib, mkl, ...), so you may want to try an older version of OpenBLAS. If the problem disappears, it is probably due to the version of OpenBLAS you use.

fghoussen avatar Sep 30 '24 17:09 fghoussen

I did try some non-OpenBLAS implementations and they all hit an error, so I would guess it isn't OpenBLAS-specific:

  • Accelerate linked using LIBS=-framework Accelerate
    BLAS                :
    LAPACK              :
    EIGEN               : -I/opt/homebrew/Cellar/eigen/3.4.0_1/include/eigen3
    LIBS                : -framework Accelerate
    LDADD               :
    
    ** On entry to DLASCL, parameter number  4 had an illegal value
    ** On entry to DLASCL, parameter number  4 had an illegal value
    
      Error with _naupd, info =        -9999
      Check the documentation of _naupd
    
  • NETLIB reference
    BLAS                : -L/opt/homebrew/opt/lapack/lib -lblas
    LAPACK              : -L/opt/homebrew/opt/lapack/lib -llapack
    EIGEN               : -I/opt/homebrew/Cellar/eigen/3.4.0_1/include/eigen3
    LIBS                :
    LDADD               :
    
     ** On entry to DLASCL parameter number  4 had an illegal value
    Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG
    

cho-m avatar Sep 30 '24 18:09 cho-m

Do you export these variables? https://github.com/opencollab/arpack-ng/blob/9431fd5ea66a00e696695ecbc5c6a02e1ebabca7/.github/workflows/jobs.yml#L184-L190

fghoussen avatar Sep 30 '24 19:09 fghoussen

One thing I found is that running ctest in the build directory passes on what I think is the same test.

dnsimp_tst
test 1
      Start  1: dnsimp_tst

1: Test command: /tmp/arpack-20240930-10914-5rwi80/arpack-ng-3.9.1/build/TESTS/dnsimp
1: Working Directory: /tmp/arpack-20240930-10914-5rwi80/arpack-ng-3.9.1/build
1: Test timeout computed to be: 10000000
1: 
1:  Ritz values (Real, Imag) and residual residuals
1:  -----------------------------------------------
1:                Col   1       Col   2       Col   3
1:   Row   1:   -1.96023D+00   2.40614D-01   5.62865D-15
1:   Row   2:   -1.96023D+00  -2.40614D-01   5.62865D-15
1:   Row   3:   -1.28819D+00   1.49056D+00   5.23259D-15
1:   Row   4:   -1.28819D+00  -1.49056D+00   5.23259D-15
1:   Row   5:   -1.66676D+00   0.00000D+00   6.47624D-15
1:   Row   6:   -1.38893D+00   8.11056D-01   4.79491D-15
1:   Row   7:   -1.38893D+00  -8.11056D-01   4.79491D-15
1:   
1:   
1:   Maximum number of iterations reached.
1:   
1:   
1:   _NSIMP 
1:   ====== 
1:   
1:   Size of the matrix is         2500
1:   The number of Ritz values requested is           11
1:   The number of Arnoldi vectors generated (NCV) is           20
1:   What portion of the spectrum: SR
1:   The number of converged Ritz values is            7
1:   The number of Implicit Arnoldi update iterations taken is           31
1:   The number of OP*x is          180
1:   The convergence criterion is    1.1102230246251565E-016
1:   
 1/13 Test  #1: dnsimp_tst .......................   Passed    0.37 sec

Comparing the compilation commands for the test, it looks like the difference is the optimization level.

In Homebrew, we build the test without optimization flags. I have now confirmed that:

  • -O0 and -O1 fail
  • -O2 and -O3 pass
❯ gfortran -O0 -o test dnsimp.f mmio.f -L/opt/homebrew/opt/arpack/lib -larpack -L/opt/homebrew/opt/openblas/lib -lopenblas

❯ ./test
 ** On entry to DLASCL parameter number  4 had an illegal value
 ** On entry to DLASCL parameter number  4 had an illegal value

  Error with _naupd, info =        -9999
  Check the documentation of _naupd

❯ gfortran -O1 -o test dnsimp.f mmio.f -L/opt/homebrew/opt/arpack/lib -larpack -L/opt/homebrew/opt/openblas/lib -lopenblas

❯ ./test
 ** On entry to DLASCL parameter number  4 had an illegal value
 ** On entry to DLASCL parameter number  4 had an illegal value

  Error with _naupd, info =        -9999
  Check the documentation of _naupd

❯ gfortran -O2 -o test dnsimp.f mmio.f -L/opt/homebrew/opt/arpack/lib -larpack -L/opt/homebrew/opt/openblas/lib -lopenblas

❯ ./test

 _naupd: Number of update iterations taken
 -----------------------------------------
    1 -    1:    31


 _naupd: Number of wanted "converged" Ritz values
 ------------------------------------------------
    1 -    1:     7
...

cho-m avatar Oct 01 '24 02:10 cho-m
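
The optimization-level comparison above can be repeated with a small script. This is a minimal sketch written for POSIX sh / bash, not the Homebrew formula's actual test: the function names `run_passes`, `check_level`, and `sweep_opt_levels` and the `ARPACK_LIB` / `BLAS_LIB` variables are hypothetical, the default paths are the ones used in the commands in this thread, and the pass criterion (output contains "reached") mirrors the /reached/ assertion from the failing Homebrew log.

```shell
# Pass criterion taken from the Homebrew log: the test output must
# contain the word "reached". Reads the program output on stdin.
run_passes() {
  grep -q 'reached'
}

# Build dnsimp.f at one optimization level and report pass/fail.
# ARPACK_LIB and BLAS_LIB are hypothetical overrides; the defaults are
# the library paths used in the commands quoted in this thread.
check_level() {
  level=$1
  gfortran "$level" -o "test$level" dnsimp.f mmio.f \
    -L"${ARPACK_LIB:-/opt/homebrew/opt/arpack/lib}" -larpack \
    -L"${BLAS_LIB:-/opt/homebrew/opt/openblas/lib}" -lopenblas || return 2
  if "./test$level" 2>&1 | run_passes; then
    echo "$level: pass"
  else
    echo "$level: fail"
  fi
}

# Run the comparison for the four levels discussed in the thread.
sweep_opt_levels() {
  for opt in -O0 -O1 -O2 -O3; do
    check_level "$opt"
  done
}
```

Calling `sweep_opt_levels` from a directory containing dnsimp.f and mmio.f prints one line per level. Only the test program is rebuilt here; the installed libarpack is left as it is, which matches how the comparison in the thread was done.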

At -O0, does -Wall -Werror give any hints? Does adding -ffpe-trap=zero,overflow,underflow,invalid -fcheck=all give more information? You may be hitting an overflow here https://github.com/opencollab/arpack-ng/blob/9431fd5ea66a00e696695ecbc5c6a02e1ebabca7/TESTS/dnsimp.f#L448 or here https://github.com/opencollab/arpack-ng/blob/9431fd5ea66a00e696695ecbc5c6a02e1ebabca7/TESTS/dnsimp.f#L467 (a potential divide by zero?)

fghoussen avatar Oct 01 '24 18:10 fghoussen
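
A diagnostic rebuild along the lines suggested above might look like the following. It is a sketch for POSIX sh / bash under a few assumptions: `DEBUG_FFLAGS` and `build_debug` are hypothetical names, `-g -fbacktrace` is an addition that was not part of the suggestion, `-Werror` is left out so that warnings are reported without stopping the build, and the library paths are the ones from the commands earlier in the thread. Whether floating-point trapping is actually available depends on the platform.

```shell
# Flags suggested in the thread, plus -g -fbacktrace (an addition here)
# so that a trapped exception can print a source-level backtrace.
DEBUG_FFLAGS="-O0 -g -fbacktrace -Wall -fcheck=all -ffpe-trap=zero,overflow,underflow,invalid"

# Rebuild the failing test with the diagnostic flags.
# DEBUG_FFLAGS is left unquoted on purpose so that sh/bash splits it
# into separate arguments.
build_debug() {
  gfortran $DEBUG_FFLAGS -o test_debug dnsimp.f mmio.f \
    -L/opt/homebrew/opt/arpack/lib -larpack \
    -L/opt/homebrew/opt/openblas/lib -lopenblas
}
```

After `build_debug`, running `./test_debug` should stop at the first trapped exception if trapping is supported. Note that `-fcheck=all` only instruments the sources compiled here, so a problem inside the prebuilt libarpack would presumably need the library itself rebuilt with similar flags. For context, the reference LAPACK DLASCL reports parameter 4 (CFROM) as illegal when it is zero or NaN, which is consistent with the IEEE_INVALID_FLAG note in the NETLIB run above.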