
MATRIX_MARKET tests failure with parallel make -jN check

Open rathann opened this issue 2 years ago • 5 comments

Expected behavior

All tests complete successfully.

Actual behavior

Two of the three tests (arpackmm, issue215 and issue401) fail when run with make -j2 or higher.

Where/how to reproduce the problem

  • arpack-ng: 3.9.1
  • OS: Fedora rawhide (but reproducible on 38 and 39, too)
  • compiler: gcc version 13.2.1 20231011 (Red Hat 13.2.1-4) (GCC)
  • environment: FFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -I/usr/lib64/gfortran/modules '
  • configure: ./configure --build=x86_64-redhat-linux --host=x86_64-redhat-linux --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --runstatedir=/run --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-static --with-blas=-lflexiblas --with-lapack=-lflexiblas --enable-eigen --enable-icb

Steps to reproduce the problem

cd EXAMPLES/MATRIX_MARKET
make check -j2
make check -j3

Error message

With make -j2, the issue215 test passes and the other two fail:

$ make check -j2
make  arpackmm \
  arpackmm.sh issue401.sh issue215.sh An.mtx As.mtx Az.mtx B.mtx Bz.mtx issue401.mtx issue215.mtx
make[1]: Entering directory '/builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET'
make[1]: 'arpackmm' is up to date.
make[1]: Nothing to be done for 'arpackmm.sh'.
make[1]: Nothing to be done for 'issue401.sh'.
make[1]: Nothing to be done for 'issue215.sh'.
make[1]: Nothing to be done for 'An.mtx'.
make[1]: Nothing to be done for 'As.mtx'.
make[1]: Nothing to be done for 'Az.mtx'.
make[1]: Nothing to be done for 'B.mtx'.
make[1]: Nothing to be done for 'Bz.mtx'.
make[1]: Nothing to be done for 'issue401.mtx'.
make[1]: Nothing to be done for 'issue215.mtx'.
make[1]: Leaving directory '/builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET'
make  check-TESTS
make[1]: Entering directory '/builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET'
make[2]: Entering directory '/builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET'
FAIL: issue401.sh
FAIL: arpackmm.sh
PASS: issue215.sh
============================================================================
Testsuite summary for ARPACK-NG 3.9.1
============================================================================
# TOTAL: 3
# PASS:  1
# SKIP:  0
# XFAIL: 0
# FAIL:  2
# XPASS: 0
# ERROR: 0
============================================================================
See EXAMPLES/MATRIX_MARKET/test-suite.log
Please report to https://github.com/opencollab/arpack-ng/issues/
============================================================================
make[2]: *** [Makefile:741: test-suite.log] Error 1
make[2]: Leaving directory '/builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET'
make[1]: *** [Makefile:849: check-TESTS] Error 2
make[1]: Leaving directory '/builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET'
make: *** [Makefile:936: check-am] Error 2

With make -j3 or higher, the arpackmm test passes and the other two fail:

$ make check -j3
make  arpackmm \
  arpackmm.sh issue401.sh issue215.sh An.mtx As.mtx Az.mtx B.mtx Bz.mtx issue401.mtx issue215.mtx
make[1]: Entering directory '/builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET'
make[1]: 'arpackmm' is up to date.
make[1]: Nothing to be done for 'arpackmm.sh'.
make[1]: Nothing to be done for 'issue401.sh'.
make[1]: Nothing to be done for 'issue215.sh'.
make[1]: Nothing to be done for 'An.mtx'.
make[1]: Nothing to be done for 'As.mtx'.
make[1]: Nothing to be done for 'Az.mtx'.
make[1]: Nothing to be done for 'B.mtx'.
make[1]: Nothing to be done for 'Bz.mtx'.
make[1]: Nothing to be done for 'issue401.mtx'.
make[1]: Nothing to be done for 'issue215.mtx'.
make[1]: Leaving directory '/builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET'
make  check-TESTS
make[1]: Entering directory '/builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET'
make[2]: Entering directory '/builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET'
FAIL: issue215.sh
FAIL: issue401.sh
PASS: arpackmm.sh
============================================================================
Testsuite summary for ARPACK-NG 3.9.1
============================================================================
# TOTAL: 3
# PASS:  1
# SKIP:  0
# XFAIL: 0
# FAIL:  2
# XPASS: 0
# ERROR: 0
============================================================================
See EXAMPLES/MATRIX_MARKET/test-suite.log
Please report to https://github.com/opencollab/arpack-ng/issues/
============================================================================
make[2]: *** [Makefile:741: test-suite.log] Error 1
make[2]: Leaving directory '/builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET'
make[1]: *** [Makefile:849: check-TESTS] Error 2
make[1]: Leaving directory '/builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET'
make: *** [Makefile:936: check-am] Error 2

Traces

make -j2

$ tail -n 300 /builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET/test-suite.log
============================================================
   ARPACK-NG 3.9.1: EXAMPLES/MATRIX_MARKET/test-suite.log
============================================================

# TOTAL: 3
# PASS:  1
# SKIP:  0
# XFAIL: 0
# FAIL:  2
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: arpackmm.sh
=================

./arpackmm --help

========================================================================================

./arpackmm --A As.mtx      --slv BiCG --slvItrTol 1.e-06 --slvItrMaxIt 150     --nbCV 6 --maxIt 200 --verbose 3 --debug 3

========================================================================================

./arpackmm --A As.mtx      --slv BiCG --slvItrTol 1.e-06 --slvItrMaxIt 150     --nbCV 6 --maxIt 200 --verbose 3 --debug 3 --restart

========================================================================================

./arpackmm --A As.mtx      --slv BiCG --slvItrTol 1.e-06 --slvItrMaxIt 150  --simplePrec   --nbCV 6 --maxIt 200 --verbose 3 --debug 3

========================================================================================

./arpackmm --A As.mtx      --slv BiCG --slvItrTol 1.e-06 --slvItrMaxIt 150  --simplePrec   --nbCV 6 --maxIt 200 --verbose 3 --debug 3 --restart
FAIL arpackmm.sh (exit status: 1)

FAIL: issue401.sh
=================

OPT: A issue401.mtx, B N.A., dense no, nbEV 1, nbCV 5, stdPb yes, symPb yes, cpxPb no, simplePrec no, mag LA
OPT: shiftReal no, sigmaReal 0, shiftImag no, sigmaImag 0, invert no, tol 1e-06, maxIt 100, Ritz vectors
OPT: slv BiCG, slvItrPC Diag, slvItrTol 1e-06, slvItrMaxIt 100
OPT: check yes, verbose 0, debug 0, restart no

INP: create A 0 s

OUT: mode 1, nb EV found 1, nb iterations 1
OUT: init mode solver 0 s, RCI time 0 s
OUT: full time 0 s

STAT: total number of user OP*x operation                         9
STAT: total number of user  B*x operation                         0
STAT: total number of reorthogonalization steps taken             4
STAT: total number of it. refinement steps in reorthogonalization 8
STAT: total number of restart steps                               3
OPT: A issue401.mtx, B N.A., dense no, nbEV 1, nbCV 5, stdPb yes, symPb yes, cpxPb no, simplePrec no, mag LA
OPT: shiftReal no, sigmaReal 0, shiftImag no, sigmaImag 0, invert no, tol 1e-06, maxIt 100, Ritz vectors
OPT: slv BiCG, slvItrPC Diag, slvItrTol 1e-06, slvItrMaxIt 100
OPT: check yes, verbose 0, debug 0, restart yes

INP: create A 0 s

OUT: mode 1, nb EV found 1, nb iterations 1
OUT: init mode solver 0 s, RCI time 0 s
OUT: full time 0 s

STAT: total number of user OP*x operation                         10
STAT: total number of user  B*x operation                         0
STAT: total number of reorthogonalization steps taken             5
STAT: total number of it. refinement steps in reorthogonalization 10
STAT: total number of restart steps                               4
OPT: A issue401.mtx, B N.A., dense no, nbEV 1, nbCV 5, stdPb yes, symPb yes, cpxPb no, simplePrec no, mag LA
OPT: shiftReal no, sigmaReal 0, shiftImag no, sigmaImag 0, invert no, tol 1e-06, maxIt 100, Ritz vectors
OPT: slv BiCG, slvItrPC Diag, slvItrTol 1e-06, slvItrMaxIt 100
OPT: check yes, verbose 0, debug 0, restart yes

INP: create A 0 s
Error: bad dim - restart KO
Error: bad restart (resid)
Error: arpack solve KO
Error: solve KO
Error: arpack solve KO
FAIL issue401.sh (exit status: 1)

make -j3

$ tail -n 300 /builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET/test-suite.log
============================================================
   ARPACK-NG 3.9.1: EXAMPLES/MATRIX_MARKET/test-suite.log
============================================================

# TOTAL: 3
# PASS:  1
# SKIP:  0
# XFAIL: 0
# FAIL:  2
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: issue401.sh
=================

OPT: A issue401.mtx, B N.A., dense no, nbEV 1, nbCV 5, stdPb yes, symPb yes, cpxPb no, simplePrec no, mag LA
OPT: shiftReal no, sigmaReal 0, shiftImag no, sigmaImag 0, invert no, tol 1e-06, maxIt 100, Ritz vectors
OPT: slv BiCG, slvItrPC Diag, slvItrTol 1e-06, slvItrMaxIt 100
OPT: check yes, verbose 0, debug 0, restart no

INP: create A 0 s

OUT: mode 1, nb EV found 1, nb iterations 1
OUT: init mode solver 0 s, RCI time 0 s
OUT: full time 0.001 s

STAT: total number of user OP*x operation                         9
STAT: total number of user  B*x operation                         0
STAT: total number of reorthogonalization steps taken             4
STAT: total number of it. refinement steps in reorthogonalization 8
STAT: total number of restart steps                               3
OPT: A issue401.mtx, B N.A., dense no, nbEV 1, nbCV 5, stdPb yes, symPb yes, cpxPb no, simplePrec no, mag LA
OPT: shiftReal no, sigmaReal 0, shiftImag no, sigmaImag 0, invert no, tol 1e-06, maxIt 100, Ritz vectors
OPT: slv BiCG, slvItrPC Diag, slvItrTol 1e-06, slvItrMaxIt 100
OPT: check yes, verbose 0, debug 0, restart yes

INP: create A 0 s
Error: bad dim - restart KO
Error: bad restart (resid)
Error: arpack solve KO
Error: solve KO
Error: arpack solve KO
FAIL issue401.sh (exit status: 1)

FAIL: issue215.sh
=================

OPT: A issue215.mtx, B N.A., dense no, nbEV 1, nbCV 4, stdPb yes, symPb yes, cpxPb no, simplePrec no, mag LM
OPT: shiftReal yes, sigmaReal 0.1, shiftImag no, sigmaImag 0, invert no, tol 1e-06, maxIt 100, Ritz vectors
OPT: slv BiCG, slvItrPC Diag, slvItrTol 1e-06, slvItrMaxIt 100
OPT: check yes, verbose 0, debug 0, restart no

INP: create A 0 s

OUT: mode 1, nb EV found 1, nb iterations 1
OUT: init mode solver 0 s, RCI time 0 s
OUT: full time 0.001 s

STAT: total number of user OP*x operation                         6
STAT: total number of user  B*x operation                         0
STAT: total number of reorthogonalization steps taken             4
STAT: total number of it. refinement steps in reorthogonalization 6
STAT: total number of restart steps                               1
OPT: A issue215.mtx, B N.A., dense no, nbEV 1, nbCV 4, stdPb yes, symPb yes, cpxPb no, simplePrec no, mag LM
OPT: shiftReal yes, sigmaReal 0.1, shiftImag no, sigmaImag 0, invert no, tol 1e-06, maxIt 100, Ritz vectors
OPT: slv BiCG, slvItrPC Diag, slvItrTol 1e-06, slvItrMaxIt 100
OPT: check yes, verbose 0, debug 0, restart yes

INP: create A 0 s
Error: bad dim - restart KO
Error: bad restart (resid)
Error: arpack solve KO
Error: solve KO
Error: arpack solve KO
FAIL issue215.sh (exit status: 1)

Callstack

N/A

Notes, remarks

Running with make -j1, or without any -j option, works.

rathann, Oct 18 '23 10:10

Is this a regression introduced in 3.9.1?

sylvestre, Oct 18 '23 12:10

These tests didn't exist in 3.9.0, so yes, it's new.

rathann, Oct 18 '23 19:10

These tests are meant to be run sequentially: restart information is stored in a file that does not support concurrent access.

fghoussen, Dec 06 '23 09:12

Ok. Could only those tests be run sequentially? make has special markers for targets that require sequential handling.

rathann, Dec 07 '23 17:12

> Ok. Could only those tests be run sequentially?

Sure

> make has special markers for targets that require sequential handling.

No idea how

fghoussen, Dec 07 '23 18:12
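
The "special markers" mentioned above most likely refer to GNU make's .NOTPARALLEL special target. The fragment below is a minimal, untested sketch of how it might be applied here. It assumes GNU make is the make in use, and that the line is added to EXAMPLES/MATRIX_MARKET/Makefile.am; that placement is an assumption and is not confirmed anywhere in this thread.

```make
# Hypothetical addition to EXAMPLES/MATRIX_MARKET/Makefile.am (untested sketch).
#
# GNU make only: when .NOTPARALLEL appears as a target with no prerequisites,
# this invocation of make runs its targets serially even if -jN is given.
# The check-TESTS step re-invokes make on this same Makefile, so it would be
# serialized as well. Makefiles in other directories are unaffected and can
# still run their recipes in parallel.
.NOTPARALLEL:
```

With something like this in place, the three tests that share the restart file would run one after another in this directory only, which matches the make -j1 behaviour reported as working in the notes above. Other make implementations may ignore or reject this target.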