omega_h
warp_test failure
Building commit 7cb85faa2f400237159e17855e7081eea5d7c7e7 on the RPI AiMOS (mini Summit) system with the following environment settings and cmake command completes without any obvious issues.
module use /gpfs/u/software/dcs-spack-install/v0133gccSpectrum/lmod/linux-rhel7-ppc64le/gcc/7.4.0-1/
module load spectrum-mpi/10.3-doq6u5y
module load gcc/7.4.0/1
module load \
cmake/3.15.4-mnqjvz6 \
cuda/10.2
export OMPI_CXX=g++
cmake ../omega_h/ \
-DCMAKE_INSTALL_PREFIX=/gpfs/u/home/MPFS/MPFSsmth/barn-shared/cws/software/build-omegah-dcs-gcc74-cuda/install \
-DBUILD_SHARED_LIBS=OFF \
-DOmega_h_USE_CUDA=on \
-DOmega_h_USE_MPI=on \
-DCMAKE_CXX_COMPILER=/opt/ibm/spectrum_mpi/bin/mpicxx \
-DCMAKE_CUDA_FLAGS=-arch=sm_70 \
-DOmega_h_USE_Kokkos=ON \
-DKokkos_PREFIX=../build-kokkos-dcs-gcc74-cuda/install/lib/CMake/ \
-DBUILD_TESTING=ON
ctest reports that the warp_test_parallel test fails. Deleting the gold file and running ctest twice (rm -rf src/gold_warp.osh; ctest; ctest) ends in the same failure (discussion began in #343).
17/21 Testing: warp_test_parallel
17/21 Test: warp_test_parallel
Command: "/opt/ibm/spectrum_mpi/bin/mpirun" "-np" "2" "./warp_test"
Directory: /gpfs/u/home/MPFS/MPFSsmth/barn-shared/cws/software/build-omegah-dcs-gcc74-cuda/src
"warp_test_parallel" start time: Mar 27 09:01 EDT
Output:
----------------------------------------------------------
warp_to_limit completed in one step
before adapting:
6000 tets, quality [0.62,0.85], 6000 >0.30
7930 edges, length [0.67,1.55], 132 <0.71, 7458 in [0.71,1.41], 340 >1.41
quality histogram:
0.00-0.10: 0
0.10-0.20: 0
0.20-0.30: 0
<snip>
test took 54.2491 seconds
vertex tag "metric" values are different
max diff at vertex 1008, comp 0, values 6.327799822371828e+01 vs 6.327799822371829e+01
edge tag "length" values are different
max diff at edge 6073, comp 0, values 1.124971094950606e+00 vs 1.124971094950607e+00
tet tag "quality" values are different
max diff at region 3483, comp 0, values 7.559526299369257e-01 vs 7.559526299369259e-01
This run, stored at "gold_warp_bad.osh",
does not match the gold at "gold_warp.osh"
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[40972,1],0]
Exit code: 2
--------------------------------------------------------------------------
<end of output>
Test time = 56.95 sec
----------------------------------------------------------
Test Failed.
"warp_test_parallel" end time: Mar 27 09:02 EDT
"warp_test_parallel" time elapsed: 00:00:56
----------------------------------------------------------
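The three mismatches in the output above differ only in the last printed digit. To put a number on that, here is a minimal sketch that computes the relative difference for each reported pair and expresses it in units of double-precision machine epsilon. The values are copied from the log; the helper name relative_diff and the choice of scaling by the larger magnitude are my own assumptions and are not taken from the omega_h comparison code. This only quantifies the size of the mismatch, it does not identify the cause.

```python
import sys

# (tag, first value, second value) copied verbatim from the
# "max diff at ..." lines of the warp_test_parallel output.
mismatches = [
    ("metric", 6.327799822371828e+01, 6.327799822371829e+01),
    ("length", 1.124971094950606e+00, 1.124971094950607e+00),
    ("quality", 7.559526299369257e-01, 7.559526299369259e-01),
]


def relative_diff(a, b):
    """Hypothetical helper: absolute difference scaled by the larger magnitude."""
    return abs(a - b) / max(abs(a), abs(b))


rel_diffs = {tag: relative_diff(a, b) for tag, a, b in mismatches}

# Machine epsilon for IEEE 754 double precision (about 2.2e-16).
eps = sys.float_info.epsilon

for tag, rel in rel_diffs.items():
    print(f"{tag:8s} relative diff = {rel:.3e} ({rel / eps:.1f} x machine epsilon)")
```

All three relative differences come out below 1e-14, so the gold comparison is failing on differences at the level of the last few bits of a double.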
Interestingly, a build without Kokkos (https://github.com/SNLComputation/omega_h/issues/344#issuecomment-606573986, using master 9a2b60d) passes all tests.