easybuild-easyconfigs
{lib}[foss/2021a] AMGX v2.2.0 w/ CUDA 11.3.1
(created using eb --new-pr)
Disclaimer: Completely untested. Waiting for a GPU node since I suspect the test suite requires it. Not sure how we want to deal with that.
@Micket: Tests failed in GitHub Actions, see https://github.com/easybuilders/easybuild-easyconfigs/actions/runs/1299123385 Output from first failing test suite run:
FAIL: test_style_conformance (test.easyconfigs.styletests.StyleTest)
Check the easyconfigs for style
----------------------------------------------------------------------
Traceback (most recent call last):
File "test/easyconfigs/styletests.py", line 57, in test_style_conformance
self.assertEqual(result, 0, "Found code style errors (and/or warnings): %s" % result)
AssertionError: Found code style errors (and/or warnings): 1
----------------------------------------------------------------------
Ran 13409 tests in 629.271s
FAILED (failures=1)
ERROR: Not all tests were successful
bleep, bloop, I'm just a bot (boegelbot v20200716.01)
Please talk to my owner @boegel if you notice me acting stupid,
or submit a pull request to https://github.com/boegel/boegelbot to fix the problem.
@boegelbot please test @ generoso
@SebastianAchilles: Request for testing this PR well received on login1
PR test command 'EB_PR=14094 EB_ARGS= /opt/software/slurm/bin/sbatch --job-name test_PR_14094 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!
- exit code: 0
- output:
Submitted batch job 7028
Test results coming soon (I hope)...
- notification for comment with ID 932983429 processed
Message to humans: this is just bookkeeping information for me, it is of no use to you (unless you think I have a bug, which I don't).
Test report by @boegelbot FAILED Build succeeded for 0 out of 1 (1 easyconfigs in total) cnx1 - Linux rocky linux 8.4, x86_64, Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz (haswell), Python 3.6.8 See https://gist.github.com/b3bf4855937bc090f4f25b79fd68e8e0 for a full test report.
(generoso has no GPUs, so, I suspect it will just fail at the test step)
Yes, that is likely. I will test it on one of the GPU nodes.
Test report by @sebastianachilles SUCCESS Build succeeded for 1 out of 1 (1 easyconfigs in total) jsfc060 - Linux centos linux 7.9.2009, x86_64, AMD EPYC 7742 64-Core Processor, Python 3.6.8 See https://gist.github.com/a59d747a6013a6415c54094f3c4b59c1 for a full test report.
I'm still building magma, test report coming in later
I'm still unsure about the patch and the if-condition (but I really didn't want to make an easyblock).
Test report by @branfosj SUCCESS Build succeeded for 1 out of 1 (1 easyconfigs in total) bear-pg0212u17a.bear.cluster - Linux RHEL 8.3, x86_64, Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz (broadwell), 1 x NVIDIA Tesla P100-PCIE-16GB, 460.32.03, Python 3.6.8 See https://gist.github.com/dc424ec585841a0263a6e9d088addfe8 for a full test report.
Test report by @akesandgren FAILED Build succeeded for 0 out of 1 (1 easyconfigs in total) b-cn1113.hpc2n.umu.se - Linux Ubuntu 20.04, x86_64, Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz, Python 3.8.10 See https://gist.github.com/2837869975bd31c449802a96aa15517b for a full test report.
How did you get those builds to not fail with ICEs (internal compiler errors)? I get:
/hpc2n/eb/software/CUDA/11.3.1/bin/nvcc /scratch/eb-ake/AMGX/2.2.0/foss-2021a-CUDA-11.3.1/AMGX-2.2.0/core/src/solvers/user_solver.cu -c -o /scratch/eb-ake/AMGX/2.2.0/foss-2021a-CUDA-11.3.1/easybuild_obj/core/CMakeFiles/amgx_core.dir/src/solvers/./amgx_core_generated_user_solver.cu.o -ccbin /hpc2n/eb/software/GCCcore/10.3.0/bin/gcc -m64 -Xcompiler ,\"-O2\",\"-ftree-vectorize\",\"-march=native\",\"-fno-math-errno\",\"-fopenmp\",\"-Wno-terminate\",\"-static-libgcc\",\"-fopenmp\",\"-DRAPIDJSON_DEFINED\",\"-DAMGX_WITH_MPI\",\"-O3\",\"-DNDEBUG\" -Xcompiler=-rdynamic -Xcompiler=-fPIC -Xcompiler=-fvisibility=default -DDISABLE_MIXED_PRECISION -DCUSPARSE_GENERIC_INTERFACES -DCUSPARSE_USE_GENERIC_SPGEMM -O3 -DNDEBUG -std=c++14 --Werror cross-execution-space-call -DNVTX_RANGES -DNVCC -I/scratch/eb-ake/AMGX/2.2.0/foss-2021a-CUDA-11.3.1/AMGX-2.2.0/../../thrust -I/scratch/eb-ake/AMGX/2.2.0/foss-2021a-CUDA-11.3.1/AMGX-2.2.0/core/include -I/scratch/eb-ake/AMGX/2.2.0/foss-2021a-CUDA-11.3.1/AMGX-2.2.0/core/../base/include -I/hpc2n/eb/software/CUDA/11.3.1/include -I/scratch/eb-ake/AMGX/2.2.0/foss-2021a-CUDA-11.3.1/AMGX-2.2.0/external/rapidjson/include -I/cvmfs/ebsw.hpc2n.umu.se/amd64_ubuntu2004_bdw/software/OpenMPI/4.1.1-GCC-10.3.0/include -I/hpc2n/eb/software/OpenMPI/4.1.1-GCC-10.3.0/include
/cvmfs/ebsw.hpc2n.umu.se/amd64_ubuntu2004_bdw/software/GCCcore/10.3.0/include/c++/10.3.0/chrono: In substitution of template<class _Rep, class _Period> template<class _Period2> using __is_harmonic = std::__bool_constant<(std::ratio<((_Period2::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)) * (_Period::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den))), ((_Period2::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den)) * (_Period::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)))>::den == 1)> [with _Period2 = _Period2; _Rep = _Rep; _Period = _Period]:
/cvmfs/ebsw.hpc2n.umu.se/amd64_ubuntu2004_bdw/software/GCCcore/10.3.0/include/c++/10.3.0/chrono:473:154: required from here
/cvmfs/ebsw.hpc2n.umu.se/amd64_ubuntu2004_bdw/software/GCCcore/10.3.0/include/c++/10.3.0/chrono:428:27: internal compiler error: Segmentation fault
428 | _S_gcd(intmax_t __m, intmax_t __n) noexcept
| ^~~~~~
0xcbbdaf crash_signal
../../gcc/toplev.c:328
0x7b1e7d tsubst(tree_node*, tree_node*, int, tree_node*)
../../gcc/cp/pt.c:15310
...
@akesandgren Have you built GCCcore since https://github.com/easybuilders/easybuild-easyconfigs/pull/13310 was merged?
Perhaps not.... rebuilding it...
And yes, it was built with 4.4.0...
Test report by @akesandgren SUCCESS Build succeeded for 1 out of 1 (1 easyconfigs in total) b-cn1113.hpc2n.umu.se - Linux Ubuntu 20.04, x86_64, Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz, Python 3.8.10 See https://gist.github.com/b93f8c1d2fcd3602b5a51d7db5fafe73 for a full test report.
Test report by @Micket SUCCESS Build succeeded for 1 out of 1 (1 easyconfigs in total) vera54-1 - Linux centos linux 7.9.2009, x86_64, Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, Python 3.6.8 See https://gist.github.com/bae8987fed8ad81d04c2cc180adca76e for a full test report.
So, one can build
make amgx_tests_launcher
and then run that. It seems to be running things on the GPU, but it all fails for me.
Spawning new worker
[ 1/ 183] AggregatesCoarseGeneratorTest dDDI : [PASSED]
[ 2/ 183] AggregatesCoarseGeneratorTest dDFI : [PASSED]
[ 3/ 183] AggregatesCoarseGeneratorTest dFFI : [PASSED]
[ 4/ 183] AggregatesCoarseningFactor dDDI : [FAILED]
Spawning new worker
Launching test AggregatesCoarseningFactor dDDI in the new process for the second time
[ 1/ 180] AggregatesCoarseningFactor dDDI : [FAILED]
[ 2/ 180] AggregatesCoarseningFactor dDFI : [FAILED]
Spawning new worker
Launching test AggregatesCoarseningFactor dDFI in the new process for the second time
[ 1/ 179] AggregatesCoarseningFactor dDFI : [FAILED]
[ 2/ 179] AggregatesCoarseningFactor dFFI : ^CCaught signal 2 - SIGINT (interrupt)
I stopped it after 15 hours. What do you guys think?
I've installed it and thrown it at the user who requested it.
Their ci/test.sh says
# WIP: test_launcher is allowed to fail; not all tests pass
./tests/amgx_tests_launcher
so perhaps all is fine?
Should we add this as a comment in the easyconfig and then merge this PR?
Not yet, I'm working out exactly how to run the tests, and which ones, to have a base level that we would expect to pass... I can get a lot of stuff to pass when built manually...
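One way to arrive at such a base level is to filter the launcher output for tests whose every reported mode passed. The sketch below is only an illustration: the line layout is inferred from the output pasted earlier in this thread rather than from AMGX documentation, and `passing_tests` is a hypothetical helper name.

```python
import re
from collections import OrderedDict

# Matches lines such as "[ 4/ 183] AggregatesCoarseningFactor dDDI : [FAILED]".
# Assumption: this layout is inferred from the launcher output quoted in this thread.
RESULT_RE = re.compile(
    r"^\[\s*\d+/\s*\d+\]\s+(\S+)\s+(\S+)\s*:\s*\[(PASSED|FAILED)\]\s*$"
)


def passing_tests(log_text):
    """Return names of tests for which every reported mode passed, in first-seen order."""
    status = OrderedDict()
    for line in log_text.splitlines():
        match = RESULT_RE.match(line.strip())
        if not match:
            continue  # worker messages, interrupted tests, and similar lines
        name, _mode, result = match.groups()
        # A single FAILED entry (including reruns) marks the whole test as failing.
        status[name] = status.get(name, True) and result == "PASSED"
    return [name for name, ok in status.items() if ok]


sample = """\
Spawning new worker
[ 1/ 183] AggregatesCoarseGeneratorTest dDDI : [PASSED]
[ 2/ 183] AggregatesCoarseGeneratorTest dDFI : [PASSED]
[ 4/ 183] AggregatesCoarseningFactor dDDI : [FAILED]
[ 2/ 179] AggregatesCoarseningFactor dFFI : ^CCaught signal 2 - SIGINT (interrupt)
"""

print(" ".join(passing_tests(sample)))
```

Lines that do not end in a `[PASSED]` or `[FAILED]` marker (such as the interrupted test) are ignored, so a test that never finished is not counted as passing.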
And as for testing, this works:
local_tests = 'AggregatesCoarseGeneratorTest AggregatesDeterminism CAPIFailure CAPIVersionCheck '
local_tests += 'ClassicalStrengthTest ConfigStringParsing CsrMultiplyTests_Poisson27_100_100 '
local_tests += 'CsrMultiplyTests_Poisson27_10_10 CsrMultiplyTests_Poisson5_100_100 CsrMultiplyTests_Poisson5_10_10 '
local_tests += 'CsrMultiplyTests_Poisson7_100_100 CsrMultiplyTests_Poisson7_10_10 CsrMultiplyTests_Poisson9_100_100 '
local_tests += 'CsrMultiplyTests_Poisson9_10_10 CsrSparsityILU1Tests_Poisson27_100_100 '
local_tests += 'CsrSparsityILU1Tests_Poisson27_10_10 CsrSparsityILU1Tests_Poisson5_100_100 '
local_tests += 'CsrSparsityILU1Tests_Poisson5_10_10 CsrSparsityILU1Tests_Poisson7_100_100 '
local_tests += 'CsrSparsityILU1Tests_Poisson7_10_10 CsrSparsityILU1Tests_Poisson9_100_100 '
local_tests += 'CsrSparsityILU1Tests_Poisson9_10_10 CsrSparsityTests_Poisson27_100_100 '
local_tests += 'CsrSparsityTests_Poisson27_10_10 CsrSparsityTests_Poisson5_100_100 CsrSparsityTests_Poisson5_10_10 '
local_tests += 'CsrSparsityTests_Poisson7_100_100 CsrSparsityTests_Poisson7_10_10 CsrSparsityTests_Poisson9_100_100 '
local_tests += 'CsrSparsityTests_Poisson9_10_10 DenseLUSolverTest_Factorization_Id_256 '
local_tests += 'DenseLUSolverTest_Factorization_Id_32 DenseLUSolverTest_Solve_Id_256 DenseLUSolverTest_Solve_Id_32 '
local_tests += 'DenseLUSolverTest_Solve_Poisson3D FGMRESConvergencePoisson FactoriesTest '
local_tests += 'GenericSpMVTest IDRConvergencePoisson '
local_tests += 'IDRmsyncConvergencePoisson LargeMatricesSupport MatrixTests MatrixVectorMultiplyTests NestedSolvers '
local_tests += 'NormTests PermuteTests RandomMatrix SmootherBlockPoissonTest '
local_tests += 'TemplateConfigTest TemplateTest VectorTests truncateCountTest '
# Build the amgx_tests_launcher target, then run only the tests listed above
runtest = "amgx_tests_launcher && tests/amgx_tests_launcher %s " % local_tests
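For readability, the same value could also be composed from a Python list instead of repeated string concatenation. This is a minimal sketch, not what the PR does: `build_runtest` is a hypothetical helper, only three of the test names above are used, and the final line assumes the test step prefixes the value with `make`, as the manual `make amgx_tests_launcher` earlier in the thread suggests.

```python
# Hypothetical helper for illustration; not part of EasyBuild or this easyconfig.
def build_runtest(test_names, target="amgx_tests_launcher",
                  launcher="tests/amgx_tests_launcher"):
    """Compose a runtest value: build the launcher target, then run the named tests."""
    if not test_names:
        raise ValueError("expected at least one test name")
    return "%s && %s %s" % (target, launcher, " ".join(test_names))


# A short subset of the test names listed above.
tests = ["CAPIVersionCheck", "MatrixTests", "VectorTests"]
runtest = build_runtest(tests)
print(runtest)

# Assumption: the test step runs "make <runtest>", so the effective command would be:
print("make " + runtest)
```

Keeping the names in a list makes it easier to add or drop individual tests as more of them turn out to pass or fail.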
Test report by @Micket FAILED Build succeeded for 0 out of 1 (1 easyconfigs in total) vera54-1 - Linux centos linux 7.9.2009, x86_64, Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, Python 3.6.8 See https://gist.github.com/cbf86d23561048de81f99fc381cf41a8 for a full test report.
Test report by @akesandgren SUCCESS Build succeeded for 1 out of 1 (1 easyconfigs in total) b-cn1312.hpc2n.umu.se - Linux Ubuntu 20.04, x86_64, Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz, Python 3.8.10 See https://gist.github.com/618157a510b446bbc16b5e42298f4211 for a full test report.
Test report by @Micket SUCCESS Build succeeded for 1 out of 1 (1 easyconfigs in total) vera54-1 - Linux centos linux 7.9.2009, x86_64, Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, Python 3.6.8 See https://gist.github.com/98ba94ebc61baceb36c47525bc254b0c for a full test report.
Test report by @Micket SUCCESS Build succeeded for 1 out of 1 (1 easyconfigs in total) vera54-1 - Linux centos linux 7.9.2009, x86_64, Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, Python 3.6.8 See https://gist.github.com/1b29362f54303306818d83854aadfe34 for a full test report.
AMGX 2.3.0 is already out and has properly revamped its CMakeLists, so this PR is outdated. It is replaced by #16560.