[Issue] Solver reports wrong vector size in parallel, whereas simple SpMV is correct
Describe the issue
Using a matrix uploaded with the AMGX_matrix_upload_distributed function, I obtain the following message on each MPI rank:
Caught amgx exception: Vector size too small: not enough space for halo elements.
Vector: {tag = 1, size = 139264}
Required size: 140760
at: /home/D43345/src/AMGX/src/distributed/comms_visitors3.cu:23
Stack trace:
/home/D43345/opt/amgx-2.5/arch/nvhpc-byo/lib/libamgxsh.so : ()+0x3bff620
/home/D43345/opt/amgx-2.5/arch/nvhpc-byo/lib/libamgxsh.so : amgx::ExcHalo2Functor<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2>, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> > >::operator()(amgx::CommsMPIHostBufferStream<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&)+0x310
This happens in the code_saturne CFD code's AmgX wrapper (https://github.com/code-saturne/code_saturne/blob/master/src/alge/cs_sles_amgx.cpp), which, apart from a recent migration from C to C++, has not changed much in the last two years, and did work at one point two or three years ago (though possibly on a different test case). The AmgX call sequence is quite similar to the one in PETSc (I compared the two looking for incorrect code on my side).
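For reference, the wrapper's AmgX call sequence boils down to the usual AMGX C API pattern; the following is a hedged, simplified sketch (not the actual cs_sles_amgx.cpp code), with placeholder names (n_rows, nnz, row_index, col_gid, val, rhs_host, x_host, dist, ...) and without AMGX_initialize(), the distribution-handle setup, or error checking. The dDDI mode matches the TemplateConfig visible in the stack trace above.

#include <mpi.h>
#include <stdint.h>
#include <amgx_c.h>

/* Hedged sketch of the upload/solve sequence (hypothetical helper, not the
 * real wrapper); the distribution handle is assumed to be set up elsewhere. */
static void amgx_solve_sketch(MPI_Comm mpi_comm, int device_id,
                              int n_global, int n_rows, int nnz,
                              const int *row_index, const int64_t *col_gid,
                              const double *val,
                              const double *rhs_host, double *x_host,
                              AMGX_distribution_handle dist)
{
  AMGX_config_handle cfg;
  AMGX_resources_handle rsrc;
  AMGX_matrix_handle A;
  AMGX_vector_handle x, b;
  AMGX_solver_handle solver;

  AMGX_config_create_from_file(&cfg, "PCG_CLASSICAL_V_JACOBI.json");
  AMGX_resources_create(&rsrc, cfg, &mpi_comm, 1, &device_id);

  AMGX_matrix_create(&A, rsrc, AMGX_mode_dDDI);
  AMGX_vector_create(&x, rsrc, AMGX_mode_dDDI);
  AMGX_vector_create(&b, rsrc, AMGX_mode_dDDI);
  AMGX_solver_create(&solver, rsrc, AMGX_mode_dDDI, cfg);

  /* n_rows owned rows on this rank, global (64-bit) column indices. */
  AMGX_matrix_upload_distributed(A, n_global, n_rows, nnz, 1, 1,
                                 row_index, col_gid, val, NULL, dist);

  /* Bind the vectors to the matrix so the distributed layout is known,
   * then upload the local (owned) values. */
  AMGX_vector_bind(b, A);
  AMGX_vector_bind(x, A);
  AMGX_vector_upload(b, n_rows, 1, rhs_host);
  AMGX_vector_upload(x, n_rows, 1, x_host);

  AMGX_solver_setup(solver, A);
  AMGX_solver_solve(solver, b, x);
  AMGX_vector_download(x, x_host);
}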
Testing the SpMV product alone prior to solving the system, everything seems fine, and the result matches that of a reference (built-in) implementation. After stepping through here:
709 template<AMGX_Mode CASE,
710 template<typename> class SolverType,
711 template<typename> class VectorType>
712 inline AMGX_ERROR solve_with(AMGX_solver_handle slv,
713 AMGX_vector_handle rhs,
714 AMGX_vector_handle sol,
715 Resources *resources,
716 bool xIsZero = false)
717 {
718 typedef SolverType<typename TemplateMode<CASE>::Type> SolverLetterT;
719 typedef CWrapHandle<AMGX_solver_handle, SolverLetterT> SolverW;
720 typedef VectorType<typename TemplateMode<CASE>::Type> VectorLetterT;
721 typedef CWrapHandle<AMGX_vector_handle, VectorLetterT> VectorW;
722 SolverW wrapSolver(slv);
723 SolverLetterT &solver = *wrapSolver.wrapped();
724 //AMGX_STATUS& slv_stat = wrapSolver.last_solve_status(slv);
725 VectorW wrapRhs(rhs);
726 VectorLetterT &b = *wrapRhs.wrapped();
727 VectorW wrapSol(sol);
728 VectorLetterT &x = *wrapSol.wrapped();
Checking the vector sizes:
(gdb) p v_rhs.size()
$4 = 139264
(gdb) p v_x.size()
$5 = 140760
The solution vector is larger than the RHS vector, which correctly accounts for the ghost cells and matches the sizes expected by the calling code.
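For completeness, the SpMV cross-check mentioned above roughly amounts to the following on the calling side (hedged sketch with hypothetical names, reusing the handles from the sketch above; y_ref_host holds the built-in reference product):

#include <algorithm>
#include <cmath>
#include <amgx_c.h>

/* Hedged sketch of the SpMV cross-check (hypothetical names). The handles
 * A, x and y are assumed created and bound as in the upload sketch above. */
static double amgx_spmv_check_sketch(AMGX_matrix_handle A,
                                     AMGX_vector_handle x,
                                     AMGX_vector_handle y,
                                     int n_rows, const double *x_host,
                                     double *y_amgx_host,
                                     const double *y_ref_host)
{
  AMGX_vector_upload(x, n_rows, 1, x_host);
  AMGX_matrix_vector_multiply(A, x, y);      /* y = A*x on the device */
  AMGX_vector_download(y, y_amgx_host);

  /* Maximum absolute difference over the owned entries. */
  double max_diff = 0.0;
  for (int i = 0; i < n_rows; i++)
    max_diff = std::max(max_diff, std::fabs(y_amgx_host[i] - y_ref_host[i]));
  return max_diff;
}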
But when solving, after stepping through here:
709 template<AMGX_Mode CASE,
710 template<typename> class SolverType,
711 template<typename> class VectorType>
712 inline AMGX_ERROR solve_with(AMGX_solver_handle slv,
713 AMGX_vector_handle rhs,
714 AMGX_vector_handle sol,
715 Resources *resources,
716 bool xIsZero = false)
717 {
718 typedef SolverType<typename TemplateMode<CASE>::Type> SolverLetterT;
719 typedef CWrapHandle<AMGX_solver_handle, SolverLetterT> SolverW;
720 typedef VectorType<typename TemplateMode<CASE>::Type> VectorLetterT;
721 typedef CWrapHandle<AMGX_vector_handle, VectorLetterT> VectorW;
722 SolverW wrapSolver(slv);
723 SolverLetterT &solver = *wrapSolver.wrapped();
724 //AMGX_STATUS& slv_stat = wrapSolver.last_solve_status(slv);
725 VectorW wrapRhs(rhs);
726 VectorLetterT &b = *wrapRhs.wrapped();
727 VectorW wrapSol(sol);
728 VectorLetterT &x = *wrapSol.wrapped();
Although the code is very similar, when I check the RHS and solution sizes, I get the following:
(gdb) p x.size()
$3 = 139264
(gdb) p b.size()
$4 = 140760
So it seems the two sizes are swapped, leading to the error in the check that precedes the actual ghost cell exchange.
I managed to get this far, but I am not familiar enough with the wrapper mechanisms to understand where the error comes from. I do, however, reproduce it on several systems (laptop and cluster).
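For what it is worth, the failing check appears to be of the following kind (this is only a schematic reconstruction from the error message, not the actual AMGX code in comms_visitors3.cu): the vector handed to the halo exchange must have room for the owned entries plus the halo entries, and with the two sizes swapped the smaller vector is checked against the larger requirement.

#include <cstddef>
#include <stdexcept>
#include <string>

/* Schematic of the pre-exchange size check (reconstructed from the error
 * message above, not the actual AMGX source). */
static void check_halo_space(std::size_t vec_size, std::size_t required_size)
{
  if (vec_size < required_size)
    throw std::runtime_error(
      "Vector size too small: not enough space for halo elements. size = "
      + std::to_string(vec_size) + ", required = "
      + std::to_string(required_size));
}

int main()
{
  // With the sizes seen in gdb: the vector that needs halo space should be
  // the 140760-entry one (rows + ghosts); with the sizes swapped, the
  // 139264-entry vector is checked against 140760 and the exception fires.
  check_halo_space(140760, 140760);  // expected pairing: passes
  check_halo_space(139264, 140760);  // swapped pairing: throws, as observed
}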
Environment information:
- OS: Linux; reproduced on Arch Linux, Debian 12, and RHEL 8.
- CUDA runtime: 12.8 and 12.4 at least
- MPI: native Open MPI 5 (CUDA-aware) or Open MPI 4 (non-CUDA-aware)
- AMGX: commit 666311a52c36d75f6738ecc15897840256d2cf79
- NVIDIA GPU: RTX4600 mobile, A500, V100
AMGX solver configuration
Tested both with PCG_CLASSICAL_V_JACOBI.json and no configuration.
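For reference, a minimal sketch of how the two setups map onto the config API (hedged; exactly what the wrapper passes in the "no configuration" case is an assumption here):

/* Hedged sketch of the two configurations tested (error checking omitted). */
AMGX_config_handle cfg;

/* (a) the shipped JSON configuration file */
AMGX_config_create_from_file(&cfg, "PCG_CLASSICAL_V_JACOBI.json");

/* (b) "no configuration": a plain options string; an empty string
 *     presumably falls back to AMGX's default settings. */
AMGX_config_create(&cfg, "");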
Matrix Data
code_saturne's BUNDLE test case (https://github.com/code-saturne/saturne-open-cases/tree/main/BUNDLE), case BENCH_C16_04 (coarse), run on 2 MPI ranks, can be adapted for this, but it requires building the code and setting the following environment variables:
CS_BENCH_NO_PERIODICITY=1
CS_BENCH_SLES_TYPE=amgx
I do not have an easy/standard mechanism for exporting the matrix data from this system.
Reproduction steps
See the Matrix Data section above: building code_saturne, setting the two environment variables listed there, and running the BENCH_C16_04 case of the BUNDLE test on 2 MPI ranks reproduces the issue.
Additional context
Follow-up: in the calling code, uploading the solution vector with a size matching the number of columns (instead of the number of rows) seems to work around this.
It is strange, though, that copying halo data which will be overwritten anyway when synced should be necessary, so an API function allowing the user to provide a hint for the correct size would be nice.
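For reference, the workaround on the calling side looks roughly like this (hedged sketch with hypothetical names; n_rows is the number of owned rows and n_cols the number of local columns, i.e. owned rows plus ghost/halo cells, both known from the calling code's mesh):

/* Workaround sketch (hypothetical names): size the solution array for
 * n_cols entries (owned rows + ghost cells) and upload it with that size,
 * while the RHS keeps the owned size n_rows. */
AMGX_vector_upload(b, n_rows, 1, rhs_host);        /* owned entries only  */
AMGX_vector_upload(x, n_cols, 1, x_host_padded);   /* includes halo space */

AMGX_solver_solve(solver, b, x);

/* The owned values of the result are in the first n_rows entries. */
AMGX_vector_download(x, x_host_padded);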