HIP backend general issue
This issue is meant to centralize the issues and work being done to integrate the HIP backend in Kokkos-Kernels. Ideally I would like separate issues to be opened for specific technical problems and then referenced here, so that users and developers know what the known issues are and who is working on them.
Here is a list of the current issues observed while building with the HIP backend:
- [x] Kokkos_ArithTraits `long double` specialization, see issue #807, PR #809 and PR #844
- [x] KokkosBatched `Algo::Level3::Blocked::mb()` is not defined, see issue #808 and PR #812
- [x] Parallel Range (Kokkos core issue), some parameters need to be cast for template deduction, see Kokkos: issue #3386 and PR #3393
- [x] unit-tests CMakeList needs to be edited to add logic for HIP testing and ETI, see issue #819 and PR #820, PR #841
- [x] add logic in `cm_generate_makefile` to support HIP builds, see PR #818
- [x] add logic in `test_all_sandia` to allow spot_check on caraway (AMD/HIP platform), see PR #842
- [x] add CMake logic to disable unit-test categories selectively, see PR #822
- [x] clean-up logic in code in the `execution_space=Kokkos::Experimental::HIP` path, see PR #828 and PR #840
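For anyone unfamiliar with the template-deduction item in the list above, here is a minimal sketch of that kind of failure and the cast that works around it. This is only an illustration under stated assumptions: `for_each_index` and `sum_values` are hypothetical names, not Kokkos API, and the actual change is the one in the Kokkos PR referenced above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a range dispatch (NOT a Kokkos function):
// both bounds share a single deduced type T, so the two arguments
// must have exactly the same type for deduction to succeed.
template <typename T, typename Functor>
void for_each_index(T begin, T end, Functor f) {
  for (T i = begin; i < end; ++i) f(i);
}

// Hypothetical helper that sums the entries of v through the dispatch above.
std::size_t sum_values(const std::vector<int>& v) {
  std::size_t total = 0;
  // for_each_index(0, v.size(), ...) does not compile: T would be deduced
  // as int from the first argument and as std::size_t from the second.
  // Casting the literal makes both deductions agree.
  for_each_index(static_cast<std::size_t>(0), v.size(),
                 [&](std::size_t i) { total += static_cast<std::size_t>(v[i]); });
  return total;
}
```

Calling `sum_values({1, 2, 3})` walks the indices 0 to 2. Removing the `static_cast` reproduces the conflicting-deduction compile error.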
Now that the ETI and tests are merged (or are about to be), we can make a list of what still needs to be done to get the backend fully functional.
HIP spot-check enabled tests
- [x] BLAS
- [x] batchedDLA
- [ ] Sparse
- [ ] Graph
- [x] Common
HIP tests currently failing
Issues in batchedDLA
- batched_scalar_team_trsm_l_u_nt_n_dcomplex_dcomplex fails with a bunch of values == 0, which seems to indicate a memory issue with complex?
- batched_scalar_team_trsm_l_u_t_n_dcomplex_dcomplex aborts on `Memory access fault by GPU`
- batched_scalar_team_trsm_l_u_nt_n_dcomplex_double same as `dcomplex_dcomplex` version
- batched_scalar_team_trsm_l_u_t_n_dcomplex_double same as `dcomplex_dcomplex` version
- batched_scalar_teamvector_qr_with_columnpivoting_double aborts on `Device::callbackQueue aborting with status: 0x29`
- batched_scalar_teamvector_solve_utv_double aborts on `Memory access fault by GPU`
- batched_scalar_teamvector_solve_utv2_double aborts on `Memory access fault by GPU` after failing with values == 0
- batched_scalar_teamvector_utv_double aborts on `Memory access fault by GPU`
Issues in Graph (offset==int and offset==size_t fail in the same way)
- graph_graph_color_double_int_int_TestExecSpace aborts on `Memory access fault by GPU`
- graph_graph_color_distance2_double_int_int_TestExecSpace aborts on `Memory access fault by GPU`
- graph_graph_color_deterministic_double_int_int_TestExecSpace aborts on `Device::callbackQueue aborting with status: 0x1016`
Issues in Sparse (offset==int and offset==size_t fail in the same way)
- sparse_gauss_seidel_asymmetric_rank1_kokkos_complex_double_int_int_TestExecSpace aborts on `Memory access fault by GPU`. Note: same happens with rank2 and/or symmetric tests
- sparse_balloon_clustering_double_int_int_TestExecSpace aborts on `Memory access fault by GPU`. Note: happens randomly, so possibly related to a race condition?
- sparse_replaceSumIntoLonger_double_int_int_TestExecSpace fails with `values == 0`
- sparse_replaceSumIntoLonger_kokkos_complex_double_int_int_TestExecSpace aborts on `Device::callbackQueue aborting with status: 0x1016`
- sparse_replaceSumInto_kokkos_complex_double_int_int_TestExecSpace aborts on `Device::callbackQueue aborting with status: 0x1016`
- sparse_spgemm_jacobi_kokkos_complex_double_int_size_t_TestExecSpace aborts on `Device::callbackQueue aborting with status: 0x29`
- sparse_spmv_kokkos_complex_double_int_int_TestExecSpace aborts on `Device::callbackQueue aborting with status: 0x1016`
- sparse_spmv_mv_kokkos_complex_double_int_int_LayoutLeft_TestExecSpace aborts on `Device::callbackQueue aborting with status: 0x1016`
@lucbv I'll add amd/caraway options for the testing scripts this week
Thanks, I have shared my current configuration on the internal repo (see the Technical tips section on the homepage). One thing that I need to do is ask what extra flags are used by Kokkos for AMD builds. Currently I have removed all the warning/error flags, as Kokkos would not build otherwise.
@lucbv I have a branch now that passes unit tests for CUDA, Serial, OpenMP but will (hopefully) also work on HIP when the unit tests are built for it. The only things still hardcoded for CUDA are things involving cusparse, cublas, graphs and streams. There are a couple of places where `__CUDA_ARCH__` is used but that is still defined for HIP so it should be OK.
@brian-kelley thanks for looking at this. I am still waiting on rocm/3.8.0 tests to move forward with the ETI/tests PR, as I feel it might fix quite a few things. Hopefully I can get that done next week but I'm not sure.
If your PR is ready feel free to put me as a reviewer, I will finish my review of the coarsening PR this weekend.
Using the latest rocm LLVM compiler, the new list of failing tests is much shorter:
Graph
    [ RUN ] hip.graph_graph_color_deterministic_double_int_int_TestExecSpace
    :0:rocdevice.cpp :2325: 378970770383 us: Device::callbackQueue aborting with status: 0x1016
    Aborted (core dumped)
    [ RUN ] hip.graph_graph_color_double_int_size_t_TestExecSpace
    :0:rocdevice.cpp :2325: 379268378835 us: Device::callbackQueue aborting with status: 0x1016
    Aborted (core dumped)
Sparse
Some failures are related to complex atomics; updates in Kokkos Core should resolve these issues.
More things are working now: with rocm 4.5 and MI100 (on Caraway) all tests pass except for structured SpMV (`hip.sparse_spmv_struct_double_int_size_t_TestExecSpace`).
At this point we are testing HIP in our CI, and everything is building correctly :)