abacus-develop
abacus-develop copied to clipboard
DCU cg single precision bug: Unexpected hipBLAS Error
Describe the bug
I run below job with dcu abacus with precision=single.
WARNING: Total thread number on this node mismatches with hardware availability. This may cause poor performance.
Info: Local MPI proc number: 4,OpenMP thread number: 1,Total thread number: 4,Local thread limit: 32
Unexpected hipBLAS Error: Unknown /public/home/abacus/abacus-develop/source/module_hsolver/kernels/rocm/math_kernel_op.hip.cu 723
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[35024,1],0]
Exit code: 10
--------------------------------------------------------------------------
Expected behavior
No response
To Reproduce
No response
Environment
No response
Additional Context
No response
Task list for Issue attackers (only for developers)
- [ ] Verify the issue is not a duplicate.
- [ ] Describe the bug.
- [ ] Steps to reproduce.
- [ ] Expected behavior.
- [ ] Error message.
- [ ] Environment details.
- [ ] Additional context.
- [ ] Assign a priority level (low, medium, high, urgent).
- [ ] Assign the issue to a team member.
- [ ] Label the issue with relevant tags.
- [ ] Identify possible related issues.
- [ ] Create a unit test or automated test to reproduce the bug (if applicable).
- [ ] Fix the bug.
- [ ] Test the fix.
- [ ] Update documentation (if necessary).
- [ ] Close the issue and inform the reporter (if applicable).