abacus-develop

DCU cg single precision bug: Unexpected hipBLAS Error

Open pxlxingliang opened this issue 9 months ago • 0 comments

Describe the bug

I ran the attached job (single (2).zip, below) with the DCU build of ABACUS using precision=single. The run aborted with the following output:

WARNING: Total thread number on this node mismatches with hardware availability. This may cause poor performance.
Info: Local MPI proc number: 4,OpenMP thread number: 1,Total thread number: 4,Local thread limit: 32
Unexpected hipBLAS Error: Unknown /public/home/abacus/abacus-develop/source/module_hsolver/kernels/rocm/math_kernel_op.hip.cu 723
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[35024,1],0]
  Exit code:    10
--------------------------------------------------------------------------

single (2).zip
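
A note for triage: the word "Unknown" in the error line may only mean that the status code returned by hipBLAS was not one the error-check helper could name, so the real status value is hidden. The sketch below illustrates that fall-through pattern. `MockStatus`, `status_name`, and `format_error` are hypothetical names made up for illustration; this is not the code at line 723 of math_kernel_op.hip.cu, and the real `hipblasStatus_t` enum has different values.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for a BLAS status enum (assumption for illustration);
// the real hipblasStatus_t is defined by hipBLAS and has more values.
enum class MockStatus { Success = 0, NotInitialized = 1, InvalidValue = 3 };

// Map a status to a readable name. Any value without a matching case
// falls through to "Unknown", which hides the numeric code.
inline const char* status_name(MockStatus s) {
    switch (s) {
        case MockStatus::Success:        return "Success";
        case MockStatus::NotInitialized: return "NotInitialized";
        case MockStatus::InvalidValue:   return "InvalidValue";
    }
    return "Unknown";
}

// Build a message shaped like the log line in this report:
// "Unexpected hipBLAS Error: <name> <file> <line>".
inline std::string format_error(MockStatus s, const char* file, int line) {
    std::ostringstream msg;
    msg << "Unexpected hipBLAS Error: " << status_name(s) << ' ' << file << ' ' << line;
    return msg.str();
}
```

If the actual helper works along these lines, printing the raw integer status next to the name would show which hipBLAS status the single-precision call returned.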

Expected behavior

No response

To Reproduce

No response

Environment

No response

Additional Context

No response

Task list for Issue attackers (only for developers)

  • [ ] Verify the issue is not a duplicate.
  • [ ] Describe the bug.
  • [ ] Steps to reproduce.
  • [ ] Expected behavior.
  • [ ] Error message.
  • [ ] Environment details.
  • [ ] Additional context.
  • [ ] Assign a priority level (low, medium, high, urgent).
  • [ ] Assign the issue to a team member.
  • [ ] Label the issue with relevant tags.
  • [ ] Identify possible related issues.
  • [ ] Create a unit test or automated test to reproduce the bug (if applicable).
  • [ ] Fix the bug.
  • [ ] Test the fix.
  • [ ] Update documentation (if necessary).
  • [ ] Close the issue and inform the reporter (if applicable).

pxlxingliang, May 10 '24 06:05