
Crash on AMD with llama.cpp + hipblas

Open · feffy380 opened this issue on Mar 13, 2024 · 0 comments

The bug
Trying to use a model through llama.cpp built with hipBLAS leads to an immediate crash:

CUDA error: shared object initialization failed
  current device: 0, in function ggml_cuda_op_mul_mat at /tmp/pip-req-build-di0qzh7t/vendor/llama.cpp/ggml-cuda.cu:9663
  hipGetLastError()
GGML_ASSERT: /tmp/pip-req-build-di0qzh7t/vendor/llama.cpp/ggml-cuda.cu:258: !"CUDA error"
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)

Unfortunately, the lack of a stack trace means I'm not sure where to even begin looking.
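Incidentally, the "ptrace: Operation not permitted." / "No stack." lines look like GGML_ASSERT trying to attach gdb to the dying process, which Yama's ptrace hardening blocks by default. A backtrace can probably still be captured by launching the whole run under gdb instead (repro.py is just a placeholder name for the snippet below):

gdb -q -ex run -ex bt --args python repro.py
# or, alternatively, relax the ptrace restriction so the automatic attach works:
sudo sysctl -w kernel.yama.ptrace_scope=0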

I am able to run the model directly through both llama.cpp and llama-cpp-python, so the crash seems specific to guidance. Running the failing script with AMD_LOG_LEVEL=1 shows a flood of errors like these:

:1:hip_code_object.cpp      :616 : 3023512404 us: [pid:41000 tid:0x72b6649ac740] Cannot find the function: Cijk_Alik_Bljk_HB_MT32x32x32_MI16x16x16x1_SN_1LDSB1_APM1_ABV0_ACED0_AF0EM1_AF1EM1_AMAS0_ASE_ASGT_ASLT_ASAE01_ASCE01_ASEM1_AAC0_BL1_BS1_DTLA0_DTLB0_DTVA0_DTVB0_DVO0_ETSP_EPS1_FSSC10_FL0_GRPM1_GRVW8_GSU1_GSUASB_GLS0_ISA1100_IU2_K1_KLA_LBSPP128_LPA8_LPB8_LDL1_LRVW16_LWPMn1_LDW0_FMA_MIAV1_MDA2_MO40_MMFGLC_NTA0_NTB0_NTC0_NTD0_NEPBS0_NLCA1_NLCB1_ONLL1_OPLV0_PK0_PAP0_PGR1_PLR1_SIA2_SS1_SU0_SUM0_SUS0_SCIUI1_SPO0_SRVW0_SSO0_SVW1_SNLL0_TSGRA0_TSGRB0_TT1_16_TLDS1_UMLDSA1_UMLDSB1_USFGROn1_VAW2_VSn1_VW1_WSGRA1_WSGRB1_WS32_WG32_4_1_WGM8 
:1:hip_module.cpp           :83  : 3023512417 us: [pid:41000 tid:0x72b6649ac740] Cannot find the function: Cijk_Alik_Bljk_HB_MT32x32x32_MI16x16x16x1_SN_1LDSB1_APM1_ABV0_ACED0_AF0EM1_AF1EM1_AMAS0_ASE_ASGT_ASLT_ASAE01_ASCE01_ASEM1_AAC0_BL1_BS1_DTLA0_DTLB0_DTVA0_DTVB0_DVO0_ETSP_EPS1_FSSC10_FL0_GRPM1_GRVW8_GSU1_GSUASB_GLS0_ISA1100_IU2_K1_KLA_LBSPP128_LPA8_LPB8_LDL1_LRVW16_LWPMn1_LDW0_FMA_MIAV1_MDA2_MO40_MMFGLC_NTA0_NTB0_NTC0_NTD0_NEPBS0_NLCA1_NLCB1_ONLL1_OPLV0_PK0_PAP0_PGR1_PLR1_SIA2_SS1_SU0_SUM0_SUS0_SCIUI1_SPO0_SRVW0_SSO0_SVW1_SNLL0_TSGRA0_TSGRB0_TT1_16_TLDS1_UMLDSA1_UMLDSB1_USFGROn1_VAW2_VSn1_VW1_WSGRA1_WSGRB1_WS32_WG32_4_1_WGM8 for module: 0xfb0674d0 
# this goes on for several hundred more lines
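For comparison, a direct llama.cpp run of the same model succeeds; a minimal sketch of that sanity check (binary name and flags assumed to mirror the repro below):

./main -m ./Mixtral-8x7B-Instruct-v0.1-Q4_K_M.gguf -ngl 20 -c 8192 -p "The answer to life, the universe, and everything is"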

To Reproduce
llama-cpp-python built with:

CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/abetlen/llama-cpp-python

then run:
from guidance import models, gen
lm = models.LlamaCpp("./Mixtral-8x7B-Instruct-v0.1-Q4_K_M.gguf", n_gpu_layers=20, n_ctx=8192)
print(lm + "The answer to life, the universe, and everything is" + gen(stop="."))
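The AMD_LOG_LEVEL output above was captured by rerunning this same script with HIP logging enabled (repro.py again being a placeholder filename):

AMD_LOG_LEVEL=1 python repro.py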

System info (please complete the following information):

  • OS (e.g. Ubuntu, Windows 11, Mac OS, etc.): Arch Linux
  • Guidance Version (guidance.__version__): 0.1.11 (latest git head 55d8e6f)
  • ROCm Version: 6.0.0
  • GPU: 7900 XTX (gfx1100)
