
Running llamafile with graphics support crashes on PC with Radeon RX 6600

nameiwillforget opened this issue 11 months ago

I tried this with several different llamafiles, but it always goes like this:

[alex@Arch bin]$ sh mixtral-8x7b-instruct-v0.1.Q5_K_M.llamafile -ngl 9999
import_cuda_impl: initializing gpu module...
get_rocm_bin_path: note: amdclang++ not found on $PATH
get_rocm_bin_path: note: $HIP_PATH/bin/amdclang++ does not exist
get_rocm_bin_path: note: /opt/rocm/bin/amdclang++ does not exist
link_cuda_dso: note: dynamically linking /home/alex/.llamafile/ggml-rocm.so
mixtral-8x7b-instruct-v0.1.Q5_K_M.llamafile: /usr/src/debug/hip-runtime-amd/clr-rocm-6.0.0/rocclr/os/os_posix.cpp:321: static void amd::Os::currentStackInfo(unsigned char**, size_t*): Assertion `Os::currentStackPtr() >= *base - *size && Os::currentStackPtr() < *base && "just checking"' failed.

error: Uncaught SIGABRT (SI_TKILL) at 0x3e80004d538 on Arch pid 316728 tid 316728
  /home/alex/.local/bin/mixtral-8x7b-instruct-v0.1.Q5_K_M.llamafile
  File exists
  Linux Cosmopolitan 3.2.4 MODE=x86_64; #1 SMP PREEMPT_DYNAMIC Fri, 23 Feb 2024 16:31:48 +0000 Arch 6.7.6-arch1-1

RAX 0000000000000000 RBX 000000000004d538 RDI 000000000004d538
RCX 000077050af3b32c RDX 0000000000000006 RSI 000000000004d538
RBP 000077050aeab740 RSP 00007ffc1d9b4d10 RIP 000077050af3b32c
 R8 000000007fa81010  R9 0000000000000007 R10 0000000000000008
R11 0000000000000246 R12 00007704b6767558 R13 0000000000000006
R14 00007704b67679e0 R15 0000000000000000
TLS 0000000000746340

XMM0  7453746e65727275633a3a734f60206e XMM8  000000007fa86b00000000007fa848c0
XMM1  63656863207473756a22202626206573 XMM9  00000000000000000000000000000000
XMM2  6a2220262620657361622a203c202928 XMM10 00000000000000000000000000000000
XMM3  0000000000000000000a2e64656c6961 XMM11 00000000000000000000000000000000
XMM4  3a534f2026262065633a3a2a202d2065 XMM12 00000000000000000000000000000000
XMM5  ffffffffffffffffff00000000000000 XMM13 00000000000000000000000000000000
XMM6  00000000000000000000000000000000 XMM14 00000000000000000000000000000000
XMM7  00000000000000000000000000000000 XMM15 00000000000000000000000000000000

cosmoaddr2line /home/alex/.local/bin/mixtral-8x7b-instruct-v0.1.Q5_K_M.llamafile 77050af3b32c 77050aeac0e0 77050aeac0e0 77050aeac0e0 77050aeac0e0 77050aeac0e0 77050aeac0e0 77050aeac0e0 77050aeac0e0 77050aeac0e0 77050aeac0e0 77050aeac0e0 77050aeac0e0

7ffc1d9b1b70 77050af3b32c NULL+0
<dangerous frame>

10008004-10008006 rw-pa-       3x automap 192kB w/ 64kB hole
10008008-10008011 rw-pa-      10x automap 640kB w/ 14gB hole
10040060-100b8308 r--s-- 492'201x automap 30gB w/ 96tB hole
6fd00004-6fd0000c rw-paF       9x zipos 576kB w/ 64gB hole
6fe00004-6fe00004 rw-paF       1x g_fds 64kB
# 30gB total mapped memory
/home/alex/.local/bin/mixtral-8x7b-instruct-v0.1.Q5_K_M.llamafile -m mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf -c 0 -ngl 9999 
Aborted (core dumped)
[alex@Arch bin]$  

What's going wrong?

— nameiwillforget, Mar 06 '24

You need to install the ROCm HIP SDK first. I ran it successfully on Windows 10 with a local RX 6600:

D:\Temp>llava-v1.5-7b-q4.llamafile.exe -ngl 999
initializing gpu module...
extracting /zip/llama.cpp/ggml.h to C:\Users\levin/.llamafile/ggml.h
extracting /zip/llamafile/compcap.cu to C:\Users\levin/.llamafile/compcap.cu
extracting /zip/llamafile/llamafile.h to C:\Users\levin/.llamafile/llamafile.h
extracting /zip/llamafile/tinyblas.h to C:\Users\levin/.llamafile/tinyblas.h
extracting /zip/llamafile/tinyblas.cu to C:\Users\levin/.llamafile/tinyblas.cu
extracting /zip/llama.cpp/ggml-impl.h to C:\Users\levin/.llamafile/ggml-impl.h
extracting /zip/llama.cpp/ggml-cuda.h to C:\Users\levin/.llamafile/ggml-cuda.h
extracting /zip/llama.cpp/ggml-alloc.h to C:\Users\levin/.llamafile/ggml-alloc.h
extracting /zip/llama.cpp/ggml-backend.h to C:\Users\levin/.llamafile/ggml-backend.h
extracting /zip/llama.cpp/ggml-backend-impl.h to C:\Users\levin/.llamafile/ggml-backend-impl.h
extracting /zip/llama.cpp/ggml-cuda.cu to C:\Users\levin/.llamafile/ggml-cuda.cu
"/C/Program Files/AMD/ROCm/5.7//bin/clang++.exe" -fuse-ld=lld -shared -nostartfiles -nostdlib -DGGML_BUILD=1 -DGGML_SHARED=1 -Wno-ignored-attributes -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_HIPBLAS -DIGNORE -DK_QUANTS_PER_ITERATION=2 -D_CRT_SECURE_NO_WARNINGS -D_XOPEN_SOURCE=600 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -isystem "/C/Program Files/AMD/ROCm/5.7//include" -O3 -DNDEBUG -D_DLL -D_MT -Xclang --dependent-lib=msvcrt -std=gnu++14 -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false -x hip --hip-link --offload-arch=gfx1032 -o C:\Users\levin/.llamafile/ggml-rocm.dll.hwcg21 C:\Users\levin/.llamafile/ggml-cuda.cu -l "/C/Program Files/AMD/ROCm/5.7//lib/hipblas.lib" -l "/C/Program Files/AMD/ROCm/5.7//lib/rocblas.lib" -l "/C/Program Files/AMD/ROCm/5.7//lib/amdhip64.lib" -lkernel32
In file included from <built-in>:1:
In file included from C:\Program Files\AMD\ROCm\5.7\lib\clang\17.0.0\include\__clang_hip_runtime_wrapper.h:50:
C:\Program Files\AMD\ROCm\5.7\lib\clang\17.0.0\include\cuda_wrappers\cmath:27:15: fatal error: 'cmath' file not found
#include_next <cmath>
              ^~~~~~~
1 error generated when compiling for gfx1032.
/C/Program Files/AMD/ROCm/5.7//bin/clang++.exe: returned nonzero exit status
extracting /zip/ggml-rocm.dll to C:\Users\levin/.llamafile/ggml-rocm.dll
dynamically linking C:\Users\levin/.llamafile/ggml-rocm.dll
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 ROCm/CUDA devices:
  Device 0: AMD Radeon RX 6600, compute capability 10.3, VMM: no
GPU support successfully linked and loaded
…
llm_load_tensors: ggml ctx size       =    0.11 MiB
llm_load_tensors: using ROCm/CUDA for GPU acceleration
llm_load_tensors: system memory used  =   70.42 MiB
llm_load_tensors: VRAM used           = 3820.93 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
…
print_timings: prompt eval time =    4432.27 ms /    65 tokens (   68.19 ms per token,    14.67 tokens per second)
print_timings:        eval time =   13320.14 ms /   400 runs   (   33.30 ms per token,    30.03 tokens per second)
print_timings:       total time =   17752.41 ms
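For Linux users hitting the original crash, the equivalent of the advice above would be a sketch like the following (an assumption, not verified on this exact setup): install the ROCm HIP SDK so llamafile can compile its GPU module locally instead of loading the prebuilt `ggml-rocm.so` that aborts. The paths below follow the probe messages in the Linux log (`amdclang++` on `$PATH`, then `$HIP_PATH/bin`, then `/opt/rocm/bin`).

```shell
# Point llamafile at the ROCm HIP SDK install prefix (adjust as needed);
# with the compiler found, it builds ggml-rocm.so on the fly rather than
# dynamically linking the prebuilt one that crashes here.
export HIP_PATH=/opt/rocm
sh mixtral-8x7b-instruct-v0.1.Q5_K_M.llamafile -ngl 9999

# Or bypass the GPU code path entirely and confirm the model itself runs:
sh mixtral-8x7b-instruct-v0.1.Q5_K_M.llamafile -ngl 0
```

`-ngl 0` keeps every layer on the CPU, which is slow for Mixtral but useful for isolating the crash to the ROCm path.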

— hlstudio, Mar 18 '24

The only consumer GPU that AMD officially supports on Linux is the AMD Radeon RX 7900 XTX. Sadly, we can't support what AMD doesn't support. If you dual boot into Windows, though, then things should work out of the box. Wish I could help more!
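For completeness: an unsupported community workaround often reported for RDNA2 cards on Linux is to override the ISA that ROCm sees, since ROCm ships rocBLAS kernels for gfx1030 but not the RX 6600's gfx1032. This is a hedged sketch only; it assumes the gfx1030 kernels run on the closely related gfx1032, and it is explicitly not supported by AMD.

```shell
# Make ROCm treat the RX 6600 (gfx1032) as the supported gfx1030 ISA.
# Unsupported workaround; may crash or miscompute on other cards.
export HSA_OVERRIDE_GFX_VERSION=10.3.0
sh mixtral-8x7b-instruct-v0.1.Q5_K_M.llamafile -ngl 9999
```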

— jart, Mar 22 '24