Running llamafile with graphics support crashes on PC with Radeon RX 6600
I tried several different llamafiles, and each one crashes the same way:
[alex@Arch bin]$ sh mixtral-8x7b-instruct-v0.1.Q5_K_M.llamafile -ngl 9999
import_cuda_impl: initializing gpu module...
get_rocm_bin_path: note: amdclang++ not found on $PATH
get_rocm_bin_path: note: $HIP_PATH/bin/amdclang++ does not exist
get_rocm_bin_path: note: /opt/rocm/bin/amdclang++ does not exist
link_cuda_dso: note: dynamically linking /home/alex/.llamafile/ggml-rocm.so
mixtral-8x7b-instruct-v0.1.Q5_K_M.llamafile: /usr/src/debug/hip-runtime-amd/clr-rocm-6.0.0/rocclr/os/os_posix.cpp:321: static void amd::Os::currentStackInfo(unsigned char**, size_t*): Assertion `Os::currentStackPtr() >= *base - *size && Os::currentStackPtr() < *base && "just checking"' failed.
error: Uncaught SIGABRT (SI_TKILL) at 0x3e80004d538 on Arch pid 316728 tid 316728
/home/alex/.local/bin/mixtral-8x7b-instruct-v0.1.Q5_K_M.llamafile
File exists
Linux Cosmopolitan 3.2.4 MODE=x86_64; #1 SMP PREEMPT_DYNAMIC Fri, 23 Feb 2024 16:31:48 +0000 Arch 6.7.6-arch1-1
RAX 0000000000000000 RBX 000000000004d538 RDI 000000000004d538
RCX 000077050af3b32c RDX 0000000000000006 RSI 000000000004d538
RBP 000077050aeab740 RSP 00007ffc1d9b4d10 RIP 000077050af3b32c
R8 000000007fa81010 R9 0000000000000007 R10 0000000000000008
R11 0000000000000246 R12 00007704b6767558 R13 0000000000000006
R14 00007704b67679e0 R15 0000000000000000
TLS 0000000000746340
XMM0 7453746e65727275633a3a734f60206e XMM8 000000007fa86b00000000007fa848c0
XMM1 63656863207473756a22202626206573 XMM9 00000000000000000000000000000000
XMM2 6a2220262620657361622a203c202928 XMM10 00000000000000000000000000000000
XMM3 0000000000000000000a2e64656c6961 XMM11 00000000000000000000000000000000
XMM4 3a534f2026262065633a3a2a202d2065 XMM12 00000000000000000000000000000000
XMM5 ffffffffffffffffff00000000000000 XMM13 00000000000000000000000000000000
XMM6 00000000000000000000000000000000 XMM14 00000000000000000000000000000000
XMM7 00000000000000000000000000000000 XMM15 00000000000000000000000000000000
cosmoaddr2line /home/alex/.local/bin/mixtral-8x7b-instruct-v0.1.Q5_K_M.llamafile 77050af3b32c 77050aeac0e0 77050aeac0e0 77050aeac0e0 77050aeac0e0 77050aeac0e0 77050aeac0e0 77050aeac0e0 77050aeac0e0 77050aeac0e0 77050aeac0e0 77050aeac0e0 77050aeac0e0
7ffc1d9b1b70 77050af3b32c NULL+0
<dangerous frame>
10008004-10008006 rw-pa- 3x automap 192kB w/ 64kB hole
10008008-10008011 rw-pa- 10x automap 640kB w/ 14gB hole
10040060-100b8308 r--s-- 492'201x automap 30gB w/ 96tB hole
6fd00004-6fd0000c rw-paF 9x zipos 576kB w/ 64gB hole
6fe00004-6fe00004 rw-paF 1x g_fds 64kB
# 30gB total mapped memory
/home/alex/.local/bin/mixtral-8x7b-instruct-v0.1.Q5_K_M.llamafile -m mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf -c 0 -ngl 9999
Aborted (core dumped)
[alex@Arch bin]$
What's going wrong?
You need to install the ROCm HIP SDK first. I ran it successfully on a local Windows 10 machine with an RX 6600:
D:\Temp>llava-v1.5-7b-q4.llamafile.exe -ngl 999
initializing gpu module...
extracting /zip/llama.cpp/ggml.h to C:\Users\levin/.llamafile/ggml.h
extracting /zip/llamafile/compcap.cu to C:\Users\levin/.llamafile/compcap.cu
extracting /zip/llamafile/llamafile.h to C:\Users\levin/.llamafile/llamafile.h
extracting /zip/llamafile/tinyblas.h to C:\Users\levin/.llamafile/tinyblas.h
extracting /zip/llamafile/tinyblas.cu to C:\Users\levin/.llamafile/tinyblas.cu
extracting /zip/llama.cpp/ggml-impl.h to C:\Users\levin/.llamafile/ggml-impl.h
extracting /zip/llama.cpp/ggml-cuda.h to C:\Users\levin/.llamafile/ggml-cuda.h
extracting /zip/llama.cpp/ggml-alloc.h to C:\Users\levin/.llamafile/ggml-alloc.h
extracting /zip/llama.cpp/ggml-backend.h to C:\Users\levin/.llamafile/ggml-backend.h
extracting /zip/llama.cpp/ggml-backend-impl.h to C:\Users\levin/.llamafile/ggml-backend-impl.h
extracting /zip/llama.cpp/ggml-cuda.cu to C:\Users\levin/.llamafile/ggml-cuda.cu
"/C/Program Files/AMD/ROCm/5.7//bin/clang++.exe" -fuse-ld=lld -shared -nostartfiles -nostdlib -DGGML_BUILD=1 -DGGML_SHARED=1 -Wno-ignored-attributes -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_HIPBLAS -DIGNORE -DK_QUANTS_PER_ITERATION=2 -D_CRT_SECURE_NO_WARNINGS -D_XOPEN_SOURCE=600 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -isystem "/C/Program Files/AMD/ROCm/5.7//include" -O3 -DNDEBUG -D_DLL -D_MT -Xclang --dependent-lib=msvcrt -std=gnu++14 -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false -x hip --hip-link --offload-arch=gfx1032 -o C:\Users\levin/.llamafile/ggml-rocm.dll.hwcg21 C:\Users\levin/.llamafile/ggml-cuda.cu -l "/C/Program Files/AMD/ROCm/5.7//lib/hipblas.lib" -l "/C/Program Files/AMD/ROCm/5.7//lib/rocblas.lib" -l "/C/Program Files/AMD/ROCm/5.7//lib/amdhip64.lib" -lkernel32
In file included from <built-in>:1:
In file included from C:\Program Files\AMD\ROCm\5.7\lib\clang\17.0.0\include\__clang_hip_runtime_wrapper.h:50:
C:\Program Files\AMD\ROCm\5.7\lib\clang\17.0.0\include\cuda_wrappers\cmath:27:15: fatal error: 'cmath' file not found
#include_next <cmath>
^~~~~~~
1 error generated when compiling for gfx1032.
/C/Program Files/AMD/ROCm/5.7//bin/clang++.exe: returned nonzero exit status
extracting /zip/ggml-rocm.dll to C:\Users\levin/.llamafile/ggml-rocm.dll
dynamically linking C:\Users\levin/.llamafile/ggml-rocm.dll
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 ROCm/CUDA devices:
Device 0: AMD Radeon RX 6600, compute capability 10.3, VMM: no
GPU support successfully linked and loaded
…
llm_load_tensors: ggml ctx size = 0.11 MiB
llm_load_tensors: using ROCm/CUDA for GPU acceleration
llm_load_tensors: system memory used = 70.42 MiB
llm_load_tensors: VRAM used = 3820.93 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
…
print_timings: prompt eval time = 4432.27 ms / 65 tokens ( 68.19 ms per token, 14.67 tokens per second)
print_timings: eval time = 13320.14 ms / 400 runs ( 33.30 ms per token, 30.03 tokens per second)
print_timings: total time = 17752.41 ms
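For anyone checking their own Linux setup: the first log in this thread shows llamafile looking for amdclang++ in three places (on $PATH, under $HIP_PATH/bin, and under /opt/rocm/bin). Below is a rough sketch that checks those same locations by hand. The function name `find_amdclang` and its parameters are made up for illustration, and this is not llamafile's actual lookup code, just an approximation of the order the log messages suggest. It only reports whether the compiler is present; it says nothing about whether the GPU itself is supported.

```python
import os
import shutil


def find_amdclang(env=None, default_root="/opt/rocm"):
    """Return the first amdclang++ found, or None.

    Hypothetical helper: the search order is inferred from the
    get_rocm_bin_path notes in the log, not from llamafile's source.
    """
    env = os.environ if env is None else env

    # 1. Anywhere on $PATH.
    found = shutil.which("amdclang++", path=env.get("PATH", ""))
    if found:
        return found

    # 2. $HIP_PATH/bin, then 3. the default ROCm install root.
    candidates = []
    hip_path = env.get("HIP_PATH")
    if hip_path:
        candidates.append(os.path.join(hip_path, "bin", "amdclang++"))
    candidates.append(os.path.join(default_root, "bin", "amdclang++"))

    for candidate in candidates:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    path = find_amdclang()
    print(path if path else "amdclang++ not found in any of the three locations")
```

If this prints a path, the compiler part of the SDK is at least installed where llamafile appears to look for it.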
The only consumer GPU that AMD officially supports on Linux is the AMD Radeon RX 7900 XTX. Sadly, we can't support what AMD doesn't support. If you dual-boot into Windows, though, things should work out of the box. Wish I could help more!