
"AMD GPU detected but extension is missing" error, even though the extension is installed

Open · Drakulas01 opened this issue 9 months ago · 5 comments

Describe the bug: Even though I have the AMD support extension installed, after the latest update Alpaca seems to be unable to locate it.

Expected behavior: The AMD GPU should be used when the extension is installed.

Screenshots: [screenshot attached]

Debugging information

INFO    [main.py | main] Alpaca version: 5.2.0
INFO    [instance_manager.py | start] Starting Alpaca's Ollama instance...
INFO    [instance_manager.py | start] Started Alpaca's Ollama instance
Couldn't find '/home/stas/.ollama/id_ed25519'. Generating new private key.
Your new public key is: 

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHh9Aj558S/CQPu9r0vl6rCvyAJl9ta/rNuPVQnpyD+x

INFO    [instance_manager.py | start] client version is 0.6.0
2025/03/13 21:06:03 routes.go:1225: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:10.3.0 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11435 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/stas/.var/app/com.jeffser.Alpaca/data/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-03-13T21:06:03.459+02:00 level=INFO source=images.go:432 msg="total blobs: 23"
time=2025-03-13T21:06:03.460+02:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-03-13T21:06:03.460+02:00 level=INFO source=routes.go:1292 msg="Listening on [::]:11435 (version 0.6.0)"
time=2025-03-13T21:06:03.460+02:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-03-13T21:06:03.463+02:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2025-03-13T21:06:03.463+02:00 level=WARN source=amd_linux.go:443 msg="amdgpu detected, but no compatible rocm library found.  Either install rocm v6, or follow manual install instructions at https://github.com/ollama/ollama/blob/main/docs/linux.md#manual-install"
time=2025-03-13T21:06:03.463+02:00 level=WARN source=amd_linux.go:348 msg="unable to verify rocm library: no suitable rocm found, falling back to CPU"
time=2025-03-13T21:06:03.463+02:00 level=INFO source=gpu.go:377 msg="no compatible GPUs were discovered"
time=2025-03-13T21:06:03.463+02:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="15.5 GiB" available="11.2 GiB"
[GIN] 2025/03/13 - 21:06:03 | 200 |     447.997µs |       127.0.0.1 | GET      "/api/tags"
[GIN] 2025/03/13 - 21:06:03 | 200 |    5.354798ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/03/13 - 21:06:03 | 200 |   13.503886ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/03/13 - 21:06:03 | 200 |    15.71287ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/03/13 - 21:06:03 | 200 |    15.95594ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/03/13 - 21:06:03 | 200 |   15.572135ms |       127.0.0.1 | POST     "/api/show"
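
The two WARN lines above amount to two filesystem probes: Ollama stats /sys/module/amdgpu/version and then searches for a ROCm v6 runtime library. A rough Python sketch of the same probes — an approximation, not Ollama's actual amd_linux.go logic, with the /app/plugins prefix assumed from the runner logs later in this thread — can be run inside the sandbox to see what the server sees:

```python
# Rough approximation of the two probes behind the warnings above — not
# Ollama's actual amd_linux.go code, just the same filesystem checks.
import os
import glob

# Probe 1: the amdgpu driver version file Ollama stats (missing per the log).
print("/sys/module/amdgpu/version exists:",
      os.path.exists("/sys/module/amdgpu/version"))

# Probe 2: a ROCm runtime visible to the sandbox. The /app/plugins prefix
# is assumed from the runner log lines later in this thread.
for pattern in ("/app/plugins/**/libggml-hip.so",
                "/app/plugins/**/librocblas.so*"):
    print(pattern, "->", glob.glob(pattern, recursive=True) or "not found")
```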

Drakulas01 · Mar 13 '25 19:03

Working on it

Jeffser · Mar 13 '25 22:03

A fix is being published to Flathub, but AMD support has been problematic recently.

Jeffser · Mar 13 '25 22:03

Yeah, I saw the mess that AMD support has been for the last couple of weeks 😄. Thanks for your work; I probably wouldn't be able to run LLMs locally on my own otherwise.

With the update the error is gone, but for some reason Alpaca (or Ollama?) seems to run on the CPU anyway.

INFO    [main.py | main] Alpaca version: 5.2.0
INFO    [instance_manager.py | start] Starting Alpaca's Ollama instance...
INFO    [instance_manager.py | start] Started Alpaca's Ollama instance
Couldn't find '/home/stas/.ollama/id_ed25519'. Generating new private key.
Your new public key is: 

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJm6w+2lH3G3LMk3IHSRSoAsU2/MhnPy6npL9gvVcMy7

2025/03/14 18:48:55 routes.go:1225: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:10.3.0 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11435 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/stas/.var/app/com.jeffser.Alpaca/data/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
INFO    [instance_manager.py | start] client version is 0.6.0
time=2025-03-14T18:48:55.372+02:00 level=INFO source=images.go:432 msg="total blobs: 23"
time=2025-03-14T18:48:55.373+02:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-03-14T18:48:55.377+02:00 level=INFO source=routes.go:1292 msg="Listening on [::]:11435 (version 0.6.0)"
time=2025-03-14T18:48:55.377+02:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-03-14T18:48:55.384+02:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2025-03-14T18:48:55.385+02:00 level=INFO source=amd_linux.go:389 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=10.3.0
time=2025-03-14T18:48:55.385+02:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=rocm variant="" compute=gfx1031 driver=0.0 name=1002:73df total="12.0 GiB" available="10.6 GiB"
[GIN] 2025/03/14 - 18:48:55 | 200 |     548.897µs |       127.0.0.1 | GET      "/api/tags"
[GIN] 2025/03/14 - 18:48:55 | 200 |   12.718681ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/03/14 - 18:48:55 | 200 |   19.013273ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/03/14 - 18:48:55 | 200 |   34.205005ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/03/14 - 18:48:55 | 200 |    36.94862ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/03/14 - 18:48:55 | 200 |    42.34719ms |       127.0.0.1 | POST     "/api/show"
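
For reference, HSA_OVERRIDE_GFX_VERSION=10.3.0 in the env dump above is what makes ROCm treat a gfx1031 card as gfx1030 (whose kernels ship with ROCm), which is why the compatibility check is skipped and the card is picked up. A minimal sketch of how such an override might be passed to the server process — an assumption about the general shape, not Alpaca's actual instance_manager.py code:

```python
# Minimal sketch (not Alpaca's actual code): launching `ollama serve` with
# the GFX override and the host/port seen in the server config dump above.
import os
import subprocess

env = dict(os.environ)
env["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"   # treat gfx1031 as gfx1030
env["OLLAMA_HOST"] = "http://0.0.0.0:11435"  # port from the log above

subprocess.Popen(["ollama", "serve"], env=env)
```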

----- A lot of GNOME complaints about my cursor skipped -----

time=2025-03-14T18:49:07.963+02:00 level=WARN source=ggml.go:149 msg="key not found" key=llama.attention.key_length default=128
time=2025-03-14T18:49:07.963+02:00 level=WARN source=ggml.go:149 msg="key not found" key=llama.attention.value_length default=128
time=2025-03-14T18:49:07.963+02:00 level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/home/stas/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-ff82381e2bea77d91c1b824c7afb83f6fb73e9f7de9dda631bcdbca564aa5435 gpu=0 parallel=4 available=11355013120 required="5.9 GiB"
time=2025-03-14T18:49:07.963+02:00 level=INFO source=server.go:105 msg="system memory" total="15.5 GiB" free="12.1 GiB" free_swap="6.7 GiB"
time=2025-03-14T18:49:07.964+02:00 level=WARN source=ggml.go:149 msg="key not found" key=llama.attention.key_length default=128
time=2025-03-14T18:49:07.964+02:00 level=WARN source=ggml.go:149 msg="key not found" key=llama.attention.value_length default=128
time=2025-03-14T18:49:07.964+02:00 level=INFO source=server.go:138 msg=offload library=rocm layers.requested=-1 layers.model=33 layers.offload=33 layers.split="" memory.available="[10.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.9 GiB" memory.required.partial="5.9 GiB" memory.required.kv="1.0 GiB" memory.required.allocations="[5.9 GiB]" memory.weights.total="4.7 GiB" memory.weights.repeating="4.6 GiB" memory.weights.nonrepeating="105.0 MiB" memory.graph.full="560.0 MiB" memory.graph.partial="585.0 MiB"
llama_model_loader: loaded meta data with 25 key-value pairs and 291 tensors from /home/stas/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-ff82381e2bea77d91c1b824c7afb83f6fb73e9f7de9dda631bcdbca564aa5435 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = Mistral-7B-Instruct-v0.3
llama_model_loader: - kv   2:                          llama.block_count u32              = 32
llama_model_loader: - kv   3:                       llama.context_length u32              = 32768
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   7:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   8:                       llama.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 2
llama_model_loader: - kv  11:                           llama.vocab_size u32              = 32768
llama_model_loader: - kv  12:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  14:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,32768]   = ["<unk>", "<s>", "</s>", "[INST]", "[...
llama_model_loader: - kv  16:                      tokenizer.ggml.scores arr[f32,32768]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  17:                  tokenizer.ggml.token_type arr[i32,32768]   = [2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  20:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  21:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  22:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  23:                    tokenizer.chat_template str              = {{ bos_token }}{% for message in mess...
llama_model_loader: - kv  24:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_0
print_info: file size   = 3.83 GiB (4.54 BPW) 
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 771
load: token to piece cache size = 0.1731 MB
print_info: arch             = llama
print_info: vocab_only       = 1
print_info: model type       = ?B
print_info: model params     = 7.25 B
print_info: general.name     = Mistral-7B-Instruct-v0.3
print_info: vocab type       = SPM
print_info: n_vocab          = 32768
print_info: n_merges         = 0
print_info: BOS token        = 1 '<s>'
print_info: EOS token        = 2 '</s>'
print_info: UNK token        = 0 '<unk>'
print_info: LF token         = 781 '<0x0A>'
print_info: EOG token        = 2 '</s>'
print_info: max token length = 48
llama_model_load: vocab only - skipping tensors
time=2025-03-14T18:49:07.994+02:00 level=INFO source=server.go:405 msg="starting llama server" cmd="/app/plugins/Ollama/bin/ollama runner --model /home/stas/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-ff82381e2bea77d91c1b824c7afb83f6fb73e9f7de9dda631bcdbca564aa5435 --ctx-size 8192 --batch-size 512 --n-gpu-layers 33 --threads 6 --parallel 4 --port 46837"
time=2025-03-14T18:49:07.995+02:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-03-14T18:49:07.995+02:00 level=INFO source=server.go:585 msg="waiting for llama runner to start responding"
time=2025-03-14T18:49:07.995+02:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error"
time=2025-03-14T18:49:08.003+02:00 level=INFO source=runner.go:931 msg="starting go runner"
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
ggml_cuda_init: failed to initialize ROCm: no ROCm-capable device is detected
load_backend: loaded ROCm backend from /app/plugins/lib/ollama/rocm/libggml-hip.so
load_backend: loaded CPU backend from /app/plugins/Ollama/lib/ollama/libggml-cpu-haswell.so
time=2025-03-14T18:49:08.095+02:00 level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2025-03-14T18:49:08.096+02:00 level=INFO source=runner.go:991 msg="Server listening on 127.0.0.1:46837"
llama_model_loader: loaded meta data with 25 key-value pairs and 291 tensors from /home/stas/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-ff82381e2bea77d91c1b824c7afb83f6fb73e9f7de9dda631bcdbca564aa5435 (version GGUF V3 (latest))

There seems to be a lot of repeating info in the logs from this point, so tell me if you need it. It looks like it fails to find the GPU again for some reason, even though there is no error shown in Alpaca, so it falls back to the CPU:

time=2025-03-14T18:49:08.003+02:00 level=INFO source=runner.go:931 msg="starting go runner"
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
ggml_cuda_init: failed to initialize ROCm: no ROCm-capable device is detected
load_backend: loaded ROCm backend from /app/plugins/lib/ollama/rocm/libggml-hip.so
load_backend: loaded CPU backend from /app/plugins/Ollama/lib/ollama/libggml-cpu-haswell.so
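
"no ROCm-capable device is detected" at runner startup usually means the runner process can't see the GPU device nodes (ROCm needs /dev/kfd and /dev/dri), and the runner here also complains about the libdrm PCI ID database. A quick hedged check, assuming the usual ROCm paths plus the amdgpu.ids path from the log:

```python
# Hedged sanity check: ROCm needs /dev/kfd and /dev/dri; the runner above
# also complains about the libdrm PCI ID database. If the device nodes are
# not visible inside the sandbox, ggml falls back to CPU.
import os

for path in ("/dev/kfd", "/dev/dri", "/opt/amdgpu/share/libdrm/amdgpu.ids"):
    print(path, "exists:", os.path.exists(path),
          "readable:", os.access(path, os.R_OK))
```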

Drakulas01 · Mar 14 '25 17:03

So, it seems there is an issue with running ROCm on kernels newer than 6.11 at the moment. I have the same issue as in the linked one when trying to run ComfyUI, so this might also be the reason Alpaca fails to get the GPU (although I haven't seen the same hipBLASLt issue in the logs). It would probably be nice if somebody checked Alpaca on a 6.11 kernel with an AMD GPU to confirm that.
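
A trivial way to record the kernel version alongside such a test (equivalent to `uname -r`):

```python
# Print the running kernel release (same as `uname -r`), to note whether it
# is newer than 6.11 when trying to reproduce this.
import platform
print(platform.release())
```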

Drakulas01 · Mar 29 '25 11:03

With 6.0.1 the error message is back. The AMD plugin is installed; after the update I reinstalled it just to be sure:

Alpaca                  com.jeffser.Alpaca                   6.0.1   stable  flathub user
Alpaca AMD Support      com.jeffser.Alpaca.Plugins.AMD       0.6.2   stable  flathub user
Ollama Instance         com.jeffser.Alpaca.Plugins.Ollama    0.6.4   stable  flathub user
Your new public key is: 

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIEE/8tCqVJBEC42vdZpYMWAi5vxlea4iJeuttP/McB7P

2025/04/12 23:21:51 routes.go:1231: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:10.3.0 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11435 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/stas/.var/app/com.jeffser.Alpaca/data/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-04-12T23:21:51.509+03:00 level=INFO source=images.go:458 msg="total blobs: 23"
time=2025-04-12T23:21:51.509+03:00 level=INFO source=images.go:465 msg="total unused blobs removed: 0"
time=2025-04-12T23:21:51.510+03:00 level=INFO source=routes.go:1298 msg="Listening on [::]:11435 (version 0.6.4)"
time=2025-04-12T23:21:51.510+03:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-04-12T23:21:51.517+03:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2025-04-12T23:21:51.517+03:00 level=WARN source=amd_linux.go:443 msg="amdgpu detected, but no compatible rocm library found.  Either install rocm v6, or follow manual install instructions at https://github.com/ollama/ollama/blob/main/docs/linux.md#manual-install"
time=2025-04-12T23:21:51.517+03:00 level=WARN source=amd_linux.go:348 msg="unable to verify rocm library: no suitable rocm found, falling back to CPU"
time=2025-04-12T23:21:51.517+03:00 level=INFO source=gpu.go:377 msg="no compatible GPUs were discovered"
time=2025-04-12T23:21:51.517+03:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="15.5 GiB" available="11.2 GiB"
[GIN] 2025/04/12 - 23:21:51 | 200 |       708.5µs |       127.0.0.1 | GET      "/api/tags"
[GIN] 2025/04/12 - 23:21:51 | 200 |    14.02481ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/04/12 - 23:21:51 | 200 |   48.636422ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/04/12 - 23:21:51 | 200 |   53.639489ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/04/12 - 23:21:51 | 200 |    69.89761ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/04/12 - 23:21:51 | 200 |   76.231552ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/04/12 - 23:22:15 | 200 |     528.639µs |       127.0.0.1 | GET      "/api/tags"
[GIN] 2025/04/12 - 23:22:54 | 200 |     637.514µs |       127.0.0.1 | GET      "/api/tags"
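
One more hedged check worth trying here: whether the plugin's files are actually visible inside the app sandbox. The /app/plugins paths below are taken from the runner log in an earlier comment, and `flatpak run --command=` is the stock way to get an interpreter inside the sandbox:

```python
# Hedged check (paths assumed from the earlier runner log): list what the
# AMD plugin actually mounts into the sandbox. Run inside the sandbox, e.g.
#   flatpak run --command=python3 com.jeffser.Alpaca /path/to/this_script.py
import glob

for pattern in ("/app/plugins/lib/ollama/rocm/*",
                "/app/plugins/Ollama/lib/ollama/*"):
    matches = glob.glob(pattern)
    print(pattern, "->", len(matches), "entries")
    for m in matches[:5]:
        print("   ", m)
```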

Drakulas01 · Apr 12 '25 20:04