
Flatpak - ollama no longer working with amdgpu

Open · lobau opened this issue 9 months ago • 3 comments

I followed your instructions:

  • I installed the Ollama plugin
  • I installed the AMD plugin

Now every time I type a message and press Enter, I get this error (for every model I tried):

[screenshot of the error message]

I get the same thing on all my Fedora Silverblue computers: a ThinkPad laptop and a Minisforum desktop, both all-AMD systems.

Here is the output of all the commands:

lobau@fedora:~$ flatpak list --columns=app,installation | grep Alpaca
com.jeffser.Alpaca	system
lobau@fedora:~$ flatpak install com.jeffser.Alpaca.Plugins.Ollama
Looking for matches…


        ID                                           Branch           Op           Remote            Download
 1. [✓] com.jeffser.Alpaca.Plugins.Ollama            stable           i            flathub           1.5 GB / 1.5 GB

Installation complete.
lobau@fedora:~$ flatpak install com.jeffser.Alpaca.Plugins.AMD
Looking for matches…


        ID                                       Branch           Op           Remote           Download
 1. [✓] com.jeffser.Alpaca.Plugins.AMD           stable           i            flathub          1.3 GB / 1.4 GB

Installation complete.
lobau@fedora:~$ flatpak run com.jeffser.Alpaca
INFO	[main.py | main] Alpaca version: 5.1.0
INFO	[instance_manager.py | start] Starting Alpaca's Ollama instance...
INFO	[instance_manager.py | start] Started Alpaca's Ollama instance
Couldn't find '/var/home/lobau/.ollama/id_ed25519'. Generating new private key.
Your new public key is: 

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

2025/03/08 22:24:19 routes.go:1205: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11435 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/var/home/lobau/.var/app/com.jeffser.Alpaca/data/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-03-08T22:24:19.710-08:00 level=INFO source=images.go:432 msg="total blobs: 17"
INFO	[instance_manager.py | start] client version is 0.5.12
time=2025-03-08T22:24:19.710-08:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-03-08T22:24:19.711-08:00 level=INFO source=routes.go:1256 msg="Listening on [::]:11435 (version 0.5.12)"
time=2025-03-08T22:24:19.711-08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-03-08T22:24:19.717-08:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2025-03-08T22:24:19.717-08:00 level=INFO source=amd_linux.go:296 msg="unsupported Radeon iGPU detected skipping" id=0 total="512.0 MiB"
time=2025-03-08T22:24:19.717-08:00 level=INFO source=amd_linux.go:402 msg="no compatible amdgpu devices detected"
time=2025-03-08T22:24:19.717-08:00 level=INFO source=gpu.go:377 msg="no compatible GPUs were discovered"
time=2025-03-08T22:24:19.717-08:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="30.6 GiB" available="24.1 GiB"
[GIN] 2025/03/08 - 22:24:19 | 200 |    1.236349ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2025/03/08 - 22:24:19 | 200 |   20.575736ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/03/08 - 22:24:19 | 200 |    29.10398ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/03/08 - 22:24:19 | 200 |   29.488761ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/03/08 - 22:24:26 | 200 |     759.373µs |       127.0.0.1 | GET      "/api/tags"
time=2025-03-08T22:24:28.669-08:00 level=INFO source=server.go:97 msg="system memory" total="30.6 GiB" free="23.9 GiB" free_swap="8.0 GiB"
time=2025-03-08T22:24:28.670-08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.key_length default=128
time=2025-03-08T22:24:28.670-08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.value_length default=128
time=2025-03-08T22:24:28.670-08:00 level=INFO source=server.go:130 msg=offload library=cpu layers.requested=-1 layers.model=29 layers.offload=0 layers.split="" memory.available="[23.9 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="448.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="426.4 MiB" memory.graph.full="478.0 MiB" memory.graph.partial="730.4 MiB"
time=2025-03-08T22:24:28.670-08:00 level=INFO source=server.go:380 msg="starting llama server" cmd="/app/plugins/Ollama/bin/ollama runner --model /var/home/lobau/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-96c415656d377afbff962f6cdb2394ab092ccbcbaab4b82525bc4ca800fe8a49 --ctx-size 8192 --batch-size 512 --threads 8 --no-mmap --parallel 4 --port 42643"
time=2025-03-08T22:24:28.671-08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-03-08T22:24:28.671-08:00 level=INFO source=server.go:557 msg="waiting for llama runner to start responding"
time=2025-03-08T22:24:28.671-08:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error"
time=2025-03-08T22:24:28.687-08:00 level=INFO source=runner.go:932 msg="starting go runner"

rocBLAS error: Could not initialize Tensile host: No devices found
time=2025-03-08T22:24:29.438-08:00 level=ERROR source=server.go:421 msg="llama runner terminated" error="signal: aborted (core dumped)"
time=2025-03-08T22:24:29.675-08:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: error:Could not initialize Tensile host: No devices found"
time=2025-03-08T22:24:29.675-08:00 level=WARN source=server.go:477 msg="llama runner process no longer running" sys=134 string="signal: aborted (core dumped)"
[GIN] 2025/03/08 - 22:24:29 | 500 |  1.048063923s |       127.0.0.1 | POST     "/v1/chat/completions"
INFO	[_client.py | _send_single_request] HTTP Request: POST http://0.0.0.0:11435/v1/chat/completions "HTTP/1.1 500 Internal Server Error"
INFO	[_base_client.py | _retry_request] Retrying request to /chat/completions in 0.478650 seconds
time=2025-03-08T22:24:29.702-08:00 level=INFO source=server.go:97 msg="system memory" total="30.6 GiB" free="23.8 GiB" free_swap="8.0 GiB"
time=2025-03-08T22:24:29.702-08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.key_length default=128
time=2025-03-08T22:24:29.702-08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.value_length default=128
time=2025-03-08T22:24:29.702-08:00 level=INFO source=server.go:130 msg=offload library=cpu layers.requested=-1 layers.model=29 layers.offload=0 layers.split="" memory.available="[23.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="448.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="426.4 MiB" memory.graph.full="478.0 MiB" memory.graph.partial="730.4 MiB"
time=2025-03-08T22:24:29.703-08:00 level=INFO source=server.go:380 msg="starting llama server" cmd="/app/plugins/Ollama/bin/ollama runner --model /var/home/lobau/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-96c415656d377afbff962f6cdb2394ab092ccbcbaab4b82525bc4ca800fe8a49 --ctx-size 8192 --batch-size 512 --threads 8 --no-mmap --parallel 4 --port 34945"
time=2025-03-08T22:24:29.704-08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-03-08T22:24:29.704-08:00 level=INFO source=server.go:557 msg="waiting for llama runner to start responding"
time=2025-03-08T22:24:29.704-08:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error"
time=2025-03-08T22:24:29.718-08:00 level=INFO source=runner.go:932 msg="starting go runner"

rocBLAS error: Could not initialize Tensile host: No devices found
time=2025-03-08T22:24:30.398-08:00 level=ERROR source=server.go:421 msg="llama runner terminated" error="signal: aborted (core dumped)"
time=2025-03-08T22:24:30.456-08:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: error:Could not initialize Tensile host: No devices found"
time=2025-03-08T22:24:30.456-08:00 level=WARN source=server.go:477 msg="llama runner process no longer running" sys=134 string="signal: aborted (core dumped)"
[GIN] 2025/03/08 - 22:24:30 | 500 |   1.82926364s |       127.0.0.1 | POST     "/v1/chat/completions"
INFO	[_client.py | _send_single_request] HTTP Request: POST http://0.0.0.0:11435/v1/chat/completions "HTTP/1.1 500 Internal Server Error"
INFO	[_base_client.py | _retry_request] Retrying request to /chat/completions in 0.457348 seconds
time=2025-03-08T22:24:30.483-08:00 level=INFO source=server.go:97 msg="system memory" total="30.6 GiB" free="23.8 GiB" free_swap="8.0 GiB"
time=2025-03-08T22:24:30.483-08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.key_length default=128
time=2025-03-08T22:24:30.483-08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.value_length default=128
time=2025-03-08T22:24:30.483-08:00 level=INFO source=server.go:130 msg=offload library=cpu layers.requested=-1 layers.model=29 layers.offload=0 layers.split="" memory.available="[23.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="448.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="426.4 MiB" memory.graph.full="478.0 MiB" memory.graph.partial="730.4 MiB"
time=2025-03-08T22:24:30.484-08:00 level=INFO source=server.go:380 msg="starting llama server" cmd="/app/plugins/Ollama/bin/ollama runner --model /var/home/lobau/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-96c415656d377afbff962f6cdb2394ab092ccbcbaab4b82525bc4ca800fe8a49 --ctx-size 8192 --batch-size 512 --threads 8 --no-mmap --parallel 4 --port 35675"
time=2025-03-08T22:24:30.484-08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-03-08T22:24:30.484-08:00 level=INFO source=server.go:557 msg="waiting for llama runner to start responding"
time=2025-03-08T22:24:30.485-08:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error"
time=2025-03-08T22:24:30.499-08:00 level=INFO source=runner.go:932 msg="starting go runner"

rocBLAS error: Could not initialize Tensile host: No devices found
time=2025-03-08T22:24:31.188-08:00 level=ERROR source=server.go:421 msg="llama runner terminated" error="signal: aborted (core dumped)"
time=2025-03-08T22:24:31.238-08:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: error:Could not initialize Tensile host: No devices found"
[GIN] 2025/03/08 - 22:24:31 | 500 |  1.080589183s |       127.0.0.1 | POST     "/v1/chat/completions"
time=2025-03-08T22:24:31.238-08:00 level=WARN source=server.go:477 msg="llama runner process no longer running" sys=134 string="signal: aborted (core dumped)"
INFO	[_client.py | _send_single_request] HTTP Request: POST http://0.0.0.0:11435/v1/chat/completions "HTTP/1.1 500 Internal Server Error"
INFO	[_base_client.py | _retry_request] Retrying request to /chat/completions in 0.903258 seconds
time=2025-03-08T22:24:31.264-08:00 level=INFO source=server.go:97 msg="system memory" total="30.6 GiB" free="23.7 GiB" free_swap="8.0 GiB"
time=2025-03-08T22:24:31.264-08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.key_length default=128
time=2025-03-08T22:24:31.264-08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.value_length default=128
time=2025-03-08T22:24:31.264-08:00 level=INFO source=server.go:130 msg=offload library=cpu layers.requested=-1 layers.model=29 layers.offload=0 layers.split="" memory.available="[23.7 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="448.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="426.4 MiB" memory.graph.full="478.0 MiB" memory.graph.partial="730.4 MiB"
time=2025-03-08T22:24:31.265-08:00 level=INFO source=server.go:380 msg="starting llama server" cmd="/app/plugins/Ollama/bin/ollama runner --model /var/home/lobau/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-96c415656d377afbff962f6cdb2394ab092ccbcbaab4b82525bc4ca800fe8a49 --ctx-size 8192 --batch-size 512 --threads 8 --no-mmap --parallel 4 --port 36389"
time=2025-03-08T22:24:31.265-08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-03-08T22:24:31.265-08:00 level=INFO source=server.go:557 msg="waiting for llama runner to start responding"
time=2025-03-08T22:24:31.266-08:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error"
time=2025-03-08T22:24:31.280-08:00 level=INFO source=runner.go:932 msg="starting go runner"

rocBLAS error: Could not initialize Tensile host: No devices found
time=2025-03-08T22:24:31.952-08:00 level=ERROR source=server.go:421 msg="llama runner terminated" error="signal: aborted (core dumped)"
time=2025-03-08T22:24:32.019-08:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: error:Could not initialize Tensile host: No devices found"
[GIN] 2025/03/08 - 22:24:32 | 500 |  1.102176532s |       127.0.0.1 | POST     "/v1/chat/completions"
INFO	[_client.py | _send_single_request] HTTP Request: POST http://0.0.0.0:11435/v1/chat/completions "HTTP/1.1 500 Internal Server Error"
INFO	[_base_client.py | _retry_request] Retrying request to /chat/completions in 0.821362 seconds
time=2025-03-08T22:24:32.185-08:00 level=INFO source=server.go:97 msg="system memory" total="30.6 GiB" free="23.7 GiB" free_swap="8.0 GiB"
time=2025-03-08T22:24:32.186-08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.key_length default=128
time=2025-03-08T22:24:32.186-08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.value_length default=128
time=2025-03-08T22:24:32.186-08:00 level=INFO source=server.go:130 msg=offload library=cpu layers.requested=-1 layers.model=29 layers.offload=0 layers.split="" memory.available="[23.7 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="448.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="426.4 MiB" memory.graph.full="478.0 MiB" memory.graph.partial="730.4 MiB"
time=2025-03-08T22:24:32.186-08:00 level=INFO source=server.go:380 msg="starting llama server" cmd="/app/plugins/Ollama/bin/ollama runner --model /var/home/lobau/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-96c415656d377afbff962f6cdb2394ab092ccbcbaab4b82525bc4ca800fe8a49 --ctx-size 8192 --batch-size 512 --threads 8 --no-mmap --parallel 4 --port 43523"
time=2025-03-08T22:24:32.187-08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-03-08T22:24:32.187-08:00 level=INFO source=server.go:557 msg="waiting for llama runner to start responding"
time=2025-03-08T22:24:32.187-08:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error"
time=2025-03-08T22:24:32.201-08:00 level=INFO source=runner.go:932 msg="starting go runner"

rocBLAS error: Could not initialize Tensile host: No devices found
time=2025-03-08T22:24:32.877-08:00 level=ERROR source=server.go:421 msg="llama runner terminated" error="signal: aborted (core dumped)"
time=2025-03-08T22:24:32.940-08:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: error:Could not initialize Tensile host: No devices found"
time=2025-03-08T22:24:32.940-08:00 level=WARN source=server.go:477 msg="llama runner process no longer running" sys=134 string="signal: aborted (core dumped)"
[GIN] 2025/03/08 - 22:24:32 | 500 |  797.018094ms |       127.0.0.1 | POST     "/v1/chat/completions"
INFO	[_client.py | _send_single_request] HTTP Request: POST http://0.0.0.0:11435/v1/chat/completions "HTTP/1.1 500 Internal Server Error"
time=2025-03-08T22:24:32.967-08:00 level=INFO source=server.go:97 msg="system memory" total="30.6 GiB" free="23.7 GiB" free_swap="8.0 GiB"
time=2025-03-08T22:24:32.967-08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.key_length default=128
time=2025-03-08T22:24:32.967-08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.value_length default=128
time=2025-03-08T22:24:32.967-08:00 level=INFO source=server.go:130 msg=offload library=cpu layers.requested=-1 layers.model=29 layers.offload=0 layers.split="" memory.available="[23.7 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="448.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="426.4 MiB" memory.graph.full="478.0 MiB" memory.graph.partial="730.4 MiB"
time=2025-03-08T22:24:32.968-08:00 level=INFO source=server.go:380 msg="starting llama server" cmd="/app/plugins/Ollama/bin/ollama runner --model /var/home/lobau/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-96c415656d377afbff962f6cdb2394ab092ccbcbaab4b82525bc4ca800fe8a49 --ctx-size 8192 --batch-size 512 --threads 8 --no-mmap --parallel 4 --port 33111"
time=2025-03-08T22:24:32.968-08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-03-08T22:24:32.968-08:00 level=INFO source=server.go:557 msg="waiting for llama runner to start responding"
time=2025-03-08T22:24:32.968-08:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error"
time=2025-03-08T22:24:32.983-08:00 level=INFO source=runner.go:932 msg="starting go runner"

rocBLAS error: Could not initialize Tensile host: No devices found
time=2025-03-08T22:24:33.691-08:00 level=ERROR source=server.go:421 msg="llama runner terminated" error="signal: aborted (core dumped)"
time=2025-03-08T22:24:33.720-08:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: error:Could not initialize Tensile host: No devices found"
time=2025-03-08T22:24:33.720-08:00 level=WARN source=server.go:477 msg="llama runner process no longer running" sys=134 string="signal: aborted (core dumped)"
[GIN] 2025/03/08 - 22:24:33 | 500 |  877.437527ms |       127.0.0.1 | POST     "/v1/chat/completions"
INFO	[_client.py | _send_single_request] HTTP Request: POST http://0.0.0.0:11435/v1/chat/completions "HTTP/1.1 500 Internal Server Error"
ERROR	[instance_manager.py | generate_message] Error code: 500 - {'error': {'message': 'llama runner process has terminated: error:Could not initialize Tensile host: No devices found', 'type': 'api_error', 'param': None, 'code': None}}
time=2025-03-08T22:24:33.748-08:00 level=INFO source=server.go:97 msg="system memory" total="30.6 GiB" free="23.6 GiB" free_swap="8.0 GiB"
time=2025-03-08T22:24:33.749-08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.key_length default=128
time=2025-03-08T22:24:33.749-08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.value_length default=128
time=2025-03-08T22:24:33.749-08:00 level=INFO source=server.go:130 msg=offload library=cpu layers.requested=-1 layers.model=29 layers.offload=0 layers.split="" memory.available="[23.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="448.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="426.4 MiB" memory.graph.full="478.0 MiB" memory.graph.partial="730.4 MiB"
time=2025-03-08T22:24:33.749-08:00 level=INFO source=server.go:380 msg="starting llama server" cmd="/app/plugins/Ollama/bin/ollama runner --model /var/home/lobau/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-96c415656d377afbff962f6cdb2394ab092ccbcbaab4b82525bc4ca800fe8a49 --ctx-size 8192 --batch-size 512 --threads 8 --no-mmap --parallel 4 --port 43561"
time=2025-03-08T22:24:33.749-08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-03-08T22:24:33.749-08:00 level=INFO source=server.go:557 msg="waiting for llama runner to start responding"
time=2025-03-08T22:24:33.750-08:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error"
time=2025-03-08T22:24:33.765-08:00 level=INFO source=runner.go:932 msg="starting go runner"

rocBLAS error: Could not initialize Tensile host: No devices found
time=2025-03-08T22:24:34.482-08:00 level=ERROR source=server.go:421 msg="llama runner terminated" error="signal: aborted (core dumped)"
time=2025-03-08T22:24:34.503-08:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: error:Could not initialize Tensile host: No devices found"
[GIN] 2025/03/08 - 22:24:34 | 500 |  1.558305345s |       127.0.0.1 | POST     "/v1/chat/completions"
INFO	[_client.py | _send_single_request] HTTP Request: POST http://0.0.0.0:11435/v1/chat/completions "HTTP/1.1 500 Internal Server Error"
INFO	[_base_client.py | _retry_request] Retrying request to /chat/completions in 0.377774 seconds
time=2025-03-08T22:24:34.920-08:00 level=INFO source=server.go:97 msg="system memory" total="30.6 GiB" free="24.1 GiB" free_swap="8.0 GiB"
time=2025-03-08T22:24:34.920-08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.key_length default=128
time=2025-03-08T22:24:34.920-08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.value_length default=128
time=2025-03-08T22:24:34.920-08:00 level=INFO source=server.go:130 msg=offload library=cpu layers.requested=-1 layers.model=29 layers.offload=0 layers.split="" memory.available="[24.1 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="448.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="426.4 MiB" memory.graph.full="478.0 MiB" memory.graph.partial="730.4 MiB"
time=2025-03-08T22:24:34.921-08:00 level=INFO source=server.go:380 msg="starting llama server" cmd="/app/plugins/Ollama/bin/ollama runner --model /var/home/lobau/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-96c415656d377afbff962f6cdb2394ab092ccbcbaab4b82525bc4ca800fe8a49 --ctx-size 8192 --batch-size 512 --threads 8 --no-mmap --parallel 4 --port 41091"
time=2025-03-08T22:24:34.921-08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-03-08T22:24:34.921-08:00 level=INFO source=server.go:557 msg="waiting for llama runner to start responding"
time=2025-03-08T22:24:34.921-08:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error"
time=2025-03-08T22:24:34.933-08:00 level=INFO source=runner.go:932 msg="starting go runner"

rocBLAS error: Could not initialize Tensile host: No devices found
time=2025-03-08T22:24:35.556-08:00 level=ERROR source=server.go:421 msg="llama runner terminated" error="signal: aborted (core dumped)"
time=2025-03-08T22:24:35.674-08:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: error:Could not initialize Tensile host: No devices found"
[GIN] 2025/03/08 - 22:24:35 | 500 |  790.568219ms |       127.0.0.1 | POST     "/v1/chat/completions"
INFO	[_client.py | _send_single_request] HTTP Request: POST http://0.0.0.0:11435/v1/chat/completions "HTTP/1.1 500 Internal Server Error"
INFO	[_base_client.py | _retry_request] Retrying request to /chat/completions in 0.997808 seconds
time=2025-03-08T22:24:36.751-08:00 level=INFO source=server.go:97 msg="system memory" total="30.6 GiB" free="23.9 GiB" free_swap="8.0 GiB"
time=2025-03-08T22:24:36.752-08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.key_length default=128
time=2025-03-08T22:24:36.752-08:00 level=WARN source=ggml.go:132 msg="key not found" key=qwen2.attention.value_length default=128
time=2025-03-08T22:24:36.752-08:00 level=INFO source=server.go:130 msg=offload library=cpu layers.requested=-1 layers.model=29 layers.offload=0 layers.split="" memory.available="[23.9 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="448.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="426.4 MiB" memory.graph.full="478.0 MiB" memory.graph.partial="730.4 MiB"
time=2025-03-08T22:24:36.753-08:00 level=INFO source=server.go:380 msg="starting llama server" cmd="/app/plugins/Ollama/bin/ollama runner --model /var/home/lobau/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-96c415656d377afbff962f6cdb2394ab092ccbcbaab4b82525bc4ca800fe8a49 --ctx-size 8192 --batch-size 512 --threads 8 --no-mmap --parallel 4 --port 39009"
time=2025-03-08T22:24:36.753-08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-03-08T22:24:36.753-08:00 level=INFO source=server.go:557 msg="waiting for llama runner to start responding"
time=2025-03-08T22:24:36.754-08:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error"
time=2025-03-08T22:24:36.780-08:00 level=INFO source=runner.go:932 msg="starting go runner"

rocBLAS error: Could not initialize Tensile host: No devices found
time=2025-03-08T22:24:37.730-08:00 level=ERROR source=server.go:421 msg="llama runner terminated" error="signal: aborted (core dumped)"
time=2025-03-08T22:24:37.758-08:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: error:Could not initialize Tensile host: No devices found"
[GIN] 2025/03/08 - 22:24:37 | 500 |   1.08275602s |       127.0.0.1 | POST     "/v1/chat/completions"
INFO	[_client.py | _send_single_request] HTTP Request: POST http://0.0.0.0:11435/v1/chat/completions "HTTP/1.1 500 Internal Server Error"
ERROR	[instance_manager.py | generate_chat_title] Error code: 500 - {'error': {'message': 'llama runner process has terminated: error:Could not initialize Tensile host: No devices found', 'type': 'api_error', 'param': None, 'code': None}}

Note: I really liked the previous way of using the software. Download from Flathub, install some models, and you're done. Having to enter terminal commands is a bit of a UX regression in my opinion :(

— lobau, Mar 09 '25

That problem has to do with AMD GPUs and how they work with Ollama; I'm still trying to figure it out.
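In the meantime, for anyone who wants to poke at it: the log above complains that /sys/module/amdgpu/version is missing inside the sandbox, and rocBLAS aborts with "No devices found". Here is a rough diagnostic sketch, not a confirmed fix; the override value in the second command is only an example and the right value depends on your specific GPU:

# Can the sandbox see the driver version file and the GPU device nodes at all?
flatpak run --command=sh com.jeffser.Alpaca -c 'cat /sys/module/amdgpu/version; ls -l /dev/dri /dev/kfd'

# Common ROCm workaround for officially unsupported (i)GPUs; 10.3.0 is just an example value
flatpak override --user --env=HSA_OVERRIDE_GFX_VERSION=10.3.0 com.jeffser.Alpaca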

About the last paragraph: you don't have to use the terminal to install the extensions, they are available in GNOME Software (the app store). Sometimes they don't show up there, which is why I also give the option of using the terminal (the flatpak install commands shown in the log above).

I have no control over when they do or don't show up in GNOME Software; as far as I'm aware, that's a bug on their part.

— Jeffser, Mar 09 '25

I am having the same issue on Arch Linux.

[screenshot of the error message]

Using an AMD GPU, Flatpak, and the Ollama plugin as well.

Traceback (most recent call last):
  File "/app/share/Alpaca/alpaca/window.py", line 1137, in <lambda>
    enter_key_controller.connect("key-pressed", lambda controller, keyval, keycode, state: (self.send_message(None, bool(state & Gdk.ModifierType.CONTROL_MASK)) or True) if keyval==Gdk.KEY_Return and not (state & Gdk.ModifierType.SHIFT_MASK) else None)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/share/Alpaca/alpaca/window.py", line 401, in send_message
    threading.Thread(target=self.get_current_instance().generate_message, args=(m_element_bot, current_model)).start()
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'empty' object has no attribute 'generate_message'

> Note: I really liked the previous way of using the software. Download from Flathub, install some models, and you're done. Having to enter terminal commands is a bit of a UX regression in my opinion :(

I was very surprised too; the program was very easy to use, and the changes caught me off guard.

— AirisLuna, Mar 10 '25

@lobau Maybe you could rename the issue to something like "Flatpak - ollama no longer working with amdgpu", because this affects all Flatpaks; that way people (like me) won't think this is something specific to Fedora and overlook it.

— shadowfly256, Mar 14 '25