RamaLama won't recognize RX5700XT
The whole story starts with @RealVishy's comment in #2503, which said that the RX5700XT works well with RamaLama on Linux.
I'm using Bluefin-dx and tried to run RamaLama, to no avail. I created an issue at ublue-os/bluefin#2197 and was advised to post an issue here.
I'm using the RamaLama bundled with the distro:
❯ /usr/bin/ramalama -v
ramalama version 0.5.2
Testing:
/usr/bin/ramalama --debug run llama3.2
run_cmd: podman inspect quay.io/ramalama/rocm:0.5
Working directory: None
Ignore stderr: False
Ignore all: True
exec_cmd: podman run --rm -i --label RAMALAMA --security-opt=label=disable --name ramalama_Eef0KsY5uh --pull=newer -t --device /dev/dri --device /dev/kfd -e HIP_VISIBLE_DEVICES=0 --mount=type=bind,src=/var/home/vlad/.local/share/ramalama/models/ollama/llama3.2:latest,destination=/mnt/models/model.file,ro quay.io/ramalama/rocm:latest llama-run -c 2048 --temp 0.8 -v /mnt/models/model.file
Loading modelggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 5700 XT, gfx1010:xnack- (0x1010), VMM: no, Wave Size: 32
~ took 5s
Any help appreciated.
Maybe you need this fix?
https://github.com/containers/ramalama/pull/802
Thanks for your suggestion. I tried:
❯ /usr/bin/ramalama --debug --ngl=999 run llama3.2
usage: ramalama [-h] [--container] [--debug] [--dryrun] [--engine ENGINE] [--gpu] [--image IMAGE] [--nocontainer] [--runtime {llama.cpp,vllm}] [--store STORE] [-v]
{help,containers,ps,convert,info,list,ls,login,logout,pull,push,rm,run,serve,stop,version} ...
ramalama: error: unrecognized arguments: --ngl=999
It's --ngl 999 rather than --ngl=999
Yeah, I tried that too
❯ /usr/bin/ramalama --debug --ngl 999 run llama3.2
usage: ramalama [-h] [--container] [--debug] [--dryrun] [--engine ENGINE] [--gpu] [--image IMAGE] [--nocontainer] [--runtime {llama.cpp,vllm}] [--store STORE] [-v]
{help,containers,ps,convert,info,list,ls,login,logout,pull,push,rm,run,serve,stop,version} ...
ramalama: error: argument subcommand: invalid choice: '999' (choose from help, containers, ps, convert, info, list, ls, login, logout, pull, push, rm, run, serve, stop, version)
You need to put it after the run command I think
Nope.
❯ /usr/bin/ramalama --debug run --ngl 999 llama3.2
usage: ramalama [-h] [--container] [--debug] [--dryrun] [--engine ENGINE] [--gpu] [--image IMAGE] [--nocontainer] [--runtime {llama.cpp,vllm}] [--store STORE] [-v]
{help,containers,ps,convert,info,list,ls,login,logout,pull,push,rm,run,serve,stop,version} ...
ramalama: error: unrecognized arguments: --ngl
Can you try updating your version of ramalama? The --ngl option was added fairly recently.
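If you don't want to wait for the distro to catch up, one possible route is pulling a newer build from PyPI. A rough sketch, assuming pip is available on Bluefin and ~/.local/bin is on your PATH:
❯ pip install --user --upgrade ramalama
❯ ramalama -v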
@ericcurtin Well, after my distro updates arrived (Bluefin-dx), I reran ramalama with no success.
❯ ramalama -v
ramalama version 0.5.5
❯ /usr/bin/ramalama --debug run llama3.2
run_cmd: podman inspect quay.io/ramalama/rocm:0.5
Working directory: None
Ignore stderr: False
Ignore all: True
exec_cmd: podman run --rm -i --label RAMALAMA --security-opt=label=disable --name ramalama_wHwsqJYifh --pull=newer -t --device /dev/dri --device /dev/kfd -e HIP_VISIBLE_DEVICES=0 --mount=type=bind,src=/var/home/vlad/.local/share/ramalama/models/ollama/llama3.2:latest,destination=/mnt/models/model.file,ro quay.io/ramalama/rocm:latest llama-run -c 2048 --temp 0.8 -v --ngl 999 /mnt/models/model.file
Loading modelggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 5700 XT, gfx1010:xnack- (0x1010), VMM: no, Wave Size: 32
~ took 5s
Any further ideas?
Could you paste the full "--debug" output?
Also what are you using to check if the GPU is being utilised? nvtop?
5 seconds is reasonable to initialize a GPU.
Might be worth trying @maxamillion 's Fedora-based container images or Vulkan also.
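For the utilisation check, running nvtop in a second terminal while the model loads is usually enough (nvtop handles AMD cards too; this assumes it's available on the host):
❯ nvtop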
@ericcurtin This is already the "full" debug output.
I presume some kind of input prompt should appear. Also, I tested bench with similar results:
❯ ramalama --debug bench llama3.2
run_cmd: podman inspect quay.io/ramalama/rocm:0.5
Working directory: None
Ignore stderr: False
Ignore all: True
exec_cmd: podman run --rm -i --label RAMALAMA --security-opt=label=disable --name ramalama_hsYIHYxm4m --pull=newer -t --device /dev/dri --device /dev/kfd -e HIP_VISIBLE_DEVICES=0 --mount=type=bind,src=/var/home/vlad/.local/share/ramalama/models/ollama/llama3.2:latest,destination=/mnt/models/model.file,ro quay.io/ramalama/rocm:latest llama-bench -ngl 999 -m /mnt/models/model.file
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 5700 XT, gfx1010:xnack- (0x1010), VMM: no, Wave Size: 32
~ took 4s
Regarding @maxamillion's Fedora-based container images or Vulkan: how can I do that?
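I'm guessing it would be something like the following, using the --image flag from the usage output above, but I'm not sure which image to point at (the quay.io/ramalama/vulkan name is only my assumption, and I don't know the reference for @maxamillion's Fedora-based images):
❯ ramalama --image quay.io/ramalama/vulkan run llama3.2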
@Split7fire It seems like llama-run/llama-bench is crashing then; you'll need to debug this at the llama.cpp layer.
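One way to take RamaLama out of the picture is to rerun llama-bench by hand in the same container, reusing the exec_cmd podman invocation from your debug output. A sketch based on that command; note that HSA_OVERRIDE_GFX_VERSION=10.3.0 is not something RamaLama sets, it's a workaround people commonly use for RDNA1 cards like gfx1010, and it may or may not help here:
❯ podman run --rm -it --security-opt=label=disable \
    --device /dev/dri --device /dev/kfd \
    -e HIP_VISIBLE_DEVICES=0 \
    -e HSA_OVERRIDE_GFX_VERSION=10.3.0 \
    --mount=type=bind,src=/var/home/vlad/.local/share/ramalama/models/ollama/llama3.2:latest,destination=/mnt/models/model.file,ro \
    quay.io/ramalama/rocm:latest \
    llama-bench -v -ngl 999 -m /mnt/models/model.file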
@ericcurtin It's getting more and more obscure... @RealVishy stated that RamaLama works with this kind of hardware on a similar atomic desktop, but I cannot reproduce it. Also, is the RX5700XT supported by ROCm at all? As far as I know, the latest ROCm has no support for gfx1010. Does RamaLama use its own ROCm layer on top of the official one?
@Split7fire still having this issue?
@rhatdan Certainly, yes. Just to be sure, I retested ramalama --debug bench llama3.2 and got this:
run_cmd: podman inspect quay.io/ramalama/rocm:0.6
Working directory: None
Ignore stderr: False
Ignore all: True
exec_cmd: podman run --rm -i --label ai.ramalama --name ramalama_OUacPYIFEh --env=HOME=/tmp --init --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --label ai.ramalama.model=llama3.2 --label ai.ramalama.engine=podman --label ai.ramalama.runtime=llama.cpp --label ai.ramalama.command=bench --pull=newer -t --device /dev/dri --device /dev/kfd -e HIP_VISIBLE_DEVICES=0 --network none --mount=type=bind,src=/var/home/vlad/.local/share/ramalama/models/ollama/llama3.2:latest,destination=/mnt/models/model.file,ro quay.io/ramalama/rocm:latest llama-bench -ngl 999 -m /mnt/models/model.file
Trying to pull quay.io/ramalama/rocm:latest...
Getting image source signatures
Copying blob 0159dca2e5b7 done |
Copying blob 23bf9faaf948 done |
Copying blob 23f6dbb37a63 done |
Copying config 04bfb0587d done |
Writing manifest to image destination
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 5700 XT, gfx1010:xnack- (0x1010), VMM: no, Wave Size: 32
and nothing else.
Could you update the version of RamaLama you are using? It seems like you're on ramalama 0.6.*; we are getting ready to release 0.7.3.
@rhatdan I'm using ramalama from my distro (Bluefin-dx), so I'm tied to Bluefin's release cycle. I tried to install it via pip, but it failed.
OK, it looks like you are using the latest image with an older ramalama. Not sure whether this makes a difference.
@maxamillion PTAL
Some update: since February I have reinstalled Aurora on my device, but nothing changed. I'm open to any suggestions. My current software stack:
KDE Plasma Version: 6.3.5
KDE Frameworks Version: 6.14.0
Qt Version: 6.9.0
Kernel Version: 6.14.5-300.fc42.x86_64 (64-bit)
Graphics Platform: Wayland
Processors: 12 × Intel® Xeon® CPU E5-1650 0 @ 3.20GHz
Memory: 67.3 GB of RAM
Graphics Processor: AMD Radeon RX 5700 XT
Manufacturer: HUANANZHI
I recall Aurora/Bluefin comes with Linuxbrew installed. You can grab the latest ramalama with brew install ramalama in that case.
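A quick sketch of that (the path below is the usual Linuxbrew prefix; the main thing is making sure the shell resolves the brew binary rather than /usr/bin/ramalama):
❯ brew install ramalama
❯ which ramalama   # expect something under /home/linuxbrew/.linuxbrew/bin, not /usr/bin
❯ ramalama -v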
Can I close this issue?
@rhatdan, actually ramalama still won't recognize the RX5700XT. From time to time I retest ramalama, with no luck.
Was this ever debugged with llama.cpp?
I'd really appreciate any hints on debugging. P.S. I also tried llama.cpp on its own with no luck either; it just exits, and that's all.
Please open an issue there, they are more likely to know what is going on.
A friendly reminder that this issue had no activity for 30 days.
I don't know the state of llama.cpp, but this is not something RamaLama can fix on its own, so closing.
As a remark on this issue: llama.cpp works great with the RX5700XT when downloaded from the llama.cpp releases page, so it may be a RamaLama issue after all. If someone has a guide on how to debug this kind of issue, please share.
How did you build it? Perhaps we need to update the llama.cpp version we are using.
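For comparison, the llama.cpp build info from both sides would help. A sketch, assuming llama-cli is present in the ROCm image and the release-page binary is in your current directory (--version prints the build number, commit, and compiler info):
❯ ./llama-cli --version
❯ podman run --rm quay.io/ramalama/rocm:latest llama-cli --version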