
ramalama not working on older x86 hardware

Open kraxel opened this issue 8 months ago • 4 comments

kraxel@ollivander ~# ramalama --debug run tiny hello
run_cmd:  podman inspect quay.io/ramalama/cuda:0.6
Working directory: None
Ignore stderr: False
Ignore all: True
exec_cmd:  podman run --rm -i --label ai.ramalama --name ramalama_v2MjJibw8H --env=HOME=/tmp --init --runtime /usr/bin/nvidia-container-runtime --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --label ai.ramalama.model=ollama://tinyllama --label ai.ramalama.engine=podman --label ai.ramalama.runtime=llama.cpp --label ai.ramalama.command=run --env LLAMA_PROMPT_PREFIX=🦭 >  --pull=newer -t --device /dev/dri --device nvidia.com/gpu=all -e CUDA_VISIBLE_DEVICES=0 --network none --mount=type=bind,src=/home/kraxel/.local/share/ramalama/models/ollama/tinyllama:latest,destination=/mnt/models/model.file,ro quay.io/ramalama/cuda:latest llama-run -c 2048 --temp 0.8 -v --ngl 999 /mnt/models/model.file hello
Loading model
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1060 6GB, compute capability 6.1, VMM: yes
kraxel@ollivander ~# dmesg | tail -1
[  401.460183] traps: llama-run[1907] trap invalid opcode ip:7f2d23f352ac sp:7ffcf9dbfd20 error:0 in libggml-cpu.so[3a2ac,7f2d23f03000+60000]
kraxel@ollivander ~# lscpu | grep avx
kraxel@ollivander ~# 

I suspect libggml-cpu.so uses AVX instructions without checking whether the CPU actually supports them.

kraxel avatar Apr 08 '25 08:04 kraxel
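
For anyone reproducing this, the diagnosis can be cross-checked from the host without a debugger. A minimal sketch, assuming only standard tools; the library path in the second command is a placeholder, not the real path inside the image:

# List the SIMD extensions the CPU actually advertises; on this machine
# only SSE variants show up, no AVX at all.
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -E '^(sse|avx)' | sort -u

# Optionally, disassemble around the offset dmesg reported (0x3a2ac in
# libggml-cpu.so) to confirm the faulting instruction is an AVX one.
objdump -d --start-address=0x3a290 --stop-address=0x3a2c0 /path/to/libggml-cpu.so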

Could you open this issue with llama.cpp to see if it can be fixed there?

rhatdan avatar Apr 08 '25 15:04 rhatdan

Also curious, what's the model of your CPU @kraxel? I don't recall seeing anyone report this... Even if we can turn off these instructions, it may make sense to do a "legacy" image if turning them off tanks performance.

But yes llama.cpp is the first place to go.

ericcurtin avatar Apr 09 '25 10:04 ericcurtin

Also curious, what's the model of your CPU @kraxel? I don't recall seeing anyone report this... Even if we can turn off these instructions, it may make sense to do a "legacy" image if turning them off tanks performance.

Intel(R) Xeon(R) CPU W3550 according to lscpu; ~10 years old. It has SSE up to 4.2, but no AVX.

I'm using it for AI experiments because it happens to be the only machine at hand that I can plug PCIe cards into for GPU acceleration. All my newer hardware is Intel NUCs (but not new enough to have Arc GPU hardware).

Admittedly, running models on a CPU that old doesn't make much sense, but apparently llama.cpp uses AVX even when the model is only being loaded into the GPU ...

kraxel avatar Apr 10 '25 07:04 kraxel
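
One possible interim workaround, not tested on this exact machine: build llama.cpp locally with its CPU SIMD paths switched off while keeping CUDA enabled, so libggml-cpu.so never contains AVX code. The GGML_* option names below are assumptions based on recent llama.cpp CMake files and may differ between versions:

# Build llama.cpp with CUDA but without AVX/AVX2/FMA/F16C in the CPU backend.
git clone https://github.com/ggml-org/llama.cpp
cmake -S llama.cpp -B llama.cpp/build \
    -DGGML_CUDA=ON \
    -DGGML_AVX=OFF -DGGML_AVX2=OFF -DGGML_FMA=OFF -DGGML_F16C=OFF
cmake --build llama.cpp/build -j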

IMHO, more than just basic AVX is needed. In case your NUC supports Thunderbolt or another Type-C port with PCIe tunneling (some USB4 ports do), you could consider an eGPU adapter.

afazekas avatar Apr 26 '25 10:04 afazekas

This issue appears to have been fixed via https://github.com/ggml-org/llama.cpp/pull/12871. Are we good to close this? :)

taronaeo avatar Jun 03 '25 15:06 taronaeo
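
Assuming the fix ends up in a rebuilt quay.io/ramalama/cuda image, re-running the original command on the affected machine should confirm it:

# Pull the rebuilt image and retry the command from the original report.
podman pull quay.io/ramalama/cuda:latest
ramalama --debug run tiny hello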

Let's close; we can re-open if necessary.

ericcurtin avatar Jun 03 '25 16:06 ericcurtin