ramalama not working on older x86 hardware
kraxel@ollivander ~# ramalama --debug run tiny hello
run_cmd: podman inspect quay.io/ramalama/cuda:0.6
Working directory: None
Ignore stderr: False
Ignore all: True
exec_cmd: podman run --rm -i --label ai.ramalama --name ramalama_v2MjJibw8H --env=HOME=/tmp --init --runtime /usr/bin/nvidia-container-runtime --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --label ai.ramalama.model=ollama://tinyllama --label ai.ramalama.engine=podman --label ai.ramalama.runtime=llama.cpp --label ai.ramalama.command=run --env LLAMA_PROMPT_PREFIX=🦭 > --pull=newer -t --device /dev/dri --device nvidia.com/gpu=all -e CUDA_VISIBLE_DEVICES=0 --network none --mount=type=bind,src=/home/kraxel/.local/share/ramalama/models/ollama/tinyllama:latest,destination=/mnt/models/model.file,ro quay.io/ramalama/cuda:latest llama-run -c 2048 --temp 0.8 -v --ngl 999 /mnt/models/model.file hello
Loading model
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce GTX 1060 6GB, compute capability 6.1, VMM: yes
kraxel@ollivander ~# dmesg | tail -1
[ 401.460183] traps: llama-run[1907] trap invalid opcode ip:7f2d23f352ac sp:7ffcf9dbfd20 error:0 in libggml-cpu.so[3a2ac,7f2d23f03000+60000]
kraxel@ollivander ~# lscpu | grep avx
kraxel@ollivander ~#
I suspect libggml-cpu.so uses AVX instructions without checking whether the CPU actually supports them.
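For reference, a minimal sketch of what such a runtime guard could look like, assuming GCC/Clang builtins (this is not llama.cpp's actual dispatch code; the kernel functions are stand-ins):

```c
#include <stdio.h>

/* Stand-ins for a real AVX kernel and its scalar fallback; in practice
 * the AVX version would live in a separate translation unit built with
 * -mavx so the rest of the binary stays safe on older CPUs. */
static void matmul_avx(void)    { puts("using AVX kernel"); }
static void matmul_scalar(void) { puts("using scalar fallback"); }

int main(void) {
    __builtin_cpu_init();  /* initialize the CPU feature cache (GCC/Clang) */
    if (__builtin_cpu_supports("avx"))
        matmul_avx();
    else
        matmul_scalar();   /* taken on a CPU without AVX, avoiding SIGILL */
    return 0;
}
```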
Could you open this issue with llama.cpp to see if it can be fixed there?
Also curious, what's the model of your CPU @kraxel? I don't recall seeing anyone report this... Like even if we can turn off these instructions, it may make sense to do a "legacy" image if this tanks performance.
But yes llama.cpp is the first place to go.
Intel(R) Xeon(R) CPU W3550 according to lscpu. ~10 years old. Has SSE up to 4.2, but no AVX.
Using it for AI experiments because it happens to be the only machine at hand I can plug PCIe cards in for GPU acceleration. All my newer hardware is Intel NUCs (but not new enough to have Arc GPU hardware).
Actually, running models on a CPU that old indeed doesn't make much sense, but apparently llama.cpp uses AVX even when only loading the model into the GPU ...
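That matches the dmesg trap above: once a translation unit is built with -mavx, the compiler may emit VEX-encoded instructions anywhere in it, so even plain-looking loader code can fault. A tiny, hypothetical illustration (the file name and loop are made up, not taken from llama.cpp):

```c
/* Build with:  gcc -O3 -mavx addone.c -o addone
 * Runs fine on AVX hardware; on a CPU without AVX the auto-vectorized
 * loop (e.g. vaddps) traps with SIGILL -- "invalid opcode", as in the
 * dmesg line above -- even though no AVX intrinsic is used explicitly. */
#include <stdio.h>

#define N 1024

float a[N], b[N];

int main(void) {
    for (int i = 0; i < N; i++)
        a[i] = (float)i;
    for (int i = 0; i < N; i++)      /* typically auto-vectorized under -mavx */
        b[i] = a[i] + 1.0f;
    printf("%f\n", b[N - 1]);
    return 0;
}
```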
IMHO, it's not just basic AVX that's needed. If your NUC supports Thunderbolt or another USB-C port with PCIe tunneling (some of the USB4 ports do), you could consider an eGPU adapter.
This issue appears to have been fixed via https://github.com/ggml-org/llama.cpp/pull/12871. Are we good to close this? :)
Let's close, we can re-open if necessary.