ramalama
Invalid option for --runtime=vllm
Issue Description
Running ramalama --runtime=vllm run granite3-dense works like a charm, but the same command with serve instead of run throws:
ramalama --runtime vllm serve granite3-dense
serving on port 8080
:: initializing oneAPI environment ...
entrypoint.sh: BASH_VERSION = 5.2.32(1)-release
args: Using "$@" for setvars.sh arguments: --port 8080 --model /mnt/models/model.file --max_model_len 2048
:: compiler -- latest
:: mkl -- latest
:: tbb -- latest
:: umf -- latest
:: oneAPI environment initialized ::
/usr/bin/entrypoint.sh: line 6: exec: --: invalid option
exec: usage: exec [-cl] [-a name] [command [argument ...]] [redirection ...]
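For context, "exec: --: invalid option" is what bash's exec builtin prints when its first argument starts with a dash. A minimal hypothetical sketch of such an entrypoint (not the real /usr/bin/entrypoint.sh; the setvars.sh path is an assumption):

#!/usr/bin/env bash
# Hypothetical sketch: initialize the oneAPI environment, then exec
# whatever arguments the container was started with.
source /opt/intel/oneapi/setvars.sh "$@"   # assumed path; prints the ":: ... initialized ::" banner
exec "$@"   # if "$@" begins with --port, exec reports "--: invalid option"

Since the container is handed only vLLM flags (--port, --model, --max_model_len) and no executable name, exec has nothing to run and aborts.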
Steps to reproduce the issue
- install ramalama through pip on F42
- run the above command
Describe the results you received
breakage
Describe the results you expected
successful inference
ramalama info output
{
"Accelerator": "intel",
"Engine": {
"Info": {
"host": {
"arch": "amd64",
"buildahVersion": "1.39.4",
"cgroupControllers": [
"cpu",
"memory",
"pids"
],
"cgroupManager": "systemd",
"cgroupVersion": "v2",
"conmon": {
"package": "conmon-2.1.13-1.fc42.x86_64",
"path": "/usr/bin/conmon",
"version": "conmon version 2.1.13, commit: "
},
"cpuUtilization": {
"idlePercent": 98.66,
"systemPercent": 0.83,
"userPercent": 0.51
},
"cpus": 18,
"databaseBackend": "sqlite",
"distribution": {
"distribution": "fedora",
"variant": "server",
"version": "42"
},
"eventLogger": "journald",
"freeLocks": 2048,
"hostname": "chorny.thuisnet.com",
"idMappings": {
"gidmap": [
{
"container_id": 0,
"host_id": 1000008,
"size": 1
},
{
"container_id": 1,
"host_id": 524288,
"size": 65536
}
],
"uidmap": [
{
"container_id": 0,
"host_id": 1000000,
"size": 1
},
{
"container_id": 1,
"host_id": 524288,
"size": 65536
}
]
},
"kernel": "6.14.2-300.fc42.x86_64",
"linkmode": "dynamic",
"logDriver": "journald",
"memFree": 751022080,
"memTotal": 66689716224,
"networkBackend": "netavark",
"networkBackendInfo": {
"backend": "netavark",
"dns": {
"package": "aardvark-dns-1.14.0-1.fc42.x86_64",
"path": "/usr/libexec/podman/aardvark-dns",
"version": "aardvark-dns 1.14.0"
},
"package": "netavark-1.14.1-1.fc42.x86_64",
"path": "/usr/libexec/podman/netavark",
"version": "netavark 1.14.1"
},
"ociRuntime": {
"name": "crun",
"package": "crun-1.21-1.fc42.x86_64",
"path": "/usr/bin/crun",
"version": "crun version 1.21\ncommit: 10269840aa07fb7e6b7e1acff6198692d8ff5c88\nrundir: /run/user/1000000/crun\nspec: 1.0.0\n+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL"
},
"os": "linux",
"pasta": {
"executable": "/usr/bin/pasta",
"package": "passt-0^20250320.g32f6212-2.fc42.x86_64",
"version": ""
},
"remoteSocket": {
"exists": true,
"path": "/run/user/1000000/podman/podman.sock"
},
"rootlessNetworkCmd": "pasta",
"security": {
"apparmorEnabled": false,
"capabilities": "CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT",
"rootless": true,
"seccompEnabled": true,
"seccompProfilePath": "/usr/share/containers/seccomp.json",
"selinuxEnabled": true
},
"serviceIsRemote": false,
"slirp4netns": {
"executable": "",
"package": "",
"version": ""
},
"swapFree": 8587653120,
"swapTotal": 8589930496,
"uptime": "5h 27m 30.00s (Approximately 0.21 days)",
"variant": ""
},
"plugins": {
"authorization": null,
"log": [
"k8s-file",
"none",
"passthrough",
"journald"
],
"network": [
"bridge",
"macvlan",
"ipvlan"
],
"volume": [
"local"
]
},
"registries": {
"search": [
"registry.fedoraproject.org",
"registry.access.redhat.com",
"docker.io"
]
},
"store": {
"configFile": "/home/maxim/.config/containers/storage.conf",
"containerStore": {
"number": 0,
"paused": 0,
"running": 0,
"stopped": 0
},
"graphDriverName": "overlay",
"graphOptions": {},
"graphRoot": "/home/maxim/.local/share/containers/storage",
"graphRootAllocated": 32094052352,
"graphRootUsed": 15218200576,
"graphStatus": {
"Backing Filesystem": "xfs",
"Native Overlay Diff": "true",
"Supports d_type": "true",
"Supports shifting": "false",
"Supports volatile": "true",
"Using metacopy": "false"
},
"imageCopyTmpDir": "/var/tmp",
"imageStore": {
"number": 1
},
"runRoot": "/run/user/1000000/containers",
"transientStore": false,
"volumePath": "/home/maxim/.local/share/containers/storage/volumes"
},
"version": {
"APIVersion": "5.4.2",
"BuildOrigin": "Fedora Project",
"Built": 1743552000,
"BuiltTime": "Wed Apr 2 02:00:00 2025",
"GitCommit": "be85287fcf4590961614ee37be65eeb315e5d9ff",
"GoVersion": "go1.24.1",
"Os": "linux",
"OsArch": "linux/amd64",
"Version": "5.4.2"
}
},
"Name": "podman"
},
"Image": "quay.io/ramalama/intel-gpu:0.7",
"Runtime": "llama.cpp",
"Store": "/home/maxim/.local/share/ramalama",
"UseContainer": true,
"Version": "0.7.4"
Upstream Latest Release
Yes
Please run with --debug to show us the podman line. I believe the problem is that we are not grabbing the vllm image when attempting to run vllm. I tried this locally and I see:
podman run --rm -i --label ai.ramalama --name ramalama_TidnUuIOaD --env=HOME=/tmp --init --runtime /usr/bin/nvidia-container-runtime --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --label ai.ramalama.model=granite3-dense --label ai.ramalama.engine=podman --label ai.ramalama.runtime=vllm --label ai.ramalama.port=8080 --label ai.ramalama.command=serve --pull newer -t --env a=b --env c=d -p 8080:8080 --device /dev/dri --device nvidia.com/gpu=all -e CUDA_VISIBLE_DEVICES=0 --mount=type=bind,src=/home/dwalsh/.local/share/ramalama/models/ollama/granite3-dense:latest,destination=/mnt/models/model.file,ro quay.io/ramalama/cuda:0.7 --port 8080 --model /mnt/models/model.file --max_model_len 2048
Notice the command being handed to the image. We should be putting in vllm, I believe, but on my laptop it still tried to run with the cuda:0.7 image, which does not include vllm and would give the same error if I did not have other errors on my laptop.
Could you try the command with --image pointing at the upstream vllm container image?
docker.io/vllm/vllm-openai:latest
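For example, something along these lines (a sketch reusing the model name from earlier in this thread):

ramalama --image docker.io/vllm/vllm-openai:latest --runtime vllm serve granite3-dense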
Fairly sure you are correct ;)
ramalama --debug --runtime vllm serve -c 16384 --temp 0.8 --ngl 999 --threads 9 --host 0.0.0.0 --port 9999 --device /dev/accel --device /dev/dri granite3-dense:8b
exec_cmd: podman run --rm -i --label ai.ramalama --name ramalama_2rudBfe3Eb --env=HOME=/tmp --init --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --label ai.ramalama.model=granite3-dense:8b --label ai.ramalama.engine=podman --label ai.ramalama.runtime=vllm --label ai.ramalama.port=9999 --label ai.ramalama.command=serve --pull=newer -t -p 9999:9999 --device /dev/accel --device /dev/dri --device /dev/dri --device /dev/accel -e INTEL_VISIBLE_DEVICES=1 --mount=type=bind,src=/home/maxim/.local/share/ramalama/models/ollama/granite3-dense:8b,destination=/mnt/models/model.file,ro quay.io/ramalama/intel-gpu:0.7 --port 9999 --model /mnt/models/model.file --max_model_len 2048
:: initializing oneAPI environment ...
entrypoint.sh: BASH_VERSION = 5.2.32(1)-release
args: Using "$@" for setvars.sh arguments: --port 9999 --model /mnt/models/model.file --max_model_len 2048
:: compiler -- latest
:: mkl -- latest
:: tbb -- latest
:: umf -- latest
:: oneAPI environment initialized ::
/usr/bin/entrypoint.sh: line 6: exec: --: invalid option
exec: usage: exec [-cl] [-a name] [command [argument ...]] [redirection ...]
This got me further:
$ ramalama --image docker.io/vllm/vllm-openai:latest --debug --runtime vllm serve -c 16384 --temp 0.8 --ngl 999 --threads 9 --host 0.0.0.0 --port 9999 --device /dev/accel --device /dev/dri granite3-dense:8b
exec_cmd: podman run --rm -i --label ai.ramalama --name ramalama_f7dd5ZVmQQ --env=HOME=/tmp --init --runtime /usr/bin/nvidia-container-runtime --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --label ai.ramalama.model=granite3-dense:8b --label ai.ramalama.engine=podman --label ai.ramalama.runtime=vllm --label ai.ramalama.port=9999 --label ai.ramalama.command=serve --pull newer -t --env a=b --env c=d -p 9999:9999 --device /dev/accel --device /dev/dri --device /dev/dri --device nvidia.com/gpu=all -e CUDA_VISIBLE_DEVICES=0 --mount=type=bind,src=/home/dwalsh/.local/share/ramalama/models/ollama/granite3-dense:8b,destination=/mnt/models/model.file,ro docker.io/vllm/vllm-openai:latest --port 9999 --model /mnt/models/model.file --max_model_len 2048
Error: stat /dev/accel: no such file or directory
I don't know where /dev/accel is supposed to come from.
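For what it's worth, a quick host-side check for the accelerator device nodes referenced above (plain shell; /dev/accel is created by the kernel's compute-accelerator subsystem when a matching driver is loaded):

ls -l /dev/accel /dev/dri /dev/kfd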
@robertgshaw2-redhat any idea what is going on here, or someone who could check?
The quadlet generator puts that in the quadlet for me (and it exists on my system with an Intel Arc GPU), and it is put in the quadlet as AddDevice=-/dev/accel, so it should be safe to ignore, if I understand it correctly. /dev/dri and /dev/kfd are added in the same way. (Sorry if you meant something else entirely, just trying to be helpful :) )
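For reference, the generated quadlet entries described above look roughly like this (a sketch based on the comment; the leading '-' tells quadlet to skip a device that does not exist on the host):

[Container]
AddDevice=-/dev/accel
AddDevice=-/dev/dri
AddDevice=-/dev/kfd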
No, it looks like vllm requires that this device exist in its environment, at least with the image I pulled.
I get this error too when executing ramalama --runtime=vllm run granite3-dense. If I put the --runtime option after run (or serve), I get "invalid argument: vllm". I have the same problem on both machines I have tested this with, one with an Intel ARC GPU and one with an AMD RX 7900 XTX.
I'm using the latest version available through the install.sh script, running on Fedora Silverblue 42. The model works fine if I drop the --runtime option.
I do not see this with the ramalama main branch, though I do see that --runtime=vllm is not working.
ramalama --runtime=vllm serve granite3-dense
ERROR (catatonit:50): failed to exec pid1: No such file or directory
This horrendously bad error is really telling you that vllm is not available within the default container. We should be smarter and pull a vllm-based container, but this is a work in progress.
The error also needs to be fixed in Podman, I believe, so that catatonit (or whatever reports it) shows that the executable was not available.
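One way to confirm that vllm is indeed missing from the default image (a hypothetical check, overriding the entrypoint of the image seen in the logs above):

podman run --rm --entrypoint /bin/sh quay.io/ramalama/intel-gpu:0.7 -c 'command -v vllm || echo vllm not found'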
@rhatdan Hit the same issue today when running the command below.
Followed the instructions from https://docs.vllm.ai/en/stable/getting_started/installation/cpu.html, running on an Apple M4 Mac mini.
ramalama --debug --runtime=vllm serve gemma3:1b --webui=on
It throws this error:
ERROR (catatonit:2): failed to exec pid1: No such file or directory
Yes, to use vLLM currently you need to pull a specific --image to make it work. We are working on making ramalama pick the correct vllm image. BTW, vLLM will not work well on a Mac; you could only use CPU inferencing, since vLLM does not support Mac GPUs.
@rhatdan Based on my exploration, I found that on macOS with Podman and Homebrew, the required catatonit binary is often missing both in the container and in the Podman VM environment.
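If it helps, one way to check for catatonit inside the Podman machine VM (assuming a default, running 'podman machine'):

podman machine ssh 'command -v catatonit || echo catatonit not found'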
vLLM recommends building from source when running on Apple Silicon, so I'm not sure whether --image is required on macOS: https://docs.vllm.ai/en/stable/getting_started/installation/cpu.html#apple-silicon
BTW, vLLM will not work well on a Mac; you could only use CPU inferencing, since vLLM does not support Mac GPUs.
Understood.
They mention it in their docs:
On macOS the VLLM_TARGET_DEVICE is automatically set to cpu, which currently is the only supported device.
A friendly reminder that this issue had no activity for 30 days.
I do not believe any progress has been made on this.
A friendly reminder that this issue had no activity for 30 days.