ramalama
Invalid option for --runtime=vllm
Issue Description
Running ramalama --runtime=vllm run granite3-dense works like a charm, but the same command with serve instead of run throws:
ramalama --runtime vllm serve granite3-dense
serving on port 8080
:: initializing oneAPI environment ...
entrypoint.sh: BASH_VERSION = 5.2.32(1)-release
args: Using "$@" for setvars.sh arguments: --port 8080 --model /mnt/models/model.file --max_model_len 2048
:: compiler -- latest
:: mkl -- latest
:: tbb -- latest
:: umf -- latest
:: oneAPI environment initialized ::
/usr/bin/entrypoint.sh: line 6: exec: --: invalid option
exec: usage: exec [-cl] [-a name] [command [argument ...]] [redirection ...]
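For context, "exec: --: invalid option" is what bash's exec builtin prints when its first argument starts with a dash. A minimal hypothetical sketch of such an entrypoint (not the real /usr/bin/entrypoint.sh; the setvars.sh path is an assumption):

#!/usr/bin/env bash
# Hypothetical sketch: initialize the oneAPI environment, then exec
# whatever arguments the container was started with.
source /opt/intel/oneapi/setvars.sh "$@"   # assumed path; prints the ":: ... initialized ::" banner
exec "$@"   # if "$@" begins with --port, exec reports "--: invalid option"

Since the container is handed only vLLM flags (--port, --model, --max_model_len) and no executable name, exec has nothing to run and aborts.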
Steps to reproduce the issue
- install ramalama through pip on F42
- run the above command
Describe the results you received
breakage
Describe the results you expected
successful inference
ramalama info output
{
"Accelerator": "intel",
"Engine": {
"Info": {
"host": {
"arch": "amd64",
"buildahVersion": "1.39.4",
"cgroupControllers": [
"cpu",
"memory",
"pids"
],
"cgroupManager": "systemd",
"cgroupVersion": "v2",
"conmon": {
"package": "conmon-2.1.13-1.fc42.x86_64",
"path": "/usr/bin/conmon",
"version": "conmon version 2.1.13, commit: "
},
"cpuUtilization": {
"idlePercent": 98.66,
"systemPercent": 0.83,
"userPercent": 0.51
},
"cpus": 18,
"databaseBackend": "sqlite",
"distribution": {
"distribution": "fedora",
"variant": "server",
"version": "42"
},
"eventLogger": "journald",
"freeLocks": 2048,
"hostname": "chorny.thuisnet.com",
"idMappings": {
"gidmap": [
{
"container_id": 0,
"host_id": 1000008,
"size": 1
},
{
"container_id": 1,
"host_id": 524288,
"size": 65536
}
],
"uidmap": [
{
"container_id": 0,
"host_id": 1000000,
"size": 1
},
{
"container_id": 1,
"host_id": 524288,
"size": 65536
}
]
},
"kernel": "6.14.2-300.fc42.x86_64",
"linkmode": "dynamic",
"logDriver": "journald",
"memFree": 751022080,
"memTotal": 66689716224,
"networkBackend": "netavark",
"networkBackendInfo": {
"backend": "netavark",
"dns": {
"package": "aardvark-dns-1.14.0-1.fc42.x86_64",
"path": "/usr/libexec/podman/aardvark-dns",
"version": "aardvark-dns 1.14.0"
},
"package": "netavark-1.14.1-1.fc42.x86_64",
"path": "/usr/libexec/podman/netavark",
"version": "netavark 1.14.1"
},
"ociRuntime": {
"name": "crun",
"package": "crun-1.21-1.fc42.x86_64",
"path": "/usr/bin/crun",
"version": "crun version 1.21\ncommit: 10269840aa07fb7e6b7e1acff6198692d8ff5c88\nrundir: /run/user/1000000/crun\nspec: 1.0.0\n+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL"
},
"os": "linux",
"pasta": {
"executable": "/usr/bin/pasta",
"package": "passt-0^20250320.g32f6212-2.fc42.x86_64",
"version": ""
},
"remoteSocket": {
"exists": true,
"path": "/run/user/1000000/podman/podman.sock"
},
"rootlessNetworkCmd": "pasta",
"security": {
"apparmorEnabled": false,
"capabilities": "CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT",
"rootless": true,
"seccompEnabled": true,
"seccompProfilePath": "/usr/share/containers/seccomp.json",
"selinuxEnabled": true
},
"serviceIsRemote": false,
"slirp4netns": {
"executable": "",
"package": "",
"version": ""
},
"swapFree": 8587653120,
"swapTotal": 8589930496,
"uptime": "5h 27m 30.00s (Approximately 0.21 days)",
"variant": ""
},
"plugins": {
"authorization": null,
"log": [
"k8s-file",
"none",
"passthrough",
"journald"
],
"network": [
"bridge",
"macvlan",
"ipvlan"
],
"volume": [
"local"
]
},
"registries": {
"search": [
"registry.fedoraproject.org",
"registry.access.redhat.com",
"docker.io"
]
},
"store": {
"configFile": "/home/maxim/.config/containers/storage.conf",
"containerStore": {
"number": 0,
"paused": 0,
"running": 0,
"stopped": 0
},
"graphDriverName": "overlay",
"graphOptions": {},
"graphRoot": "/home/maxim/.local/share/containers/storage",
"graphRootAllocated": 32094052352,
"graphRootUsed": 15218200576,
"graphStatus": {
"Backing Filesystem": "xfs",
"Native Overlay Diff": "true",
"Supports d_type": "true",
"Supports shifting": "false",
"Supports volatile": "true",
"Using metacopy": "false"
},
"imageCopyTmpDir": "/var/tmp",
"imageStore": {
"number": 1
},
"runRoot": "/run/user/1000000/containers",
"transientStore": false,
"volumePath": "/home/maxim/.local/share/containers/storage/volumes"
},
"version": {
"APIVersion": "5.4.2",
"BuildOrigin": "Fedora Project",
"Built": 1743552000,
"BuiltTime": "Wed Apr 2 02:00:00 2025",
"GitCommit": "be85287fcf4590961614ee37be65eeb315e5d9ff",
"GoVersion": "go1.24.1",
"Os": "linux",
"OsArch": "linux/amd64",
"Version": "5.4.2"
}
},
"Name": "podman"
},
"Image": "quay.io/ramalama/intel-gpu:0.7",
"Runtime": "llama.cpp",
"Store": "/home/maxim/.local/share/ramalama",
"UseContainer": true,
"Version": "0.7.4"
Upstream Latest Release
Yes
Please run with --debug to show us the podman line. I believe the problem is that we are not grabbing the vllm image when attempting to run vllm. I tried this locally and I see:
podman run --rm -i --label ai.ramalama --name ramalama_TidnUuIOaD --env=HOME=/tmp --init --runtime /usr/bin/nvidia-container-runtime --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --label ai.ramalama.model=granite3-dense --label ai.ramalama.engine=podman --label ai.ramalama.runtime=vllm --label ai.ramalama.port=8080 --label ai.ramalama.command=serve --pull newer -t --env a=b --env c=d -p 8080:8080 --device /dev/dri --device nvidia.com/gpu=all -e CUDA_VISIBLE_DEVICES=0 --mount=type=bind,src=/home/dwalsh/.local/share/ramalama/models/ollama/granite3-dense:latest,destination=/mnt/models/model.file,ro quay.io/ramalama/cuda:0.7 --port 8080 --model /mnt/models/model.file --max_model_len 2048
Notice the command being handed to the image. We should be putting in vllm, I believe, but on my laptop it still tried to run with the cuda:0.7 image, which does not include vllm and would give the same error if I did not have other errors on my laptop.
Could you try the command with --image pointing at the upstream vllm container image?
docker.io/vllm/vllm-openai:latest
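For example, something along these lines (a sketch reusing the model name from earlier in this thread):

ramalama --image docker.io/vllm/vllm-openai:latest --runtime vllm serve granite3-dense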
Fairly sure you are correct ;)
ramalama --debug --runtime vllm serve -c 16384 --temp 0.8 --ngl 999 --threads 9 --host 0.0.0.0 --port 9999 --device /dev/accel --device /dev/dri granite3-dense:8b
exec_cmd: podman run --rm -i --label ai.ramalama --name ramalama_2rudBfe3Eb --env=HOME=/tmp --init --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --label ai.ramalama.model=granite3-dense:8b --label ai.ramalama.engine=podman --label ai.ramalama.runtime=vllm --label ai.ramalama.port=9999 --label ai.ramalama.command=serve --pull=newer -t -p 9999:9999 --device /dev/accel --device /dev/dri --device /dev/dri --device /dev/accel -e INTEL_VISIBLE_DEVICES=1 --mount=type=bind,src=/home/maxim/.local/share/ramalama/models/ollama/granite3-dense:8b,destination=/mnt/models/model.file,ro quay.io/ramalama/intel-gpu:0.7 --port 9999 --model /mnt/models/model.file --max_model_len 2048
:: initializing oneAPI environment ...
entrypoint.sh: BASH_VERSION = 5.2.32(1)-release
args: Using "$@" for setvars.sh arguments: --port 9999 --model /mnt/models/model.file --max_model_len 2048
:: compiler -- latest
:: mkl -- latest
:: tbb -- latest
:: umf -- latest
:: oneAPI environment initialized ::
/usr/bin/entrypoint.sh: line 6: exec: --: invalid option
exec: usage: exec [-cl] [-a name] [command [argument ...]] [redirection ...]
This got me further:
$ ramalama --image docker.io/vllm/vllm-openai:latest --debug --runtime vllm serve -c 16384 --temp 0.8 --ngl 999 --threads 9 --host 0.0.0.0 --port 9999 --device /dev/accel --device /dev/dri granite3-dense:8b
exec_cmd: podman run --rm -i --label ai.ramalama --name ramalama_f7dd5ZVmQQ --env=HOME=/tmp --init --runtime /usr/bin/nvidia-container-runtime --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --label ai.ramalama.model=granite3-dense:8b --label ai.ramalama.engine=podman --label ai.ramalama.runtime=vllm --label ai.ramalama.port=9999 --label ai.ramalama.command=serve --pull newer -t --env a=b --env c=d -p 9999:9999 --device /dev/accel --device /dev/dri --device /dev/dri --device nvidia.com/gpu=all -e CUDA_VISIBLE_DEVICES=0 --mount=type=bind,src=/home/dwalsh/.local/share/ramalama/models/ollama/granite3-dense:8b,destination=/mnt/models/model.file,ro docker.io/vllm/vllm-openai:latest --port 9999 --model /mnt/models/model.file --max_model_len 2048
Error: stat /dev/accel: no such file or directory
I don't know where /dev/accel is supposed to come from.
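For what it's worth, a quick host-side check for the accelerator device nodes referenced above (plain shell; /dev/accel is created by the kernel's compute-accelerator subsystem when a matching driver is loaded):

ls -l /dev/accel /dev/dri /dev/kfd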
@robertgshaw2-redhat any idea what is going on here, or someone who could check?
The quadlet generator puts that in the quadlet for me (and it exists on my system with an Intel Arc GPU), and it is put in the quadlet as AddDevice=-/dev/accel, so it should be safe to ignore, if I understand it correctly. /dev/dri and /dev/kfd are added in the same way. (Sorry if you meant something else entirely, just trying to be helpful :) )
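For reference, the generated quadlet entries described above look roughly like this (a sketch based on the comment; the leading '-' tells quadlet to skip a device that does not exist on the host):

[Container]
AddDevice=-/dev/accel
AddDevice=-/dev/dri
AddDevice=-/dev/kfd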
No, it looks like vllm requires that this device exist in its environment, at least with the image I pulled.
I get this error too when executing ramalama --runtime=vllm run granite3-dense. If I put the --runtime option after run (or serve), I get "invalid argument: vllm". I have the same problem on both machines I have tested this with, one with an Intel ARC GPU and one with an AMD RX 7900 XTX.
I'm using the latest version available through the install.sh script, running on Fedora Silverblue 42. The model works fine if I drop the --runtime option.
I do not see this with the ramalama main branch, though I do see that --runtime=vllm is not working.
ramalama --runtime=vllm serve granite3-dense
ERROR (catatonit:50): failed to exec pid1: No such file or directory
This horrendously bad error is really telling you that vllm is not available within the default container. We should be smarter and pull a vllm-based container, but this is a work in progress.
The error also needs to be fixed in Podman, I believe, so that catatonit (or whatever reports it) shows that the executable was not available.
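One way to confirm that vllm is indeed missing from the default image (a hypothetical check, overriding the entrypoint of the image seen in the logs above):

podman run --rm --entrypoint /bin/sh quay.io/ramalama/intel-gpu:0.7 -c 'command -v vllm || echo vllm not found'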
@rhatdan Hit the same issue today when running the command below.
Followed the instructions from https://docs.vllm.ai/en/stable/getting_started/installation/cpu.html, running on an Apple M4 Mac mini.
ramalama --debug --runtime=vllm serve gemma3:1b --webui=on
It throws this error:
ERROR (catatonit:2): failed to exec pid1: No such file or directory
Yes, to use vLLM currently you need to pull a specific --image to make it work. We are working on making ramalama pick the correct vllm image. BTW, vLLM will not work well on a Mac; you could only use CPU inferencing, since vLLM does not support Mac GPUs.
@rhatdan Based on my exploration, I found that on macOS with Podman and Homebrew, the required catatonit binary is often missing both in the container and in the Podman VM environment.
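If it helps, one way to check for catatonit inside the Podman machine VM (assuming a default, running 'podman machine'):

podman machine ssh 'command -v catatonit || echo catatonit not found'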
vLLM recommends building from source when running on Apple Silicon, so I'm not sure whether --image is required on macOS: https://docs.vllm.ai/en/stable/getting_started/installation/cpu.html#apple-silicon
BTW, vLLM will not work well on a Mac; you could only use CPU inferencing, since vLLM does not support Mac GPUs.
Understood.
They mention it in their docs:
On macOS the VLLM_TARGET_DEVICE is automatically set to cpu, which currently is the only supported device.
A friendly reminder that this issue had no activity for 30 days.
I do not believe any progress has been made on this.
A friendly reminder that this issue had no activity for 30 days.