
Running with the vLLM runtime throws a catatonit error

Open atlasfoo opened this issue 3 months ago • 3 comments

Issue Description

Hello,

I recently updated to v0.12.2. Trying to run any model from any provider with the vLLM runtime throws the error 'ERROR (catatonit:50): failed to exec pid1: No such file or directory'.

Running with the default llama.cpp runtime works fine.

I tried models from both the Ollama and Hugging Face (GGUF) repositories.

Steps to reproduce the issue

Run the following command (using the ramalama cuda:latest image):

ramalama --runtime vllm serve --name qwen-coder hf://Qwen/Qwen2.5-Coder-7B-Instruct-GGUF
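
The debug trace below appears to have been captured with ramalama's global --debug flag; for reference, the equivalent invocation would be:

❯ ramalama --debug --runtime vllm serve --name qwen-coder hf://Qwen/Qwen2.5-Coder-7B-Instruct-GGUF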

Describe the results you received

The command produces the following output:

2025-09-18 22:24:54 - DEBUG - run_cmd: nvidia-smi
2025-09-18 22:24:54 - DEBUG - Working directory: None
2025-09-18 22:24:54 - DEBUG - Ignore stderr: False
2025-09-18 22:24:54 - DEBUG - Ignore all: False
2025-09-18 22:24:54 - DEBUG - Command finished with return code: 0
2025-09-18 22:24:54 - DEBUG - run_cmd: podman inspect quay.io/ramalama/cuda:0.12
2025-09-18 22:24:54 - DEBUG - Working directory: None
2025-09-18 22:24:54 - DEBUG - Ignore stderr: False
2025-09-18 22:24:54 - DEBUG - Ignore all: True
2025-09-18 22:24:54 - DEBUG - Checking if 8080 is available
2025-09-18 22:24:54 - DEBUG - exec_cmd: podman run --rm --label ai.ramalama.model=hf://Qwen/Qwen2.5-Coder-7B-Instruct-GGUF --label ai.ramalama.engine=podman --label ai.ramalama.runtime=vllm --label ai.ramalama.port=8080 --label ai.ramalama.command=serve --device /dev/dri --device /dev/kfd --device /dev/accel --device nvidia.com/gpu=all -e CUDA_VISIBLE_DEVICES=0 --runtime /usr/bin/nvidia-container-runtime -p 8080:8080 --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --pull newer --label ai.ramalama --name qwen-coder --env=HOME=/tmp --init --mount=type=bind,src=/var/home/atlasfoo/.local/share/ramalama/store/huggingface/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/blobs/sha256-509287f78cb4d4cf6b3843734733b914b2c158e43e22a7f4bf5e963800894d3c,destination=/mnt/models/qwen2.5-coder-7b-instruct-q4_k_m.gguf,ro quay.io/ramalama/cuda:latest --model /mnt/models/qwen2.5-coder-7b-instruct-q4_k_m.gguf --port 8080 --max-sequence-length 0 --max_model_len 2048 --served-model-name Qwen2.5-Coder-7B-Instruct-GGUF
ERROR (catatonit:50): failed to exec pid1: No such file or directory
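
For context: with podman's --init flag, catatonit is injected as PID 1 and exec()s the actual container command, so "failed to exec pid1: No such file or directory" generally means the path catatonit was asked to exec does not exist inside the image. As a quick sanity check (an illustrative command using a standard podman Go template, not part of the original report), the image's configured entrypoint and command can be inspected with:

❯ podman inspect --format '{{.Config.Entrypoint}} {{.Config.Cmd}}' quay.io/ramalama/cuda:latest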

Describe the results you expected

vLLM serve to work, as it does with llama.cpp.

ramalama info output

❯ ramalama info
{
    "Accelerator": "cuda",
    "Config": {},
    "Engine": {
        "Info": {
            "host": {
                "arch": "amd64",
                "buildahVersion": "1.41.4",
                "cgroupControllers": [
                    "cpu",
                    "io",
                    "memory",
                    "pids"
                ],
                "cgroupManager": "systemd",
                "cgroupVersion": "v2",
                "conmon": {
                    "package": "conmon-2.1.13-1.fc42.x86_64",
                    "path": "/usr/bin/conmon",
                    "version": "conmon version 2.1.13, commit: "
                },
                "cpuUtilization": {
                    "idlePercent": 98.19,
                    "systemPercent": 0.66,
                    "userPercent": 1.15
                },
                "cpus": 24,
                "databaseBackend": "sqlite",
                "distribution": {
                    "codename": "Deinonychus",
                    "distribution": "bluefin",
                    "variant": "bluefin-dx-nvidia-open",
                    "version": "42"
                },
                "emulatedArchitectures": [
                    "linux/arm",
                    "linux/arm64",
                    "linux/arm64be",
                    "linux/loong64",
                    "linux/mips",
                    "linux/mips64",
                    "linux/ppc",
                    "linux/ppc64",
                    "linux/ppc64le",
                    "linux/riscv32",
                    "linux/riscv64",
                    "linux/s390x"
                ],
                "eventLogger": "journald",
                "freeLocks": 2040,
                "hostname": "atlasfoo-legion-bfdx",
                "idMappings": {
                    "gidmap": [
                        {
                            "container_id": 0,
                            "host_id": 1000,
                            "size": 1
                        },
                        {
                            "container_id": 1,
                            "host_id": 524288,
                            "size": 65536
                        }
                    ],
                    "uidmap": [
                        {
                            "container_id": 0,
                            "host_id": 1000,
                            "size": 1
                        },
                        {
                            "container_id": 1,
                            "host_id": 524288,
                            "size": 65536
                        }
                    ]
                },
                "kernel": "6.15.9-201.fc42.x86_64",
                "linkmode": "dynamic",
                "logDriver": "journald",
                "memFree": 2392526848,
                "memTotal": 33036296192,
                "networkBackend": "netavark",
                "networkBackendInfo": {
                    "backend": "netavark",
                    "dns": {
                        "package": "aardvark-dns-1.16.0-1.fc42.x86_64",
                        "path": "/usr/libexec/podman/aardvark-dns",
                        "version": "aardvark-dns 1.16.0"
                    },
                    "package": "netavark-1.16.1-1.fc42.x86_64",
                    "path": "/usr/libexec/podman/netavark",
                    "version": "netavark 1.16.1"
                },
                "ociRuntime": {
                    "name": "crun",
                    "package": "crun-1.24-1.fc42.x86_64",
                    "path": "/usr/bin/crun",
                    "version": "crun version 1.24\ncommit: 54693209039e5e04cbe3c8b1cd5fe2301219f0a1\nrundir: /run/user/1000/crun\nspec: 1.0.0\n+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL"
                },
                "os": "linux",
                "pasta": {
                    "executable": "/usr/sbin/pasta",
                    "package": "passt-0^20250911.g6cbcccc-1.fc42.x86_64",
                    "version": "pasta 0^20250911.g6cbcccc-1.fc42.x86_64\nCopyright Red Hat\nGNU General Public License, version 2 or later\n  <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>\nThis is free software: you are free to change and redistribute it.\nThere is NO WARRANTY, to the extent permitted by law.\n"
                },
                "remoteSocket": {
                    "exists": true,
                    "path": "/run/user/1000/podman/podman.sock"
                },
                "rootlessNetworkCmd": "pasta",
                "security": {
                    "apparmorEnabled": false,
                    "capabilities": "CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT",
                    "rootless": true,
                    "seccompEnabled": true,
                    "seccompProfilePath": "/usr/share/containers/seccomp.json",
                    "selinuxEnabled": true
                },
                "serviceIsRemote": false,
                "slirp4netns": {
                    "executable": "/usr/sbin/slirp4netns",
                    "package": "slirp4netns-1.3.1-2.fc42.x86_64",
                    "version": "slirp4netns version 1.3.1\ncommit: e5e368c4f5db6ae75c2fce786e31eef9da6bf236\nlibslirp: 4.8.0\nSLIRP_CONFIG_VERSION_MAX: 5\nlibseccomp: 2.5.5"
                },
                "swapFree": 12884881408,
                "swapTotal": 12884893696,
                "uptime": "0h 31m 14.00s",
                "variant": ""
            },
            "plugins": {
                "authorization": null,
                "log": [
                    "k8s-file",
                    "none",
                    "passthrough",
                    "journald"
                ],
                "network": [
                    "bridge",
                    "macvlan",
                    "ipvlan"
                ],
                "volume": [
                    "local"
                ]
            },
            "registries": {
                "search": [
                    "registry.fedoraproject.org",
                    "registry.access.redhat.com",
                    "docker.io"
                ]
            },
            "store": {
                "configFile": "/var/home/atlasfoo/.config/containers/storage.conf",
                "containerStore": {
                    "number": 1,
                    "paused": 0,
                    "running": 0,
                    "stopped": 1
                },
                "graphDriverName": "overlay",
                "graphOptions": {},
                "graphRoot": "/var/home/atlasfoo/.local/share/containers/storage",
                "graphRootAllocated": 531714015232,
                "graphRootUsed": 48577605632,
                "graphStatus": {
                    "Backing Filesystem": "btrfs",
                    "Native Overlay Diff": "true",
                    "Supports d_type": "true",
                    "Supports shifting": "false",
                    "Supports volatile": "true",
                    "Using metacopy": "false"
                },
                "imageCopyTmpDir": "/var/tmp",
                "imageStore": {
                    "number": 3
                },
                "runRoot": "/run/user/1000/containers",
                "transientStore": false,
                "volumePath": "/var/home/atlasfoo/.local/share/containers/storage/volumes"
            },
            "version": {
                "APIVersion": "5.6.1",
                "BuildOrigin": "Fedora Project",
                "Built": 1756944000,
                "BuiltTime": "Wed Sep  3 18:00:00 2025",
                "GitCommit": "1e2b2315150b2ffa0971596fb5da8cd83f3ce0e1",
                "GoVersion": "go1.24.6",
                "Os": "linux",
                "OsArch": "linux/amd64",
                "Version": "5.6.1"
            }
        },
        "Name": "podman"
    },
    "Image": "quay.io/ramalama/cuda:latest",
    "Runtime": "llama.cpp",
    "Selinux": false,
    "Shortnames": {
        "Files": [
            "/var/home/atlasfoo/.local/share/ramalama/shortnames.conf"
        ],
        "Names": {
            "cerebrum": "huggingface://froggeric/Cerebrum-1.0-7b-GGUF/Cerebrum-1.0-7b-Q4_KS.gguf",
            "deepseek": "ollama://deepseek-r1",
            "dragon": "huggingface://llmware/dragon-mistral-7b-v0/dragon-mistral-7b-q4_k_m.gguf",
            "gemma3": "hf://ggml-org/gemma-3-4b-it-GGUF",
            "gemma3:12b": "hf://ggml-org/gemma-3-12b-it-GGUF",
            "gemma3:1b": "hf://ggml-org/gemma-3-1b-it-GGUF/gemma-3-1b-it-Q4_K_M.gguf",
            "gemma3:27b": "hf://ggml-org/gemma-3-27b-it-GGUF",
            "gemma3:4b": "hf://ggml-org/gemma-3-4b-it-GGUF",
            "gemma3n": "hf://ggml-org/gemma-3n-E4B-it-GGUF/gemma-3n-E4B-it-Q8_0.gguf",
            "gemma3n:e2b": "hf://ggml-org/gemma-3n-E2B-it-GGUF/gemma-3n-E2B-it-Q8_0.gguf",
            "gemma3n:e2b-it-f16": "hf://ggml-org/gemma-3n-E2B-it-GGUF/gemma-3n-E2B-it-f16.gguf",
            "gemma3n:e2b-it-q8_0": "hf://ggml-org/gemma-3n-E2B-it-GGUF/gemma-3n-E2B-it-Q8_0.gguf",
            "gemma3n:e4b": "hf://ggml-org/gemma-3n-E4B-it-GGUF/gemma-3n-E4B-it-Q8_0.gguf",
            "gemma3n:e4b-it-f16": "hf://ggml-org/gemma-3n-E4B-it-GGUF/gemma-3n-E4B-it-f16.gguf",
            "gemma3n:e4b-it-q8_0": "hf://ggml-org/gemma-3n-E4B-it-GGUF/gemma-3n-E4B-it-Q8_0.gguf",
            "gpt-oss": "hf://ggml-org/gpt-oss-20b-GGUF",
            "gpt-oss:120b": "hf://ggml-org/gpt-oss-120b-GGUF",
            "gpt-oss:20b": "hf://ggml-org/gpt-oss-20b-GGUF",
            "granite": "ollama://granite3.1-dense",
            "granite-lab-7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
            "granite-lab-8b": "huggingface://ibm-granite/granite-3.3-8b-instruct-GGUF/granite-3.3-8b-instruct-Q4_K_M.gguf",
            "granite-lab:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
            "granite:2b": "ollama://granite3.1-dense:2b",
            "granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
            "granite:8b": "ollama://granite3.1-dense:8b",
            "hermes": "huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf",
            "ibm/granite": "ollama://granite3.1-dense:8b",
            "ibm/granite:2b": "ollama://granite3.1-dense:2b",
            "ibm/granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
            "ibm/granite:8b": "ollama://granite3.1-dense:8b",
            "merlinite": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
            "merlinite-lab-7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
            "merlinite-lab:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
            "merlinite:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
            "mistral": "hf://lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf",
            "mistral-small3.1": "hf://bartowski/mistralai_Mistral-Small-3.1-24B-Instruct-2503-GGUF/mistralai_Mistral-Small-3.1-24B-Instruct-2503-IQ2_M.gguf",
            "mistral-small3.1:24b": "hf://bartowski/mistralai_Mistral-Small-3.1-24B-Instruct-2503-GGUF/mistralai_Mistral-Small-3.1-24B-Instruct-2503-IQ2_M.gguf",
            "mistral:7b": "hf://lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf",
            "mistral:7b-v1": "huggingface://TheBloke/Mistral-7B-Instruct-v0.1-GGUF/mistral-7b-instruct-v0.1.Q5_K_M.gguf",
            "mistral:7b-v2": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
            "mistral:7b-v3": "hf://lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf",
            "mistral_code_16k": "huggingface://TheBloke/Mistral-7B-Code-16K-qlora-GGUF/mistral-7b-code-16k-qlora.Q4_K_M.gguf",
            "mistral_codealpaca": "huggingface://TheBloke/Mistral-7B-codealpaca-lora-GGUF/mistral-7b-codealpaca-lora.Q4_K_M.gguf",
            "mixtao": "huggingface://MaziyarPanahi/MixTAO-7Bx2-MoE-Instruct-v7.0-GGUF/MixTAO-7Bx2-MoE-Instruct-v7.0.Q4_K_M.gguf",
            "openchat": "huggingface://TheBloke/openchat-3.5-0106-GGUF/openchat-3.5-0106.Q4_K_M.gguf",
            "openorca": "huggingface://TheBloke/Mistral-7B-OpenOrca-GGUF/mistral-7b-openorca.Q4_K_M.gguf",
            "phi2": "huggingface://MaziyarPanahi/phi-2-GGUF/phi-2.Q4_K_M.gguf",
            "qwen2.5vl": "hf://ggml-org/Qwen2.5-VL-32B-Instruct-GGUF",
            "qwen2.5vl:2b": "hf://ggml-org/Qwen2.5-VL-2B-Instruct-GGUF",
            "qwen2.5vl:32b": "hf://ggml-org/Qwen2.5-VL-32B-Instruct-GGUF",
            "qwen2.5vl:3b": "hf://ggml-org/Qwen2.5-VL-3B-Instruct-GGUF",
            "qwen2.5vl:7b": "hf://ggml-org/Qwen2.5-VL-7B-Instruct-GGUF",
            "smollm:135m": "hf://HuggingFaceTB/smollm-135M-instruct-v0.2-Q8_0-GGUF",
            "smolvlm": "hf://ggml-org/SmolVLM-500M-Instruct-GGUF",
            "smolvlm:256m": "hf://ggml-org/SmolVLM-256M-Instruct-GGUF",
            "smolvlm:2b": "hf://ggml-org/SmolVLM-Instruct-GGUF",
            "smolvlm:500m": "hf://ggml-org/SmolVLM-500M-Instruct-GGUF",
            "tiny": "hf://TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
            "tinyllama": "hf://TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
        }
    },
    "Store": "/var/home/atlasfoo/.local/share/ramalama",
    "UseContainer": true,
    "Version": "0.12.2"
}

Upstream Latest Release

Yes

Additional environment details

OS: Bluefin DX Latest (Fedora 42)
Container engine: podman
Ramalama image: quay.io/ramalama/cuda
GPU: NVIDIA RTX 5070 Ti Mobile

Additional information

No response

atlasfoo avatar Sep 19 '25 04:09 atlasfoo

It seemed that the ramalama image upgrade shipped with version 0.12.4 would resolve this issue, but it does not: it keeps failing with the same error.

Does anything need to be changed on my side to get vLLM running?

atlasfoo avatar Oct 09 '25 02:10 atlasfoo

Still an issue in the latest version.

2025-10-31 06:46:20 - DEBUG - run_cmd: podman inspect quay.io/ramalama/rocm:0.13
2025-10-31 06:46:20 - DEBUG - Working directory: None
2025-10-31 06:46:20 - DEBUG - Ignore stderr: False
2025-10-31 06:46:20 - DEBUG - Ignore all: True
2025-10-31 06:46:20 - DEBUG - run_cmd: podman inspect quay.io/ramalama/rocm:0.13
2025-10-31 06:46:20 - DEBUG - Working directory: None
2025-10-31 06:46:20 - DEBUG - Ignore stderr: False
2025-10-31 06:46:20 - DEBUG - Ignore all: True
2025-10-31 06:46:20 - DEBUG - exec_cmd: podman run --rm --label ai.ramalama.model=hf://lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf --label ai.ramalama.engine=podman --label ai.ramalama.runtime=vllm --label ai.ramalama.port=8080 --label ai.ramalama.command=serve --device /dev/dri --device /dev/kfd -e HIP_VISIBLE_DEVICES=0 -p 8080:8080 --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --pull newer --label ai.ramalama --name ramalama_yVX63Dhkng --env=HOME=/tmp --init --mount=type=bind,src=/srv/ai-data/shared/ramalama/containers/store/huggingface/lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf/blobs/sha256-1270d22c0fbb3d092fb725d4d96c457b7b687a5f5a715abe1e818da303e562b6,destination=/mnt/models/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf,ro --mount=type=bind,src=/srv/ai-data/shared/ramalama/containers/store/huggingface/lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf/blobs/sha256-26a59556925c987317ce5291811ba3b7f32ec4c647c400c6cc7e3a9993007ba7,destination=/mnt/models/chat_template_extracted,ro --mount=type=bind,src=/srv/ai-data/shared/ramalama/containers/store/huggingface/lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf/blobs/sha256-f4c40979f1f766dadf02add556a0dc41e7a89ce582cc833cf146bc2f08b84c71,destination=/mnt/models/chat_template_converted,ro quay.io/ramalama/rocm:latest "/opt/venv/bin/python3 -m vllm.entrypoints.openai.api_server" --model /mnt/models/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf --max_model_len 2048 --port 8080
ERROR (catatonit:2): failed to exec pid1: No such file or directory
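
If the argv logged above is what podman actually receives, a plausible culprit is the quoted string: "/opt/venv/bin/python3 -m vllm.entrypoints.openai.api_server" is passed as a single argument, so catatonit tries to exec a file whose name literally contains the spaces, which fails with ENOENT. A minimal sketch of that suspected failure mode (illustrative only; the ramalama-specific flags are omitted and --help stands in for the real server arguments):

# One quoted string: exec() looks for a binary literally named with the embedded spaces -> ENOENT
podman run --rm --init quay.io/ramalama/rocm:latest "/opt/venv/bin/python3 -m vllm.entrypoints.openai.api_server" --help

# Split into separate arguments: exec() resolves /opt/venv/bin/python3 as expected
podman run --rm --init quay.io/ramalama/rocm:latest /opt/venv/bin/python3 -m vllm.entrypoints.openai.api_server --help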

RobVor avatar Oct 31 '25 06:10 RobVor

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot] avatar Dec 01 '25 00:12 github-actions[bot]

@olliewalsh Any chance this is fixed with latest changes for vllm?

rhatdan avatar Dec 16 '25 12:12 rhatdan

> @olliewalsh Any chance this is fixed with latest changes for vllm?

yes, I'll create a PR today/tomorrow

olliewalsh avatar Dec 16 '25 13:12 olliewalsh