ramalama serve crashes when using --rag
Issue Description
I have tried using ramalama serve with a local RAG, but it keeps crashing a few seconds after the container starts.
I can confirm that ramalama run works perfectly well with the same local RAG.
Additional note:
After starting ramalama for the first time, it downloads quay.io/ramalama/rocm-rag, which is not the correct implementation for my hardware, so I'm using quay.io/ramalama/ramalama-rag instead. I have an AMD Radeon 860M integrated GPU: running the container with ramalama-rag actually engages the GPU, whereas rocm-rag falls back to pure CPU.
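As a workaround, the non-ROCm image can be pinned so it is used on every invocation instead of passing --image each time. ramalama.conf appears to support an image setting; this snippet is an assumption on my part, so verify the key name against the ramalama.conf documentation for your version:

```
[ramalama]
image = "quay.io/ramalama/ramalama-rag"
```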
Steps to reproduce the issue
ramalama --debug serve -n model -d --image quay.io/ramalama/ramalama-rag -p 8080 --rag localhost/optus-guideline ollama://library/granite4:tiny-h
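To rule out a problem with the RAG store itself, I used a sketch like the following to confirm that the localhost/optus-guideline image actually contains the vector database that the rag_framework invocation (visible in the debug output below) expects at /rag/vector.db. The mount options mirror the ones ramalama generates; adjust the image names for your setup:

```shell
# Mount the RAG store image the same way ramalama does and list its
# contents; vector.db should appear under /rag.
if command -v podman >/dev/null 2>&1; then
    podman run --rm \
        --mount=type=image,source=localhost/optus-guideline,destination=/rag \
        quay.io/ramalama/ramalama-rag ls -l /rag || true
else
    echo "podman not available; skipping"
fi
```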
Describe the results you received
2025-10-07 21:02:48 - DEBUG - run_cmd: podman inspect quay.io/ramalama/rocm:0.12
2025-10-07 21:02:48 - DEBUG - Working directory: None
2025-10-07 21:02:48 - DEBUG - Ignore stderr: False
2025-10-07 21:02:48 - DEBUG - Ignore all: True
2025-10-07 21:02:48 - DEBUG - run_cmd: podman image inspect localhost/optus-guideline
2025-10-07 21:02:48 - DEBUG - Working directory: None
2025-10-07 21:02:48 - DEBUG - Ignore stderr: False
2025-10-07 21:02:48 - DEBUG - Ignore all: False
2025-10-07 21:02:48 - DEBUG - Command finished with return code: 0
2025-10-07 21:02:48 - DEBUG - run_cmd: podman inspect quay.io/ramalama/rocm:0.12
2025-10-07 21:02:48 - DEBUG - Working directory: None
2025-10-07 21:02:48 - DEBUG - Ignore stderr: False
2025-10-07 21:02:48 - DEBUG - Ignore all: True
2025-10-07 21:02:48 - DEBUG - Checking if 8080 is available
2025-10-07 21:02:48 - DEBUG - exec_cmd: podman run --rm --label ai.ramalama.model=ollama://library/granite4:tiny-h --label ai.ramalama.engine=podman --label ai.ramalama.runtime=llama.cpp --label ai.ramalama.port=8080 --label ai.ramalama.command=serve --device /dev/dri --device /dev/kfd --device /dev/accel -e HIP_VISIBLE_DEVICES=0 -p 8080:8080 --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --pull newer --mount=type=image,source=localhost/optus-guideline,destination=/rag,rw=true -d --label ai.ramalama --name model --env=HOME=/tmp --init --mount=type=bind,src=/home/daniel/.local/share/ramalama/store/ollama/library/granite4/blobs/sha256-9811e90b0eecf2b194aafad5bb386279f338a45412a9e6f86b718cca6626c495,destination=/mnt/models/granite4,ro --mount=type=bind,src=/home/daniel/.local/share/ramalama/store/ollama/library/granite4/blobs/sha256-201bce49a1b69186622ed68f476cebc9bc390809ad6aed44665d06020e3a6667,destination=/mnt/models/config.json,ro --mount=type=bind,src=/home/daniel/.local/share/ramalama/store/ollama/library/granite4/blobs/sha256-9fa3d9413163cdef6c6c0bc6ed0ccf152021bfaabe5f526b2f4b25f01c2db84b,destination=/mnt/models/chat_template,ro --mount=type=bind,src=/home/daniel/.local/share/ramalama/store/ollama/library/granite4/blobs/sha256-d238cb4f5005797f286757089c137bd3853c2ef68045d758ebf59f732d8dc512,destination=/mnt/models/chat_template_converted,ro quay.io/ramalama/ramalama-rag bash -c "nohup llama-server --port 8080 --model /mnt/models/granite4 --no-warmup --jinja --chat-template-file /mnt/models/chat_template_converted --log-colors on --alias library/granite4:tiny-h --temp 0.8 --cache-reuse 256 -v -ngl 999 --threads 8 --host 0.0.0.0 &> /tmp/llama-server.log & rag_framework run /rag/vector.db"
74b105f224e80163642e2bc99dc538e6019f0c811eeb8db697215ba048cff60c
The container starts, and after a few seconds it stops.
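Because ramalama runs the container with --rm, it is deleted as soon as it exits, taking /tmp/llama-server.log with it. A sketch for grabbing diagnostics in the few seconds before that happens ("model" is the container name from the --name flag above; adjust if yours differs):

```shell
# Capture engine-level and llama-server logs from the short-lived
# container before --rm removes it.
if command -v podman >/dev/null 2>&1; then
    # Engine-level view of why the container exited (works while the
    # container still exists):
    podman logs model 2>&1 | tail -n 50 || true
    # llama-server output, readable only while the container is running:
    podman exec model cat /tmp/llama-server.log || true
else
    echo "podman not available; skipping"
fi
```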
Describe the results you expected
The container should remain active.
ramalama info output
{
"Accelerator": "hip",
"Config": {
"settings": {
"config_files": [
"/usr/share/ramalama/ramalama.conf"
]
}
},
"Engine": {
"Info": {
"host": {
"arch": "amd64",
"buildahVersion": "1.41.5",
"cgroupControllers": [
"cpu",
"io",
"memory",
"pids"
],
"cgroupManager": "systemd",
"cgroupVersion": "v2",
"conmon": {
"package": "conmon-2.1.13-1.fc42.x86_64",
"path": "/usr/bin/conmon",
"version": "conmon version 2.1.13, commit: "
},
"cpuUtilization": {
"idlePercent": 97.6,
"systemPercent": 0.42,
"userPercent": 1.98
},
"cpus": 16,
"databaseBackend": "sqlite",
"distribution": {
"distribution": "fedora",
"version": "42"
},
"emulatedArchitectures": [
"linux/arm",
"linux/arm64",
"linux/arm64be",
"linux/loong64",
"linux/mips",
"linux/mips64",
"linux/ppc",
"linux/ppc64",
"linux/ppc64le",
"linux/riscv32",
"linux/riscv64",
"linux/s390x"
],
"eventLogger": "journald",
"freeLocks": 2047,
"hostname": "io",
"idMappings": {
"gidmap": [
{
"container_id": 0,
"host_id": 1000,
"size": 1
},
{
"container_id": 1,
"host_id": 1000001,
"size": 65536
}
],
"uidmap": [
{
"container_id": 0,
"host_id": 1000,
"size": 1
},
{
"container_id": 1,
"host_id": 1000001,
"size": 65536
}
]
},
"kernel": "6.16.9-200.fc42.x86_64",
"linkmode": "dynamic",
"logDriver": "journald",
"memFree": 12094332928,
"memTotal": 32884342784,
"networkBackend": "netavark",
"networkBackendInfo": {
"backend": "netavark",
"dns": {
"package": "aardvark-dns-1.16.0-1.fc42.x86_64",
"path": "/usr/libexec/podman/aardvark-dns",
"version": "aardvark-dns 1.16.0"
},
"package": "netavark-1.16.1-1.fc42.x86_64",
"path": "/usr/libexec/podman/netavark",
"version": "netavark 1.16.1"
},
"ociRuntime": {
"name": "crun",
"package": "crun-1.24-1.fc42.x86_64",
"path": "/usr/bin/crun",
"version": "crun version 1.24\ncommit: 54693209039e5e04cbe3c8b1cd5fe2301219f0a1\nrundir: /run/user/1000/crun\nspec: 1.0.0\n+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL"
},
"os": "linux",
"pasta": {
"executable": "/usr/bin/pasta",
"package": "passt-0^20250919.g623dbf6-1.fc42.x86_64",
"version": "pasta 0^20250919.g623dbf6-1.fc42.x86_64\nCopyright Red Hat\nGNU General Public License, version 2 or later\n <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>\nThis is free software: you are free to change and redistribute it.\nThere is NO WARRANTY, to the extent permitted by law.\n"
},
"remoteSocket": {
"exists": true,
"path": "/run/user/1000/podman/podman.sock"
},
"rootlessNetworkCmd": "pasta",
"security": {
"apparmorEnabled": false,
"capabilities": "CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT",
"rootless": true,
"seccompEnabled": true,
"seccompProfilePath": "/usr/share/containers/seccomp.json",
"selinuxEnabled": true
},
"serviceIsRemote": false,
"slirp4netns": {
"executable": "",
"package": "",
"version": ""
},
"swapFree": 8589877248,
"swapTotal": 8589930496,
"uptime": "58h 28m 3.00s (Approximately 2.42 days)",
"variant": ""
},
"plugins": {
"authorization": null,
"log": [
"k8s-file",
"none",
"passthrough",
"journald"
],
"network": [
"bridge",
"macvlan",
"ipvlan"
],
"volume": [
"local"
]
},
"registries": {
"search": [
"registry.fedoraproject.org",
"registry.access.redhat.com",
"docker.io"
]
},
"store": {
"configFile": "/var/home/daniel/.config/containers/storage.conf",
"containerStore": {
"number": 1,
"paused": 0,
"running": 1,
"stopped": 0
},
"graphDriverName": "overlay",
"graphOptions": {},
"graphRoot": "/home/daniel/.local/share/containers/storage",
"graphRootAllocated": 429476806656,
"graphRootUsed": 121060868096,
"graphStatus": {
"Backing Filesystem": "btrfs",
"Native Overlay Diff": "true",
"Supports d_type": "true",
"Supports shifting": "false",
"Supports volatile": "true",
"Using metacopy": "false"
},
"imageCopyTmpDir": "/var/tmp",
"imageStore": {
"number": 9
},
"runRoot": "/run/user/1000/containers",
"transientStore": false,
"volumePath": "/var/home/daniel/.local/share/containers/storage/volumes"
},
"version": {
"APIVersion": "5.6.2",
"BuildOrigin": "Fedora Project",
"Built": 1759190400,
"BuiltTime": "Tue Sep 30 10:00:00 2025",
"GitCommit": "9dd5e1ed33830612bc200d7a13db00af6ab865a4",
"GoVersion": "go1.24.7",
"Os": "linux",
"OsArch": "linux/amd64",
"Version": "5.6.2"
}
},
"Name": "podman"
},
"Image": "quay.io/ramalama/rocm:latest",
"Runtime": "llama.cpp",
"Selinux": false,
"Shortnames": {
"Files": [
"/usr/share/ramalama/shortnames.conf"
],
"Names": {
"cerebrum": "huggingface://froggeric/Cerebrum-1.0-7b-GGUF/Cerebrum-1.0-7b-Q4_KS.gguf",
"deepseek": "ollama://deepseek-r1",
"dragon": "huggingface://llmware/dragon-mistral-7b-v0/dragon-mistral-7b-q4_k_m.gguf",
"gemma3": "hf://ggml-org/gemma-3-4b-it-GGUF",
"gemma3:12b": "hf://ggml-org/gemma-3-12b-it-GGUF",
"gemma3:1b": "hf://ggml-org/gemma-3-1b-it-GGUF/gemma-3-1b-it-Q4_K_M.gguf",
"gemma3:27b": "hf://ggml-org/gemma-3-27b-it-GGUF",
"gemma3:4b": "hf://ggml-org/gemma-3-4b-it-GGUF",
"gemma3n": "hf://ggml-org/gemma-3n-E4B-it-GGUF/gemma-3n-E4B-it-Q8_0.gguf",
"gemma3n:e2b": "hf://ggml-org/gemma-3n-E2B-it-GGUF/gemma-3n-E2B-it-Q8_0.gguf",
"gemma3n:e2b-it-f16": "hf://ggml-org/gemma-3n-E2B-it-GGUF/gemma-3n-E2B-it-f16.gguf",
"gemma3n:e2b-it-q8_0": "hf://ggml-org/gemma-3n-E2B-it-GGUF/gemma-3n-E2B-it-Q8_0.gguf",
"gemma3n:e4b": "hf://ggml-org/gemma-3n-E4B-it-GGUF/gemma-3n-E4B-it-Q8_0.gguf",
"gemma3n:e4b-it-f16": "hf://ggml-org/gemma-3n-E4B-it-GGUF/gemma-3n-E4B-it-f16.gguf",
"gemma3n:e4b-it-q8_0": "hf://ggml-org/gemma-3n-E4B-it-GGUF/gemma-3n-E4B-it-Q8_0.gguf",
"gpt-oss": "hf://ggml-org/gpt-oss-20b-GGUF",
"gpt-oss:120b": "hf://ggml-org/gpt-oss-120b-GGUF",
"gpt-oss:20b": "hf://ggml-org/gpt-oss-20b-GGUF",
"granite": "ollama://granite3.1-dense",
"granite-lab-7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"granite-lab-8b": "huggingface://ibm-granite/granite-3.3-8b-instruct-GGUF/granite-3.3-8b-instruct-Q4_K_M.gguf",
"granite-lab:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"granite:2b": "ollama://granite3.1-dense:2b",
"granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"granite:8b": "ollama://granite3.1-dense:8b",
"hermes": "huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf",
"ibm/granite": "ollama://granite3.1-dense:8b",
"ibm/granite:2b": "ollama://granite3.1-dense:2b",
"ibm/granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"ibm/granite:8b": "ollama://granite3.1-dense:8b",
"merlinite": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"merlinite-lab-7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"merlinite-lab:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"merlinite:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"mistral": "hf://lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf",
"mistral-small3.1": "hf://bartowski/mistralai_Mistral-Small-3.1-24B-Instruct-2503-GGUF/mistralai_Mistral-Small-3.1-24B-Instruct-2503-IQ2_M.gguf",
"mistral-small3.1:24b": "hf://bartowski/mistralai_Mistral-Small-3.1-24B-Instruct-2503-GGUF/mistralai_Mistral-Small-3.1-24B-Instruct-2503-IQ2_M.gguf",
"mistral:7b": "hf://lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf",
"mistral:7b-v1": "huggingface://TheBloke/Mistral-7B-Instruct-v0.1-GGUF/mistral-7b-instruct-v0.1.Q5_K_M.gguf",
"mistral:7b-v2": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
"mistral:7b-v3": "hf://lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf",
"mistral_code_16k": "huggingface://TheBloke/Mistral-7B-Code-16K-qlora-GGUF/mistral-7b-code-16k-qlora.Q4_K_M.gguf",
"mistral_codealpaca": "huggingface://TheBloke/Mistral-7B-codealpaca-lora-GGUF/mistral-7b-codealpaca-lora.Q4_K_M.gguf",
"mixtao": "huggingface://MaziyarPanahi/MixTAO-7Bx2-MoE-Instruct-v7.0-GGUF/MixTAO-7Bx2-MoE-Instruct-v7.0.Q4_K_M.gguf",
"openchat": "huggingface://TheBloke/openchat-3.5-0106-GGUF/openchat-3.5-0106.Q4_K_M.gguf",
"openorca": "huggingface://TheBloke/Mistral-7B-OpenOrca-GGUF/mistral-7b-openorca.Q4_K_M.gguf",
"phi2": "huggingface://MaziyarPanahi/phi-2-GGUF/phi-2.Q4_K_M.gguf",
"qwen2.5vl": "hf://ggml-org/Qwen2.5-VL-32B-Instruct-GGUF",
"qwen2.5vl:2b": "hf://ggml-org/Qwen2.5-VL-2B-Instruct-GGUF",
"qwen2.5vl:32b": "hf://ggml-org/Qwen2.5-VL-32B-Instruct-GGUF",
"qwen2.5vl:3b": "hf://ggml-org/Qwen2.5-VL-3B-Instruct-GGUF",
"qwen2.5vl:7b": "hf://ggml-org/Qwen2.5-VL-7B-Instruct-GGUF",
"smollm:135m": "hf://HuggingFaceTB/smollm-135M-instruct-v0.2-Q8_0-GGUF",
"smolvlm": "hf://ggml-org/SmolVLM-500M-Instruct-GGUF",
"smolvlm:256m": "hf://ggml-org/SmolVLM-256M-Instruct-GGUF",
"smolvlm:2b": "hf://ggml-org/SmolVLM-Instruct-GGUF",
"smolvlm:500m": "hf://ggml-org/SmolVLM-500M-Instruct-GGUF",
"tiny": "hf://TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
"tinyllama": "hf://TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
}
},
"Store": "/home/daniel/.local/share/ramalama",
"UseContainer": true,
"Version": "0.12.2"
}
Upstream Latest Release
Yes
Additional environment details
Fedora 42
Additional information
The content of /tmp/llama-server.log inside the container:
Following.
A friendly reminder that this issue had no activity for 30 days.
This is important, so let's keep it open.