ramalama serve crashes when using --rag
Issue Description
I have tried using ramalama serve with a local RAG, but it keeps crashing a few seconds after the container starts.
I can confirm that ramalama run works perfectly well with the same local RAG.
Additional note:
After starting ramalama for the first time, it downloads quay.io/ramalama/rocm-rag, which is not the correct implementation for my hardware, so I'm using quay.io/ramalama/ramalama-rag instead. I have an AMD Radeon 860M integrated GPU: running the container with ramalama-rag actually engages the GPU, whereas rocm-rag falls back to pure CPU.
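As a workaround, the non-ROCm image can be pinned so it is used on every invocation instead of passing --image each time. ramalama.conf appears to support an image setting; this snippet is an assumption on my part, so verify the key name against the ramalama.conf documentation for your version:

```
[ramalama]
image = "quay.io/ramalama/ramalama-rag"
```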
Steps to reproduce the issue
ramalama --debug serve -n model -d --image quay.io/ramalama/ramalama-rag -p 8080 --rag localhost/optus-guideline ollama://library/granite4:tiny-h
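To rule out a problem with the RAG store itself, I used a sketch like the following to confirm that the localhost/optus-guideline image actually contains the vector database that the rag_framework invocation (visible in the debug output below) expects at /rag/vector.db. The mount options mirror the ones ramalama generates; adjust the image names for your setup:

```shell
# Mount the RAG store image the same way ramalama does and list its
# contents; vector.db should appear under /rag.
if command -v podman >/dev/null 2>&1; then
    podman run --rm \
        --mount=type=image,source=localhost/optus-guideline,destination=/rag \
        quay.io/ramalama/ramalama-rag ls -l /rag || true
else
    echo "podman not available; skipping"
fi
```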
Describe the results you received
2025-10-07 21:02:48 - DEBUG - run_cmd: podman inspect quay.io/ramalama/rocm:0.12
2025-10-07 21:02:48 - DEBUG - Working directory: None
2025-10-07 21:02:48 - DEBUG - Ignore stderr: False
2025-10-07 21:02:48 - DEBUG - Ignore all: True
2025-10-07 21:02:48 - DEBUG - run_cmd: podman image inspect localhost/optus-guideline
2025-10-07 21:02:48 - DEBUG - Working directory: None
2025-10-07 21:02:48 - DEBUG - Ignore stderr: False
2025-10-07 21:02:48 - DEBUG - Ignore all: False
2025-10-07 21:02:48 - DEBUG - Command finished with return code: 0
2025-10-07 21:02:48 - DEBUG - run_cmd: podman inspect quay.io/ramalama/rocm:0.12
2025-10-07 21:02:48 - DEBUG - Working directory: None
2025-10-07 21:02:48 - DEBUG - Ignore stderr: False
2025-10-07 21:02:48 - DEBUG - Ignore all: True
2025-10-07 21:02:48 - DEBUG - Checking if 8080 is available
2025-10-07 21:02:48 - DEBUG - exec_cmd: podman run --rm --label ai.ramalama.model=ollama://library/granite4:tiny-h --label ai.ramalama.engine=podman --label ai.ramalama.runtime=llama.cpp --label ai.ramalama.port=8080 --label ai.ramalama.command=serve --device /dev/dri --device /dev/kfd --device /dev/accel -e HIP_VISIBLE_DEVICES=0 -p 8080:8080 --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --pull newer --mount=type=image,source=localhost/optus-guideline,destination=/rag,rw=true -d --label ai.ramalama --name model --env=HOME=/tmp --init --mount=type=bind,src=/home/daniel/.local/share/ramalama/store/ollama/library/granite4/blobs/sha256-9811e90b0eecf2b194aafad5bb386279f338a45412a9e6f86b718cca6626c495,destination=/mnt/models/granite4,ro --mount=type=bind,src=/home/daniel/.local/share/ramalama/store/ollama/library/granite4/blobs/sha256-201bce49a1b69186622ed68f476cebc9bc390809ad6aed44665d06020e3a6667,destination=/mnt/models/config.json,ro --mount=type=bind,src=/home/daniel/.local/share/ramalama/store/ollama/library/granite4/blobs/sha256-9fa3d9413163cdef6c6c0bc6ed0ccf152021bfaabe5f526b2f4b25f01c2db84b,destination=/mnt/models/chat_template,ro --mount=type=bind,src=/home/daniel/.local/share/ramalama/store/ollama/library/granite4/blobs/sha256-d238cb4f5005797f286757089c137bd3853c2ef68045d758ebf59f732d8dc512,destination=/mnt/models/chat_template_converted,ro quay.io/ramalama/ramalama-rag bash -c "nohup llama-server --port 8080 --model /mnt/models/granite4 --no-warmup --jinja --chat-template-file /mnt/models/chat_template_converted --log-colors on --alias library/granite4:tiny-h --temp 0.8 --cache-reuse 256 -v -ngl 999 --threads 8 --host 0.0.0.0 &> /tmp/llama-server.log & rag_framework run /rag/vector.db"
74b105f224e80163642e2bc99dc538e6019f0c811eeb8db697215ba048cff60c
The container starts, and after a few seconds it stops.
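Because ramalama runs the container with --rm, it is deleted as soon as it exits, taking /tmp/llama-server.log with it. A sketch for grabbing diagnostics in the few seconds before that happens ("model" is the container name from the --name flag above; adjust if yours differs):

```shell
# Capture engine-level and llama-server logs from the short-lived
# container before --rm removes it.
if command -v podman >/dev/null 2>&1; then
    # Engine-level view of why the container exited (works while the
    # container still exists):
    podman logs model 2>&1 | tail -n 50 || true
    # llama-server output, readable only while the container is running:
    podman exec model cat /tmp/llama-server.log || true
else
    echo "podman not available; skipping"
fi
```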
Describe the results you expected
The container should remain active.
ramalama info output
{
"Accelerator": "hip",
"Config": {
"settings": {
"config_files": [
"/usr/share/ramalama/ramalama.conf"
]
}
},
"Engine": {
"Info": {
"host": {
"arch": "amd64",
"buildahVersion": "1.41.5",
"cgroupControllers": [
"cpu",
"io",
"memory",
"pids"
],
"cgroupManager": "systemd",
"cgroupVersion": "v2",
"conmon": {
"package": "conmon-2.1.13-1.fc42.x86_64",
"path": "/usr/bin/conmon",
"version": "conmon version 2.1.13, commit: "
},
"cpuUtilization": {
"idlePercent": 97.6,
"systemPercent": 0.42,
"userPercent": 1.98
},
"cpus": 16,
"databaseBackend": "sqlite",
"distribution": {
"distribution": "fedora",
"version": "42"
},
"emulatedArchitectures": [
"linux/arm",
"linux/arm64",
"linux/arm64be",
"linux/loong64",
"linux/mips",
"linux/mips64",
"linux/ppc",
"linux/ppc64",
"linux/ppc64le",
"linux/riscv32",
"linux/riscv64",
"linux/s390x"
],
"eventLogger": "journald",
"freeLocks": 2047,
"hostname": "io",
"idMappings": {
"gidmap": [
{
"container_id": 0,
"host_id": 1000,
"size": 1
},
{
"container_id": 1,
"host_id": 1000001,
"size": 65536
}
],
"uidmap": [
{
"container_id": 0,
"host_id": 1000,
"size": 1
},
{
"container_id": 1,
"host_id": 1000001,
"size": 65536
}
]
},
"kernel": "6.16.9-200.fc42.x86_64",
"linkmode": "dynamic",
"logDriver": "journald",
"memFree": 12094332928,
"memTotal": 32884342784,
"networkBackend": "netavark",
"networkBackendInfo": {
"backend": "netavark",
"dns": {
"package": "aardvark-dns-1.16.0-1.fc42.x86_64",
"path": "/usr/libexec/podman/aardvark-dns",
"version": "aardvark-dns 1.16.0"
},
"package": "netavark-1.16.1-1.fc42.x86_64",
"path": "/usr/libexec/podman/netavark",
"version": "netavark 1.16.1"
},
"ociRuntime": {
"name": "crun",
"package": "crun-1.24-1.fc42.x86_64",
"path": "/usr/bin/crun",
"version": "crun version 1.24\ncommit: 54693209039e5e04cbe3c8b1cd5fe2301219f0a1\nrundir: /run/user/1000/crun\nspec: 1.0.0\n+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL"
},
"os": "linux",
"pasta": {
"executable": "/usr/bin/pasta",
"package": "passt-0^20250919.g623dbf6-1.fc42.x86_64",
"version": "pasta 0^20250919.g623dbf6-1.fc42.x86_64\nCopyright Red Hat\nGNU General Public License, version 2 or later\n <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>\nThis is free software: you are free to change and redistribute it.\nThere is NO WARRANTY, to the extent permitted by law.\n"
},
"remoteSocket": {
"exists": true,
"path": "/run/user/1000/podman/podman.sock"
},
"rootlessNetworkCmd": "pasta",
"security": {
"apparmorEnabled": false,
"capabilities": "CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT",
"rootless": true,
"seccompEnabled": true,
"seccompProfilePath": "/usr/share/containers/seccomp.json",
"selinuxEnabled": true
},
"serviceIsRemote": false,
"slirp4netns": {
"executable": "",
"package": "",
"version": ""
},
"swapFree": 8589877248,
"swapTotal": 8589930496,
"uptime": "58h 28m 3.00s (Approximately 2.42 days)",
"variant": ""
},
"plugins": {
"authorization": null,
"log": [
"k8s-file",
"none",
"passthrough",
"journald"
],
"network": [
"bridge",
"macvlan",
"ipvlan"
],
"volume": [
"local"
]
},
"registries": {
"search": [
"registry.fedoraproject.org",
"registry.access.redhat.com",
"docker.io"
]
},
"store": {
"configFile": "/var/home/daniel/.config/containers/storage.conf",
"containerStore": {
"number": 1,
"paused": 0,
"running": 1,
"stopped": 0
},
"graphDriverName": "overlay",
"graphOptions": {},
"graphRoot": "/home/daniel/.local/share/containers/storage",
"graphRootAllocated": 429476806656,
"graphRootUsed": 121060868096,
"graphStatus": {
"Backing Filesystem": "btrfs",
"Native Overlay Diff": "true",
"Supports d_type": "true",
"Supports shifting": "false",
"Supports volatile": "true",
"Using metacopy": "false"
},
"imageCopyTmpDir": "/var/tmp",
"imageStore": {
"number": 9
},
"runRoot": "/run/user/1000/containers",
"transientStore": false,
"volumePath": "/var/home/daniel/.local/share/containers/storage/volumes"
},
"version": {
"APIVersion": "5.6.2",
"BuildOrigin": "Fedora Project",
"Built": 1759190400,
"BuiltTime": "Tue Sep 30 10:00:00 2025",
"GitCommit": "9dd5e1ed33830612bc200d7a13db00af6ab865a4",
"GoVersion": "go1.24.7",
"Os": "linux",
"OsArch": "linux/amd64",
"Version": "5.6.2"
}
},
"Name": "podman"
},
"Image": "quay.io/ramalama/rocm:latest",
"Runtime": "llama.cpp",
"Selinux": false,
"Shortnames": {
"Files": [
"/usr/share/ramalama/shortnames.conf"
],
"Names": {
"cerebrum": "huggingface://froggeric/Cerebrum-1.0-7b-GGUF/Cerebrum-1.0-7b-Q4_KS.gguf",
"deepseek": "ollama://deepseek-r1",
"dragon": "huggingface://llmware/dragon-mistral-7b-v0/dragon-mistral-7b-q4_k_m.gguf",
"gemma3": "hf://ggml-org/gemma-3-4b-it-GGUF",
"gemma3:12b": "hf://ggml-org/gemma-3-12b-it-GGUF",
"gemma3:1b": "hf://ggml-org/gemma-3-1b-it-GGUF/gemma-3-1b-it-Q4_K_M.gguf",
"gemma3:27b": "hf://ggml-org/gemma-3-27b-it-GGUF",
"gemma3:4b": "hf://ggml-org/gemma-3-4b-it-GGUF",
"gemma3n": "hf://ggml-org/gemma-3n-E4B-it-GGUF/gemma-3n-E4B-it-Q8_0.gguf",
"gemma3n:e2b": "hf://ggml-org/gemma-3n-E2B-it-GGUF/gemma-3n-E2B-it-Q8_0.gguf",
"gemma3n:e2b-it-f16": "hf://ggml-org/gemma-3n-E2B-it-GGUF/gemma-3n-E2B-it-f16.gguf",
"gemma3n:e2b-it-q8_0": "hf://ggml-org/gemma-3n-E2B-it-GGUF/gemma-3n-E2B-it-Q8_0.gguf",
"gemma3n:e4b": "hf://ggml-org/gemma-3n-E4B-it-GGUF/gemma-3n-E4B-it-Q8_0.gguf",
"gemma3n:e4b-it-f16": "hf://ggml-org/gemma-3n-E4B-it-GGUF/gemma-3n-E4B-it-f16.gguf",
"gemma3n:e4b-it-q8_0": "hf://ggml-org/gemma-3n-E4B-it-GGUF/gemma-3n-E4B-it-Q8_0.gguf",
"gpt-oss": "hf://ggml-org/gpt-oss-20b-GGUF",
"gpt-oss:120b": "hf://ggml-org/gpt-oss-120b-GGUF",
"gpt-oss:20b": "hf://ggml-org/gpt-oss-20b-GGUF",
"granite": "ollama://granite3.1-dense",
"granite-lab-7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"granite-lab-8b": "huggingface://ibm-granite/granite-3.3-8b-instruct-GGUF/granite-3.3-8b-instruct-Q4_K_M.gguf",
"granite-lab:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"granite:2b": "ollama://granite3.1-dense:2b",
"granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"granite:8b": "ollama://granite3.1-dense:8b",
"hermes": "huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf",
"ibm/granite": "ollama://granite3.1-dense:8b",
"ibm/granite:2b": "ollama://granite3.1-dense:2b",
"ibm/granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"ibm/granite:8b": "ollama://granite3.1-dense:8b",
"merlinite": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"merlinite-lab-7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"merlinite-lab:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"merlinite:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"mistral": "hf://lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf",
"mistral-small3.1": "hf://bartowski/mistralai_Mistral-Small-3.1-24B-Instruct-2503-GGUF/mistralai_Mistral-Small-3.1-24B-Instruct-2503-IQ2_M.gguf",
"mistral-small3.1:24b": "hf://bartowski/mistralai_Mistral-Small-3.1-24B-Instruct-2503-GGUF/mistralai_Mistral-Small-3.1-24B-Instruct-2503-IQ2_M.gguf",
"mistral:7b": "hf://lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf",
"mistral:7b-v1": "huggingface://TheBloke/Mistral-7B-Instruct-v0.1-GGUF/mistral-7b-instruct-v0.1.Q5_K_M.gguf",
"mistral:7b-v2": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
"mistral:7b-v3": "hf://lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf",
"mistral_code_16k": "huggingface://TheBloke/Mistral-7B-Code-16K-qlora-GGUF/mistral-7b-code-16k-qlora.Q4_K_M.gguf",
"mistral_codealpaca": "huggingface://TheBloke/Mistral-7B-codealpaca-lora-GGUF/mistral-7b-codealpaca-lora.Q4_K_M.gguf",
"mixtao": "huggingface://MaziyarPanahi/MixTAO-7Bx2-MoE-Instruct-v7.0-GGUF/MixTAO-7Bx2-MoE-Instruct-v7.0.Q4_K_M.gguf",
"openchat": "huggingface://TheBloke/openchat-3.5-0106-GGUF/openchat-3.5-0106.Q4_K_M.gguf",
"openorca": "huggingface://TheBloke/Mistral-7B-OpenOrca-GGUF/mistral-7b-openorca.Q4_K_M.gguf",
"phi2": "huggingface://MaziyarPanahi/phi-2-GGUF/phi-2.Q4_K_M.gguf",
"qwen2.5vl": "hf://ggml-org/Qwen2.5-VL-32B-Instruct-GGUF",
"qwen2.5vl:2b": "hf://ggml-org/Qwen2.5-VL-2B-Instruct-GGUF",
"qwen2.5vl:32b": "hf://ggml-org/Qwen2.5-VL-32B-Instruct-GGUF",
"qwen2.5vl:3b": "hf://ggml-org/Qwen2.5-VL-3B-Instruct-GGUF",
"qwen2.5vl:7b": "hf://ggml-org/Qwen2.5-VL-7B-Instruct-GGUF",
"smollm:135m": "hf://HuggingFaceTB/smollm-135M-instruct-v0.2-Q8_0-GGUF",
"smolvlm": "hf://ggml-org/SmolVLM-500M-Instruct-GGUF",
"smolvlm:256m": "hf://ggml-org/SmolVLM-256M-Instruct-GGUF",
"smolvlm:2b": "hf://ggml-org/SmolVLM-Instruct-GGUF",
"smolvlm:500m": "hf://ggml-org/SmolVLM-500M-Instruct-GGUF",
"tiny": "hf://TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
"tinyllama": "hf://TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
}
},
"Store": "/home/daniel/.local/share/ramalama",
"UseContainer": true,
"Version": "0.12.2"
}
Upstream Latest Release
Yes
Additional environment details
Fedora 42
Additional information
The content of /tmp/llama-server.log inside the container:
Following.
A friendly reminder that this issue had no activity for 30 days.
This is important, so let's keep it open.