Ramalama serve with RAG failed to start after 10 seconds
Issue Description
Hello. I have been able to experiment a lot with ramalama, and now I'm trying to explore the RAG features of this amazing project.
After running:
ramalama rag <folder of files to rag> rag_oci_image
I tried to serve the tiny model with the --rag argument, to include the vector DB from the OCI image generated in the previous step:
ramalama --debug serve --generate compose --rag rag_oci_image tiny
but the command fails, stating that it cannot start llama.cpp within 10 seconds.
Steps to reproduce the issue
ramalama --debug serve --generate compose --rag rag_oci_image tiny
Describe the results you received
ERROR: Application startup failed. Exiting.
No server responding at host.containers.internal:8082, retrying for up to 10 seconds...
Error: llama-server at host.containers.internal:8082 did not become ready after 10 seconds.
Describe the results you expected
To have llama.cpp running.
ramalama info output
{
"Accelerator": "none",
"Config": {
"settings": {
"config_files": [
"/root/.local/share/uv/tools/ramalama/share/ramalama/ramalama.conf"
]
}
},
"Engine": {
"Info": {
"host": {
"arch": "amd64",
"buildahVersion": "1.39.3",
"cgroupControllers": [
"cpuset",
"cpu",
"io",
"memory",
"hugetlb",
"pids",
"rdma",
"misc"
],
"cgroupManager": "systemd",
"cgroupVersion": "v2",
"conmon": {
"package": "conmon_2.1.12-4_amd64",
"path": "/usr/bin/conmon",
"version": "conmon version 2.1.12, commit: unknown"
},
"cpuUtilization": {
"idlePercent": 97.51,
"systemPercent": 0.09,
"userPercent": 2.4
},
"cpus": 4,
"databaseBackend": "sqlite",
"distribution": {
"codename": "trixie",
"distribution": "debian",
"version": "13"
},
"eventLogger": "journald",
"freeLocks": 2047,
"hostname": "debhome",
"idMappings": {
"gidmap": null,
"uidmap": null
},
"kernel": "6.12.41+deb13-amd64",
"linkmode": "dynamic",
"logDriver": "journald",
"memFree": 9736966144,
"memTotal": 16507269120,
"networkBackend": "netavark",
"networkBackendInfo": {
"backend": "netavark",
"dns": {
"package": "aardvark-dns_1.14.0-3_amd64",
"path": "/usr/lib/podman/aardvark-dns",
"version": "aardvark-dns 1.14.0"
},
"package": "netavark_1.14.0-2_amd64",
"path": "/usr/lib/podman/netavark",
"version": "netavark 1.14.0"
},
"ociRuntime": {
"name": "runc",
"package": "containerd.io_1.7.29-1~debian.13~trixie_amd64",
"path": "/usr/bin/runc",
"version": "runc version 1.3.3\ncommit: v1.3.3-0-gd842d771\nspec: 1.2.1\ngo: go1.24.9\nlibseccomp: 2.6.0"
},
"os": "linux",
"pasta": {
"executable": "/usr/bin/pasta",
"package": "passt_0.0~git20250503.587980c-2_amd64",
"version": ""
},
"remoteSocket": {
"exists": true,
"path": "/run/podman/podman.sock"
},
"rootlessNetworkCmd": "pasta",
"security": {
"apparmorEnabled": true,
"capabilities": "CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT",
"rootless": false,
"seccompEnabled": true,
"seccompProfilePath": "/usr/share/containers/seccomp.json",
"selinuxEnabled": false
},
"serviceIsRemote": false,
"slirp4netns": {
"executable": "/usr/bin/slirp4netns",
"package": "slirp4netns_1.2.1-1.1_amd64",
"version": "slirp4netns version 1.2.1\ncommit: 09e31e92fa3d2a1d3ca261adaeb012c8d75a8194\nlibslirp: 4.8.0\nSLIRP_CONFIG_VERSION_MAX: 5\nlibseccomp: 2.6.0"
},
"swapFree": 9949003776,
"swapTotal": 10000265216,
"uptime": "315h 28m 57.00s (Approximately 13.12 days)",
"variant": ""
},
"plugins": {
"authorization": null,
"log": [
"k8s-file",
"none",
"passthrough",
"journald"
],
"network": [
"bridge",
"macvlan",
"ipvlan"
],
"volume": [
"local"
]
},
"registries": {},
"store": {
"configFile": "/usr/share/containers/storage.conf",
"containerStore": {
"number": 1,
"paused": 0,
"running": 1,
"stopped": 0
},
"graphDriverName": "overlay",
"graphOptions": {
"overlay.mountopt": "nodev"
},
"graphRoot": "/var/lib/containers/storage",
"graphRootAllocated": 491001659392,
"graphRootUsed": 125398740992,
"graphStatus": {
"Backing Filesystem": "extfs",
"Native Overlay Diff": "true",
"Supports d_type": "true",
"Supports shifting": "true",
"Supports volatile": "true",
"Using metacopy": "false"
},
"imageCopyTmpDir": "/var/tmp",
"imageStore": {
"number": 5
},
"runRoot": "/run/containers/storage",
"transientStore": false,
"volumePath": "/var/lib/containers/storage/volumes"
},
"version": {
"APIVersion": "5.4.2",
"BuildOrigin": "Debian",
"Built": 1753478586,
"BuiltTime": "Fri Jul 25 23:23:06 2025",
"GitCommit": "",
"GoVersion": "go1.24.4",
"Os": "linux",
"OsArch": "linux/amd64",
"Version": "5.4.2"
}
},
"Name": "podman"
},
"Image": "quay.io/ramalama/ramalama:latest",
"Inference": {
"Default": "llama.cpp",
"Engines": {
"llama.cpp": "/root/.local/share/uv/tools/ramalama/share/ramalama/inference/llama.cpp.yaml",
"mlx": "/root/.local/share/uv/tools/ramalama/share/ramalama/inference/mlx.yaml",
"schema.1-0-0": "/root/.local/share/uv/tools/ramalama/share/ramalama/inference/schema.1-0-0.json",
"vllm": "/root/.local/share/uv/tools/ramalama/share/ramalama/inference/vllm.yaml"
},
"Schema": {
"1-0-0": "/root/.local/share/uv/tools/ramalama/share/ramalama/inference/schema.1-0-0.json"
}
},
"RagImage": "quay.io/ramalama/ramalama-rag:latest",
"Selinux": false,
"Shortnames": {
"Files": [
"/root/.local/share/uv/tools/ramalama/share/ramalama/shortnames.conf"
],
"Names": {
"cerebrum": "huggingface://froggeric/Cerebrum-1.0-7b-GGUF/Cerebrum-1.0-7b-Q4_KS.gguf",
"deepseek": "ollama://deepseek-r1",
"dragon": "huggingface://llmware/dragon-mistral-7b-v0/dragon-mistral-7b-q4_k_m.gguf",
"gemma3": "hf://ggml-org/gemma-3-4b-it-GGUF",
"gemma3:12b": "hf://ggml-org/gemma-3-12b-it-GGUF",
"gemma3:1b": "hf://ggml-org/gemma-3-1b-it-GGUF/gemma-3-1b-it-Q4_K_M.gguf",
"gemma3:27b": "hf://ggml-org/gemma-3-27b-it-GGUF",
"gemma3:4b": "hf://ggml-org/gemma-3-4b-it-GGUF",
"gemma3n": "hf://ggml-org/gemma-3n-E4B-it-GGUF/gemma-3n-E4B-it-Q8_0.gguf",
"gemma3n:e2b": "hf://ggml-org/gemma-3n-E2B-it-GGUF/gemma-3n-E2B-it-Q8_0.gguf",
"gemma3n:e2b-it-f16": "hf://ggml-org/gemma-3n-E2B-it-GGUF/gemma-3n-E2B-it-f16.gguf",
"gemma3n:e2b-it-q8_0": "hf://ggml-org/gemma-3n-E2B-it-GGUF/gemma-3n-E2B-it-Q8_0.gguf",
"gemma3n:e4b": "hf://ggml-org/gemma-3n-E4B-it-GGUF/gemma-3n-E4B-it-Q8_0.gguf",
"gemma3n:e4b-it-f16": "hf://ggml-org/gemma-3n-E4B-it-GGUF/gemma-3n-E4B-it-f16.gguf",
"gemma3n:e4b-it-q8_0": "hf://ggml-org/gemma-3n-E4B-it-GGUF/gemma-3n-E4B-it-Q8_0.gguf",
"gpt-oss": "hf://ggml-org/gpt-oss-20b-GGUF",
"gpt-oss:120b": "hf://ggml-org/gpt-oss-120b-GGUF",
"gpt-oss:20b": "hf://ggml-org/gpt-oss-20b-GGUF",
"granite": "ollama://granite3.1-dense",
"granite-be-3.0:1b": "hf://taronaeo/Granite-3.0-1B-A400M-Instruct-BE-GGUF/granite-3.0-1b-a400m-instruct-be.Q2_K.gguf",
"granite-be-3.3:2b": "hf://taronaeo/Granite-3.3-2B-Instruct-BE-GGUF/granite-3.3-2b-instruct-be.Q4_K_M.gguf",
"granite-lab-7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"granite-lab-8b": "huggingface://ibm-granite/granite-3.3-8b-instruct-GGUF/granite-3.3-8b-instruct-Q4_K_M.gguf",
"granite-lab:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"granite:2b": "ollama://granite3.1-dense:2b",
"granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"granite:8b": "ollama://granite3.1-dense:8b",
"hermes": "huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf",
"ibm/granite": "ollama://granite3.1-dense:8b",
"ibm/granite:2b": "ollama://granite3.1-dense:2b",
"ibm/granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"ibm/granite:8b": "ollama://granite3.1-dense:8b",
"merlinite": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"merlinite-lab-7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"merlinite-lab:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"merlinite:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"mistral": "hf://lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf",
"mistral-small3.1": "hf://bartowski/mistralai_Mistral-Small-3.1-24B-Instruct-2503-GGUF/mistralai_Mistral-Small-3.1-24B-Instruct-2503-IQ2_M.gguf",
"mistral-small3.1:24b": "hf://bartowski/mistralai_Mistral-Small-3.1-24B-Instruct-2503-GGUF/mistralai_Mistral-Small-3.1-24B-Instruct-2503-IQ2_M.gguf",
"mistral:7b": "hf://lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf",
"mistral:7b-v1": "huggingface://TheBloke/Mistral-7B-Instruct-v0.1-GGUF/mistral-7b-instruct-v0.1.Q5_K_M.gguf",
"mistral:7b-v2": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
"mistral:7b-v3": "hf://lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf",
"mistral_code_16k": "huggingface://TheBloke/Mistral-7B-Code-16K-qlora-GGUF/mistral-7b-code-16k-qlora.Q4_K_M.gguf",
"mistral_codealpaca": "huggingface://TheBloke/Mistral-7B-codealpaca-lora-GGUF/mistral-7b-codealpaca-lora.Q4_K_M.gguf",
"mixtao": "huggingface://MaziyarPanahi/MixTAO-7Bx2-MoE-Instruct-v7.0-GGUF/MixTAO-7Bx2-MoE-Instruct-v7.0.Q4_K_M.gguf",
"openchat": "huggingface://TheBloke/openchat-3.5-0106-GGUF/openchat-3.5-0106.Q4_K_M.gguf",
"openorca": "huggingface://TheBloke/Mistral-7B-OpenOrca-GGUF/mistral-7b-openorca.Q4_K_M.gguf",
"phi2": "huggingface://MaziyarPanahi/phi-2-GGUF/phi-2.Q4_K_M.gguf",
"qwen2.5vl": "hf://ggml-org/Qwen2.5-VL-32B-Instruct-GGUF",
"qwen2.5vl:2b": "hf://ggml-org/Qwen2.5-VL-2B-Instruct-GGUF",
"qwen2.5vl:32b": "hf://ggml-org/Qwen2.5-VL-32B-Instruct-GGUF",
"qwen2.5vl:3b": "hf://ggml-org/Qwen2.5-VL-3B-Instruct-GGUF",
"qwen2.5vl:7b": "hf://ggml-org/Qwen2.5-VL-7B-Instruct-GGUF",
"smollm:135m": "hf://HuggingFaceTB/smollm-135M-instruct-v0.2-Q8_0-GGUF",
"smolvlm": "hf://ggml-org/SmolVLM-500M-Instruct-GGUF",
"smolvlm:256m": "hf://ggml-org/SmolVLM-256M-Instruct-GGUF",
"smolvlm:2b": "hf://ggml-org/SmolVLM-Instruct-GGUF",
"smolvlm:500m": "hf://ggml-org/SmolVLM-500M-Instruct-GGUF",
"stories-be:260k": "hf://taronaeo/tinyllamas-BE/stories260K-be.gguf",
"tiny": "hf://TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
"tinyllama": "hf://TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
}
},
"Store": "/var/lib/ramalama",
"UseContainer": true,
"Version": "0.14.0"
}
Upstream Latest Release
Yes
Additional environment details
No response
Additional information
I found that the message comes from this Python script: https://github.com/containers/ramalama/blob/main/container-images/scripts/rag_framework#L119 However, I'm not able to understand at which point in the process the llama.cpp server is launched.
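From the traceback in the debug output below, wait_for_llama_server seems to just poll the model host/port and give up once the timeout expires. A minimal sketch of that kind of readiness loop (my own illustration, not the actual rag_framework code; the host, port, and timeout are only examples) would be something like:

import asyncio

async def wait_for_server(host: str, port: int, total_timeout: float = 10.0) -> None:
    """Poll host:port until a TCP connection succeeds, or fail after total_timeout seconds."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + total_timeout
    while True:
        try:
            _, writer = await asyncio.open_connection(host, port)
            writer.close()
            await writer.wait_closed()
            return  # the server is accepting connections
        except OSError:
            if loop.time() >= deadline:
                raise TimeoutError(f"server at {host}:{port} did not become ready after {total_timeout} seconds")
            await asyncio.sleep(1)  # retry roughly once per second

# e.g. asyncio.run(wait_for_server("host.containers.internal", 8082))

So the RAG container never launches llama.cpp itself; it only waits for the llama-server container that ramalama started in the previous step to answer on that host/port.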
Output of ramalama --debug serve
root@host:/home# ramalama --debug serve tiny --rag rag2
2025-11-10 22:15:52 - DEBUG - Checking if 8080 is available
DEBUG:ramalama:Checking if 8080 is available
2025-11-10 22:15:52 - DEBUG - run_cmd: npu-smi info
DEBUG:ramalama:run_cmd: npu-smi info
2025-11-10 22:15:52 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:52 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:52 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:52 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:52 - DEBUG - run_cmd: mthreads-gmi
DEBUG:ramalama:run_cmd: mthreads-gmi
2025-11-10 22:15:52 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:52 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:52 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:52 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:52 - DEBUG - run_cmd: podman inspect quay.io/ramalama/ramalama:0.14
DEBUG:ramalama:run_cmd: podman inspect quay.io/ramalama/ramalama:0.14
2025-11-10 22:15:52 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:52 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:52 - DEBUG - Ignore all: True
DEBUG:ramalama:Ignore all: True
2025-11-10 22:15:52 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:52 - DEBUG - Checking if 8081 is available
DEBUG:ramalama:Checking if 8081 is available
2025-11-10 22:15:52 - DEBUG - Checking if 8085 is available
DEBUG:ramalama:Checking if 8085 is available
2025-11-10 22:15:52 - DEBUG - Checking if 8082 is available
DEBUG:ramalama:Checking if 8082 is available
2025-11-10 22:15:52 - DEBUG - run_cmd: npu-smi info
DEBUG:ramalama:run_cmd: npu-smi info
2025-11-10 22:15:52 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:52 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:52 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:52 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:52 - DEBUG - run_cmd: mthreads-gmi
DEBUG:ramalama:run_cmd: mthreads-gmi
2025-11-10 22:15:52 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:52 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:52 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:52 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:52 - DEBUG - run_cmd: podman inspect quay.io/ramalama/ramalama:0.14
DEBUG:ramalama:run_cmd: podman inspect quay.io/ramalama/ramalama:0.14
2025-11-10 22:15:52 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:52 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:52 - DEBUG - Ignore all: True
DEBUG:ramalama:Ignore all: True
2025-11-10 22:15:52 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:52 - DEBUG - run_cmd: npu-smi info
DEBUG:ramalama:run_cmd: npu-smi info
2025-11-10 22:15:52 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:52 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:52 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:52 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:52 - DEBUG - run_cmd: mthreads-gmi
DEBUG:ramalama:run_cmd: mthreads-gmi
2025-11-10 22:15:52 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:52 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:52 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:52 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:52 - DEBUG - run_cmd: podman inspect quay.io/ramalama/ramalama:0.14
DEBUG:ramalama:run_cmd: podman inspect quay.io/ramalama/ramalama:0.14
2025-11-10 22:15:52 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:52 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:52 - DEBUG - Ignore all: True
DEBUG:ramalama:Ignore all: True
2025-11-10 22:15:52 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:52 - DEBUG - run_cmd: podman image inspect rag2
DEBUG:ramalama:run_cmd: podman image inspect rag2
2025-11-10 22:15:52 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:52 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:52 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:52 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:53 - DEBUG - Command finished with return code: 0
DEBUG:ramalama:Command finished with return code: 0
2025-11-10 22:15:53 - DEBUG - run_cmd: npu-smi info
DEBUG:ramalama:run_cmd: npu-smi info
2025-11-10 22:15:53 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:53 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:53 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:53 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:53 - DEBUG - run_cmd: mthreads-gmi
DEBUG:ramalama:run_cmd: mthreads-gmi
2025-11-10 22:15:53 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:53 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:53 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:53 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:53 - DEBUG - run_cmd: podman inspect quay.io/ramalama/ramalama:0.14
DEBUG:ramalama:run_cmd: podman inspect quay.io/ramalama/ramalama:0.14
2025-11-10 22:15:53 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:53 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:53 - DEBUG - Ignore all: True
DEBUG:ramalama:Ignore all: True
2025-11-10 22:15:53 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:53 - DEBUG - run_cmd: npu-smi info
DEBUG:ramalama:run_cmd: npu-smi info
2025-11-10 22:15:53 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:53 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:53 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:53 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:53 - DEBUG - run_cmd: mthreads-gmi
DEBUG:ramalama:run_cmd: mthreads-gmi
2025-11-10 22:15:53 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:53 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:53 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:53 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:53 - DEBUG - exec_cmd: podman run --rm --label ai.ramalama.model=hf://TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF --label ai.ramalama.engine=podman --label ai.ramalama.runtime=llama.cpp --label ai.ramalama.port=8082 --label ai.ramalama.command=serve --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --pull newer -d --device /dev/dri -p 8082:8082 --label ai.ramalama --name ramalama_3WtMydOWDs --env=HOME=/tmp --init --mount=type=bind,src=/var/lib/ramalama/store/huggingface/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/blobs/sha256-030a469a63576d59f601ef5608846b7718eaa884dd820e9aa7493efec1788afa,destination=/mnt/models/tinyllama-1.1b-chat-v1.0.Q2_K.gguf,ro --mount=type=bind,src=/var/lib/ramalama/store/huggingface/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/blobs/sha256-587cb980af76fdc7e52369fd0b9d926dff266976b6f8ac631e358fecc49ff8cf,destination=/mnt/models/config.json,ro --mount=type=bind,src=/var/lib/ramalama/store/huggingface/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/blobs/sha256-66291cf0045c2425a3a667cf3cbb7af2b11f09e025c02f97245323ab79119362,destination=/mnt/models/chat_template_extracted,ro quay.io/ramalama/ramalama:latest llama-server --host 0.0.0.0 --port 8082 --model /mnt/models/tinyllama-1.1b-chat-v1.0.Q2_K.gguf --jinja --no-warmup --alias TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF --temp 0.8 --cache-reuse 256 -v --flash-attn on -ngl 999 --threads 4 --log-colors on
DEBUG:ramalama:exec_cmd: podman run --rm --label ai.ramalama.model=hf://TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF --label ai.ramalama.engine=podman --label ai.ramalama.runtime=llama.cpp --label ai.ramalama.port=8082 --label ai.ramalama.command=serve --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --pull newer -d --device /dev/dri -p 8082:8082 --label ai.ramalama --name ramalama_3WtMydOWDs --env=HOME=/tmp --init --mount=type=bind,src=/var/lib/ramalama/store/huggingface/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/blobs/sha256-030a469a63576d59f601ef5608846b7718eaa884dd820e9aa7493efec1788afa,destination=/mnt/models/tinyllama-1.1b-chat-v1.0.Q2_K.gguf,ro --mount=type=bind,src=/var/lib/ramalama/store/huggingface/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/blobs/sha256-587cb980af76fdc7e52369fd0b9d926dff266976b6f8ac631e358fecc49ff8cf,destination=/mnt/models/config.json,ro --mount=type=bind,src=/var/lib/ramalama/store/huggingface/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/blobs/sha256-66291cf0045c2425a3a667cf3cbb7af2b11f09e025c02f97245323ab79119362,destination=/mnt/models/chat_template_extracted,ro quay.io/ramalama/ramalama:latest llama-server --host 0.0.0.0 --port 8082 --model /mnt/models/tinyllama-1.1b-chat-v1.0.Q2_K.gguf --jinja --no-warmup --alias TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF --temp 0.8 --cache-reuse 256 -v --flash-attn on -ngl 999 --threads 4 --log-colors on
f663d636c754a28e5d3387047683342f28b010eba4500fa02e68f52f3cd35032
2025-11-10 22:15:54 - DEBUG - run_cmd: npu-smi info
DEBUG:ramalama:run_cmd: npu-smi info
2025-11-10 22:15:54 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:54 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:54 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:54 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:54 - DEBUG - run_cmd: mthreads-gmi
DEBUG:ramalama:run_cmd: mthreads-gmi
2025-11-10 22:15:54 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:54 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:54 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:54 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:54 - DEBUG - exec_cmd: podman run --rm --label ai.ramalama.model=rag2 --label ai.ramalama.engine=podman --label ai.ramalama.runtime=llama.cpp --label ai.ramalama.port=8080 --label "ai.ramalama.command=serve --rag" --label ai.ramalama.rag.image=rag2 --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --pull newer --device /dev/dri -p 8080:8080 --mount=type=image,source=rag2,destination=/rag,rw=true --label ai.ramalama --name ramalama_PalNPPekcN --env=HOME=/tmp --init quay.io/ramalama/ramalama-rag:latest rag_framework --debug serve --port 8080 --model-host host.containers.internal --model-port 8082 /rag/vector.db
DEBUG:ramalama:exec_cmd: podman run --rm --label ai.ramalama.model=rag2 --label ai.ramalama.engine=podman --label ai.ramalama.runtime=llama.cpp --label ai.ramalama.port=8080 --label "ai.ramalama.command=serve --rag" --label ai.ramalama.rag.image=rag2 --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --pull newer --device /dev/dri -p 8080:8080 --mount=type=image,source=rag2,destination=/rag,rw=true --label ai.ramalama --name ramalama_PalNPPekcN --env=HOME=/tmp --init quay.io/ramalama/ramalama-rag:latest rag_framework --debug serve --port 8080 --model-host host.containers.internal --model-port 8082 /rag/vector.db
2025-11-10 21:15:58,564 asyncio DEBUG: Using selector: EpollSelector
INFO: Started server process [7]
INFO: Waiting for application startup.
ERROR: Traceback (most recent call last):
File "/opt/venv/lib64/python3.13/site-packages/starlette/routing.py", line 694, in lifespan
async with self.lifespan_context(app) as maybe_state:
~~~~~~~~~~~~~~~~~~~~~^^^^^
File "/usr/lib64/python3.13/contextlib.py", line 214, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/bin/rag_framework", line 396, in lifespan
await wait_for_llama_server(args.model_host, args.model_port)
File "/usr/bin/rag_framework", line 139, in wait_for_llama_server
sys.exit(1)
~~~~~~~~^^^
SystemExit: 1
ERROR: Application startup failed. Exiting.
No server responding at host.containers.internal:8082, retrying for up to 10 seconds...
Error: llama-server at host.containers.internal:8082 did not become ready after 10 seconds.
This error also leaves a stale container running.
I guess that the error could be related to the fact that the first container is not named host.containers.internal:
podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
89d066be844a quay.io/ramalama/ramalama:latest llama-server --ho... 13 minutes ago Up 13 minutes 0.0.0.0:8082->8082/tcp ramalama_3WtMydOWDs
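To help narrow this down, a rough check of whether the llama-server container is actually answering on the published port from the host side (independent of the RAG container) could look like the following. This is only a sketch: the port 8082 comes from the log above, and it assumes llama-server exposes a /health endpoint.

import http.client

# Probe the llama-server port published on the host (8082 in the debug log above).
conn = http.client.HTTPConnection("localhost", 8082, timeout=5)
try:
    conn.request("GET", "/health")
    resp = conn.getresponse()
    print("llama-server answered:", resp.status, resp.reason)
except OSError as exc:
    print("no server listening on localhost:8082:", exc)
finally:
    conn.close()

If this also fails, the model container itself never became ready (still loading the model, or crashed); if it succeeds, the problem is more likely the container-to-host path via host.containers.internal.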
@bmahabirbu PTAL
I guess that the error could be related to the fact that the first container is not named host.containers.internal:
host.containers.internal will be the fully qualified domain name of the host this is running on.
Does the issue only occur when using RAG? i.e., does ramalama --debug serve tiny run successfully?
Sure, the command without --rag runs fine. Do you need anything else from my side? Any more tests?
See https://github.com/containers/ramalama/issues/2079
Serve with RAG does not currently work, but it will be working soon!
@federicofortini Could you retry with the 0.15.0 release? Also, note that you don't need the --generate argument if you want to start a local server.
Also, it looks like you may be running ramalama as root, which is unnecessary. Could you try running it as a non-root user?
Hello @mikebonnet, I'm sorry to report that the outcome is still the same.
ramalama version
ramalama version 0.15.0
Output:
ramalama --debug serve --rag localhost/rag2 tiny
2025-12-14 14:45:02 - DEBUG - Checking if 8080 is available
2025-12-14 14:45:02 - DEBUG - run_cmd: npu-smi info
2025-12-14 14:45:02 - DEBUG - Working directory: None
2025-12-14 14:45:02 - DEBUG - Ignore stderr: False
2025-12-14 14:45:02 - DEBUG - Ignore all: False
2025-12-14 14:45:02 - DEBUG - env: None
2025-12-14 14:45:02 - DEBUG - run_cmd: mthreads-gmi
2025-12-14 14:45:02 - DEBUG - Working directory: None
2025-12-14 14:45:02 - DEBUG - Ignore stderr: False
2025-12-14 14:45:02 - DEBUG - Ignore all: False
2025-12-14 14:45:02 - DEBUG - env: None
2025-12-14 14:45:02 - DEBUG - run_cmd: podman inspect quay.io/ramalama/ramalama:0.15
2025-12-14 14:45:02 - DEBUG - Working directory: None
2025-12-14 14:45:02 - DEBUG - Ignore stderr: False
2025-12-14 14:45:02 - DEBUG - Ignore all: True
2025-12-14 14:45:02 - DEBUG - env: None
2025-12-14 14:45:02 - DEBUG - Checking if 8081 is available
2025-12-14 14:45:02 - DEBUG - Checking if 8109 is available
2025-12-14 14:45:02 - DEBUG - run_cmd: npu-smi info
2025-12-14 14:45:02 - DEBUG - Working directory: None
2025-12-14 14:45:02 - DEBUG - Ignore stderr: False
2025-12-14 14:45:02 - DEBUG - Ignore all: False
2025-12-14 14:45:02 - DEBUG - env: None
2025-12-14 14:45:02 - DEBUG - run_cmd: mthreads-gmi
2025-12-14 14:45:02 - DEBUG - Working directory: None
2025-12-14 14:45:02 - DEBUG - Ignore stderr: False
2025-12-14 14:45:02 - DEBUG - Ignore all: False
2025-12-14 14:45:02 - DEBUG - env: None
2025-12-14 14:45:02 - DEBUG - run_cmd: podman inspect quay.io/ramalama/ramalama:0.15
2025-12-14 14:45:02 - DEBUG - Working directory: None
2025-12-14 14:45:02 - DEBUG - Ignore stderr: False
2025-12-14 14:45:02 - DEBUG - Ignore all: True
2025-12-14 14:45:02 - DEBUG - env: None
2025-12-14 14:45:02 - DEBUG - run_cmd: npu-smi info
2025-12-14 14:45:02 - DEBUG - Working directory: None
2025-12-14 14:45:02 - DEBUG - Ignore stderr: False
2025-12-14 14:45:02 - DEBUG - Ignore all: False
2025-12-14 14:45:02 - DEBUG - env: None
2025-12-14 14:45:02 - DEBUG - run_cmd: mthreads-gmi
2025-12-14 14:45:02 - DEBUG - Working directory: None
2025-12-14 14:45:02 - DEBUG - Ignore stderr: False
2025-12-14 14:45:02 - DEBUG - Ignore all: False
2025-12-14 14:45:02 - DEBUG - env: None
2025-12-14 14:45:02 - DEBUG - run_cmd: podman inspect quay.io/ramalama/ramalama:0.15
2025-12-14 14:45:02 - DEBUG - Working directory: None
2025-12-14 14:45:02 - DEBUG - Ignore stderr: False
2025-12-14 14:45:02 - DEBUG - Ignore all: True
2025-12-14 14:45:02 - DEBUG - env: None
2025-12-14 14:45:02 - DEBUG - run_cmd: podman image inspect localhost/rag2
2025-12-14 14:45:02 - DEBUG - Working directory: None
2025-12-14 14:45:02 - DEBUG - Ignore stderr: False
2025-12-14 14:45:02 - DEBUG - Ignore all: False
2025-12-14 14:45:02 - DEBUG - env: None
2025-12-14 14:45:03 - DEBUG - Command finished with return code: 0
2025-12-14 14:45:03 - DEBUG - run_cmd: npu-smi info
2025-12-14 14:45:03 - DEBUG - Working directory: None
2025-12-14 14:45:03 - DEBUG - Ignore stderr: False
2025-12-14 14:45:03 - DEBUG - Ignore all: False
2025-12-14 14:45:03 - DEBUG - env: None
2025-12-14 14:45:03 - DEBUG - run_cmd: mthreads-gmi
2025-12-14 14:45:03 - DEBUG - Working directory: None
2025-12-14 14:45:03 - DEBUG - Ignore stderr: False
2025-12-14 14:45:03 - DEBUG - Ignore all: False
2025-12-14 14:45:03 - DEBUG - env: None
2025-12-14 14:45:03 - DEBUG - run_cmd: podman inspect quay.io/ramalama/ramalama:0.15
2025-12-14 14:45:03 - DEBUG - Working directory: None
2025-12-14 14:45:03 - DEBUG - Ignore stderr: False
2025-12-14 14:45:03 - DEBUG - Ignore all: True
2025-12-14 14:45:03 - DEBUG - env: None
2025-12-14 14:45:03 - DEBUG - run_cmd: npu-smi info
2025-12-14 14:45:03 - DEBUG - Working directory: None
2025-12-14 14:45:03 - DEBUG - Ignore stderr: False
2025-12-14 14:45:03 - DEBUG - Ignore all: False
2025-12-14 14:45:03 - DEBUG - env: None
2025-12-14 14:45:03 - DEBUG - run_cmd: mthreads-gmi
2025-12-14 14:45:03 - DEBUG - Working directory: None
2025-12-14 14:45:03 - DEBUG - Ignore stderr: False
2025-12-14 14:45:03 - DEBUG - Ignore all: False
2025-12-14 14:45:03 - DEBUG - env: None
2025-12-14 14:45:03 - DEBUG - exec_cmd: podman run --rm --label ai.ramalama.model=hf://TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF --label ai.ramalama.engine=podman --label ai.ramalama.runtime=llama.cpp --label ai.ramalama.port=8109 --label ai.ramalama.command=serve --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --pull newer -d --device /dev/dri -p 8109:8109 --label ai.ramalama --name ramalama_kvhjnbOAPN --env=HOME=/tmp --init --mount=type=bind,src=/var/lib/ramalama/store/huggingface/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/blobs/sha256-030a469a63576d59f601ef5608846b7718eaa884dd820e9aa7493efec1788afa,destination=/mnt/models/tinyllama-1.1b-chat-v1.0.Q2_K.gguf,ro --mount=type=bind,src=/var/lib/ramalama/store/huggingface/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/blobs/sha256-587cb980af76fdc7e52369fd0b9d926dff266976b6f8ac631e358fecc49ff8cf,destination=/mnt/models/config.json,ro --mount=type=bind,src=/var/lib/ramalama/store/huggingface/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/blobs/sha256-66291cf0045c2425a3a667cf3cbb7af2b11f09e025c02f97245323ab79119362,destination=/mnt/models/chat_template_extracted,ro quay.io/ramalama/ramalama:latest llama-server --host 0.0.0.0 --port 8109 --model /mnt/models/tinyllama-1.1b-chat-v1.0.Q2_K.gguf --jinja --no-warmup --alias TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF --temp 0.8 --cache-reuse 256 -v --flash-attn on -ngl 999 --threads 4 --log-colors on
a00da65aa4b19f3ac11d4c377e05923e28d54a12c55b5a7bce6f48217cf79711
2025-12-14 14:45:04 - DEBUG - run_cmd: npu-smi info
2025-12-14 14:45:04 - DEBUG - Working directory: None
2025-12-14 14:45:04 - DEBUG - Ignore stderr: False
2025-12-14 14:45:04 - DEBUG - Ignore all: False
2025-12-14 14:45:04 - DEBUG - env: None
2025-12-14 14:45:04 - DEBUG - run_cmd: mthreads-gmi
2025-12-14 14:45:04 - DEBUG - Working directory: None
2025-12-14 14:45:04 - DEBUG - Ignore stderr: False
2025-12-14 14:45:04 - DEBUG - Ignore all: False
2025-12-14 14:45:04 - DEBUG - env: None
2025-12-14 14:45:04 - DEBUG - exec_cmd: podman run --rm --label ai.ramalama.model=localhost/rag2 --label ai.ramalama.engine=podman --label ai.ramalama.runtime=llama.cpp --label ai.ramalama.port=8080 --label "ai.ramalama.command=serve --rag" --label ai.ramalama.rag.image=localhost/rag2 --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --pull newer --device /dev/dri -p 8080:8080 --mount=type=image,source=localhost/rag2,destination=/rag,rw=true --label ai.ramalama --name ramalama_qp1b0Htw73 --env=HOME=/tmp --init quay.io/ramalama/ramalama-rag:latest rag_framework --debug serve --port 8080 --model-host host.containers.internal --model-port 8109 /rag/vector.db
2025-12-14 13:45:07,625 asyncio DEBUG: Using selector: EpollSelector
INFO: Started server process [7]
INFO: Waiting for application startup.
ERROR: Traceback (most recent call last):
File "/opt/venv/lib64/python3.13/site-packages/starlette/routing.py", line 694, in lifespan
async with self.lifespan_context(app) as maybe_state:
~~~~~~~~~~~~~~~~~~~~~^^^^^
File "/usr/lib64/python3.13/contextlib.py", line 214, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/bin/rag_framework", line 411, in lifespan
await wait_for_llama_server(args.model_host, args.model_port, total_timeout=120)
File "/usr/bin/rag_framework", line 144, in wait_for_llama_server
raise TimeoutError(f"LLaMA server at {host}:{port} did not become ready after {total_timeout} seconds.")
TimeoutError: LLaMA server at host.containers.internal:8109 did not become ready after 120 seconds.
ERROR: Application startup failed. Exiting.
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Hmm, this is interesting. Try it without the localhost/ prefix and add the tag, like rag2:latest!
Also, see if regular ramalama serve is working.
Are you using any GPU?
@bmahabirbu if it says "MLX", it is likely running on a Mac device.
@bmahabirbu sorry for the delay. Yes, regular ramalama serve works flawlessly. And no, I'm not using any GPU; I work entirely on CPU. My server is a standard x86 Intel N100.
I also tried without localhost/ and with the tag; however, the result is the same. I don't think it's a GPU/CPU problem, but a networking one. The first container that is spawned is the one the second container should reach, but its name is not correct. Maybe we can work with podman's --name parameter? I'll do some tests when I have more spare time.
Federico
host.containers.internal is a special hostname used by podman to let containers reach the host, or other rootless containers running on the same machine; it's described in the podman-run manpage. A quick way to check it from inside a container is sketched below.
I'm surprised by the ports it's choosing for you; when I run it, it consistently picks ports 8080 and 8081. Do you have a lot of other local services running?
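If it helps with debugging, here is a rough way to check whether host.containers.internal resolves and whether the llama-server port is reachable from inside a container on the same host. It's only a sketch: the port number comes from the last debug log above, and it assumes python3 is on the image's PATH (for example, you could feed it in with podman run --rm -i quay.io/ramalama/ramalama-rag:latest python3 - < check.py).

import socket

HOST, PORT = "host.containers.internal", 8109  # values taken from the last debug log above

# Does the special hostname resolve inside the container at all?
try:
    infos = socket.getaddrinfo(HOST, PORT, proto=socket.IPPROTO_TCP)
    print("resolves to:", sorted({info[4][0] for info in infos}))
except socket.gaierror as exc:
    raise SystemExit(f"{HOST} does not resolve inside this container: {exc}")

# Can we open a TCP connection to the published llama-server port?
try:
    with socket.create_connection((HOST, PORT), timeout=5):
        print(f"{HOST}:{PORT} is reachable")
except OSError as exc:
    print(f"{HOST}:{PORT} is NOT reachable: {exc}")

If the name resolves but the connection is refused, the llama-server container is probably not publishing the port on the host; if the name doesn't resolve at all, it points to the network backend configuration.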