
Ramalama serve with RAG failed to start after 10 seconds

Open federicofortini opened this issue 2 months ago • 7 comments

Issue Description

Hello. I have been able to experiment a lot with ramalama, and now I'm trying to explore the RAG features of this amazing project. After running ramalama rag <folder of files to rag> rag_oci_image, I tried to serve the tiny model with the --rag argument to include the vector DB from the OCI image generated in the previous step: ramalama --debug serve --generate compose --rag rag_oci_image tiny. The command fails, stating that it cannot start llama.cpp within 10 seconds.
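
For reference, the two commands in sequence (with <folder of files to rag> standing for the directory of documents to index):

ramalama rag <folder of files to rag> rag_oci_image
ramalama --debug serve --generate compose --rag rag_oci_image tiny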

Steps to reproduce the issue

ramalama --debug serve --generate compose --rag rag_oci_image tiny

Describe the results you received

ERROR: Application startup failed. Exiting.
No server responding at host.containers.internal:8082, retrying for up to 10 seconds...
Error: llama-server at host.containers.internal:8082 did not become ready after 10 seconds.

Describe the results you expected

To have llama.cpp running.

ramalama info output

{
    "Accelerator": "none",
    "Config": {
        "settings": {
            "config_files": [
                "/root/.local/share/uv/tools/ramalama/share/ramalama/ramalama.conf"
            ]
        }
    },
    "Engine": {
        "Info": {
            "host": {
                "arch": "amd64",
                "buildahVersion": "1.39.3",
                "cgroupControllers": [
                    "cpuset",
                    "cpu",
                    "io",
                    "memory",
                    "hugetlb",
                    "pids",
                    "rdma",
                    "misc"
                ],
                "cgroupManager": "systemd",
                "cgroupVersion": "v2",
                "conmon": {
                    "package": "conmon_2.1.12-4_amd64",
                    "path": "/usr/bin/conmon",
                    "version": "conmon version 2.1.12, commit: unknown"
                },
                "cpuUtilization": {
                    "idlePercent": 97.51,
                    "systemPercent": 0.09,
                    "userPercent": 2.4
                },
                "cpus": 4,
                "databaseBackend": "sqlite",
                "distribution": {
                    "codename": "trixie",
                    "distribution": "debian",
                    "version": "13"
                },
                "eventLogger": "journald",
                "freeLocks": 2047,
                "hostname": "debhome",
                "idMappings": {
                    "gidmap": null,
                    "uidmap": null
                },
                "kernel": "6.12.41+deb13-amd64",
                "linkmode": "dynamic",
                "logDriver": "journald",
                "memFree": 9736966144,
                "memTotal": 16507269120,
                "networkBackend": "netavark",
                "networkBackendInfo": {
                    "backend": "netavark",
                    "dns": {
                        "package": "aardvark-dns_1.14.0-3_amd64",
                        "path": "/usr/lib/podman/aardvark-dns",
                        "version": "aardvark-dns 1.14.0"
                    },
                    "package": "netavark_1.14.0-2_amd64",
                    "path": "/usr/lib/podman/netavark",
                    "version": "netavark 1.14.0"
                },
                "ociRuntime": {
                    "name": "runc",
                    "package": "containerd.io_1.7.29-1~debian.13~trixie_amd64",
                    "path": "/usr/bin/runc",
                    "version": "runc version 1.3.3\ncommit: v1.3.3-0-gd842d771\nspec: 1.2.1\ngo: go1.24.9\nlibseccomp: 2.6.0"
                },
                "os": "linux",
                "pasta": {
                    "executable": "/usr/bin/pasta",
                    "package": "passt_0.0~git20250503.587980c-2_amd64",
                    "version": ""
                },
                "remoteSocket": {
                    "exists": true,
                    "path": "/run/podman/podman.sock"
                },
                "rootlessNetworkCmd": "pasta",
                "security": {
                    "apparmorEnabled": true,
                    "capabilities": "CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT",
                    "rootless": false,
                    "seccompEnabled": true,
                    "seccompProfilePath": "/usr/share/containers/seccomp.json",
                    "selinuxEnabled": false
                },
                "serviceIsRemote": false,
                "slirp4netns": {
                    "executable": "/usr/bin/slirp4netns",
                    "package": "slirp4netns_1.2.1-1.1_amd64",
                    "version": "slirp4netns version 1.2.1\ncommit: 09e31e92fa3d2a1d3ca261adaeb012c8d75a8194\nlibslirp: 4.8.0\nSLIRP_CONFIG_VERSION_MAX: 5\nlibseccomp: 2.6.0"
                },
                "swapFree": 9949003776,
                "swapTotal": 10000265216,
                "uptime": "315h 28m 57.00s (Approximately 13.12 days)",
                "variant": ""
            },
            "plugins": {
                "authorization": null,
                "log": [
                    "k8s-file",
                    "none",
                    "passthrough",
                    "journald"
                ],
                "network": [
                    "bridge",
                    "macvlan",
                    "ipvlan"
                ],
                "volume": [
                    "local"
                ]
            },
            "registries": {},
            "store": {
                "configFile": "/usr/share/containers/storage.conf",
                "containerStore": {
                    "number": 1,
                    "paused": 0,
                    "running": 1,
                    "stopped": 0
                },
                "graphDriverName": "overlay",
                "graphOptions": {
                    "overlay.mountopt": "nodev"
                },
                "graphRoot": "/var/lib/containers/storage",
                "graphRootAllocated": 491001659392,
                "graphRootUsed": 125398740992,
                "graphStatus": {
                    "Backing Filesystem": "extfs",
                    "Native Overlay Diff": "true",
                    "Supports d_type": "true",
                    "Supports shifting": "true",
                    "Supports volatile": "true",
                    "Using metacopy": "false"
                },
                "imageCopyTmpDir": "/var/tmp",
                "imageStore": {
                    "number": 5
                },
                "runRoot": "/run/containers/storage",
                "transientStore": false,
                "volumePath": "/var/lib/containers/storage/volumes"
            },
            "version": {
                "APIVersion": "5.4.2",
                "BuildOrigin": "Debian",
                "Built": 1753478586,
                "BuiltTime": "Fri Jul 25 23:23:06 2025",
                "GitCommit": "",
                "GoVersion": "go1.24.4",
                "Os": "linux",
                "OsArch": "linux/amd64",
                "Version": "5.4.2"
            }
        },
        "Name": "podman"
    },
    "Image": "quay.io/ramalama/ramalama:latest",
    "Inference": {
        "Default": "llama.cpp",
        "Engines": {
            "llama.cpp": "/root/.local/share/uv/tools/ramalama/share/ramalama/inference/llama.cpp.yaml",
            "mlx": "/root/.local/share/uv/tools/ramalama/share/ramalama/inference/mlx.yaml",
            "schema.1-0-0": "/root/.local/share/uv/tools/ramalama/share/ramalama/inference/schema.1-0-0.json",
            "vllm": "/root/.local/share/uv/tools/ramalama/share/ramalama/inference/vllm.yaml"
        },
        "Schema": {
            "1-0-0": "/root/.local/share/uv/tools/ramalama/share/ramalama/inference/schema.1-0-0.json"
        }
    },
    "RagImage": "quay.io/ramalama/ramalama-rag:latest",
    "Selinux": false,
    "Shortnames": {
        "Files": [
            "/root/.local/share/uv/tools/ramalama/share/ramalama/shortnames.conf"
        ],
        "Names": {
            "cerebrum": "huggingface://froggeric/Cerebrum-1.0-7b-GGUF/Cerebrum-1.0-7b-Q4_KS.gguf",
            "deepseek": "ollama://deepseek-r1",
            "dragon": "huggingface://llmware/dragon-mistral-7b-v0/dragon-mistral-7b-q4_k_m.gguf",
            "gemma3": "hf://ggml-org/gemma-3-4b-it-GGUF",
            "gemma3:12b": "hf://ggml-org/gemma-3-12b-it-GGUF",
            "gemma3:1b": "hf://ggml-org/gemma-3-1b-it-GGUF/gemma-3-1b-it-Q4_K_M.gguf",
            "gemma3:27b": "hf://ggml-org/gemma-3-27b-it-GGUF",
            "gemma3:4b": "hf://ggml-org/gemma-3-4b-it-GGUF",
            "gemma3n": "hf://ggml-org/gemma-3n-E4B-it-GGUF/gemma-3n-E4B-it-Q8_0.gguf",
            "gemma3n:e2b": "hf://ggml-org/gemma-3n-E2B-it-GGUF/gemma-3n-E2B-it-Q8_0.gguf",
            "gemma3n:e2b-it-f16": "hf://ggml-org/gemma-3n-E2B-it-GGUF/gemma-3n-E2B-it-f16.gguf",
            "gemma3n:e2b-it-q8_0": "hf://ggml-org/gemma-3n-E2B-it-GGUF/gemma-3n-E2B-it-Q8_0.gguf",
            "gemma3n:e4b": "hf://ggml-org/gemma-3n-E4B-it-GGUF/gemma-3n-E4B-it-Q8_0.gguf",
            "gemma3n:e4b-it-f16": "hf://ggml-org/gemma-3n-E4B-it-GGUF/gemma-3n-E4B-it-f16.gguf",
            "gemma3n:e4b-it-q8_0": "hf://ggml-org/gemma-3n-E4B-it-GGUF/gemma-3n-E4B-it-Q8_0.gguf",
            "gpt-oss": "hf://ggml-org/gpt-oss-20b-GGUF",
            "gpt-oss:120b": "hf://ggml-org/gpt-oss-120b-GGUF",
            "gpt-oss:20b": "hf://ggml-org/gpt-oss-20b-GGUF",
            "granite": "ollama://granite3.1-dense",
            "granite-be-3.0:1b": "hf://taronaeo/Granite-3.0-1B-A400M-Instruct-BE-GGUF/granite-3.0-1b-a400m-instruct-be.Q2_K.gguf",
            "granite-be-3.3:2b": "hf://taronaeo/Granite-3.3-2B-Instruct-BE-GGUF/granite-3.3-2b-instruct-be.Q4_K_M.gguf",
            "granite-lab-7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
            "granite-lab-8b": "huggingface://ibm-granite/granite-3.3-8b-instruct-GGUF/granite-3.3-8b-instruct-Q4_K_M.gguf",
            "granite-lab:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
            "granite:2b": "ollama://granite3.1-dense:2b",
            "granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
            "granite:8b": "ollama://granite3.1-dense:8b",
            "hermes": "huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf",
            "ibm/granite": "ollama://granite3.1-dense:8b",
            "ibm/granite:2b": "ollama://granite3.1-dense:2b",
            "ibm/granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
            "ibm/granite:8b": "ollama://granite3.1-dense:8b",
            "merlinite": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
            "merlinite-lab-7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
            "merlinite-lab:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
            "merlinite:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
            "mistral": "hf://lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf",
            "mistral-small3.1": "hf://bartowski/mistralai_Mistral-Small-3.1-24B-Instruct-2503-GGUF/mistralai_Mistral-Small-3.1-24B-Instruct-2503-IQ2_M.gguf",
            "mistral-small3.1:24b": "hf://bartowski/mistralai_Mistral-Small-3.1-24B-Instruct-2503-GGUF/mistralai_Mistral-Small-3.1-24B-Instruct-2503-IQ2_M.gguf",
            "mistral:7b": "hf://lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf",
            "mistral:7b-v1": "huggingface://TheBloke/Mistral-7B-Instruct-v0.1-GGUF/mistral-7b-instruct-v0.1.Q5_K_M.gguf",
            "mistral:7b-v2": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
            "mistral:7b-v3": "hf://lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf",
            "mistral_code_16k": "huggingface://TheBloke/Mistral-7B-Code-16K-qlora-GGUF/mistral-7b-code-16k-qlora.Q4_K_M.gguf",
            "mistral_codealpaca": "huggingface://TheBloke/Mistral-7B-codealpaca-lora-GGUF/mistral-7b-codealpaca-lora.Q4_K_M.gguf",
            "mixtao": "huggingface://MaziyarPanahi/MixTAO-7Bx2-MoE-Instruct-v7.0-GGUF/MixTAO-7Bx2-MoE-Instruct-v7.0.Q4_K_M.gguf",
            "openchat": "huggingface://TheBloke/openchat-3.5-0106-GGUF/openchat-3.5-0106.Q4_K_M.gguf",
            "openorca": "huggingface://TheBloke/Mistral-7B-OpenOrca-GGUF/mistral-7b-openorca.Q4_K_M.gguf",
            "phi2": "huggingface://MaziyarPanahi/phi-2-GGUF/phi-2.Q4_K_M.gguf",
            "qwen2.5vl": "hf://ggml-org/Qwen2.5-VL-32B-Instruct-GGUF",
            "qwen2.5vl:2b": "hf://ggml-org/Qwen2.5-VL-2B-Instruct-GGUF",
            "qwen2.5vl:32b": "hf://ggml-org/Qwen2.5-VL-32B-Instruct-GGUF",
            "qwen2.5vl:3b": "hf://ggml-org/Qwen2.5-VL-3B-Instruct-GGUF",
            "qwen2.5vl:7b": "hf://ggml-org/Qwen2.5-VL-7B-Instruct-GGUF",
            "smollm:135m": "hf://HuggingFaceTB/smollm-135M-instruct-v0.2-Q8_0-GGUF",
            "smolvlm": "hf://ggml-org/SmolVLM-500M-Instruct-GGUF",
            "smolvlm:256m": "hf://ggml-org/SmolVLM-256M-Instruct-GGUF",
            "smolvlm:2b": "hf://ggml-org/SmolVLM-Instruct-GGUF",
            "smolvlm:500m": "hf://ggml-org/SmolVLM-500M-Instruct-GGUF",
            "stories-be:260k": "hf://taronaeo/tinyllamas-BE/stories260K-be.gguf",
            "tiny": "hf://TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
            "tinyllama": "hf://TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
        }
    },
    "Store": "/var/lib/ramalama",
    "UseContainer": true,
    "Version": "0.14.0"
}

Upstream Latest Release

Yes

Additional environment details

No response

Additional information

I found that the message comes from this Python script: https://github.com/containers/ramalama/blob/main/container-images/scripts/rag_framework#L119 However, I'm not able to understand in which part of the process the llama.cpp server is launched.
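
Judging from the --debug output below, ramalama first starts llama-server in its own container (the podman run ... llama-server command) and then starts the ramalama-rag container, whose rag_framework only waits for that server to become reachable. A minimal way to check whether the llama-server container actually came up (container name and port taken from the debug output below; assumes curl is available on the host):

# list the containers ramalama started
podman ps --filter label=ai.ramalama

# logs of the llama-server container (name from the debug output below)
podman logs ramalama_3WtMydOWDs

# llama-server exposes a /health endpoint; check it from the host
curl http://127.0.0.1:8082/health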

federicofortini avatar Nov 10 '25 09:11 federicofortini

Output of ramalama --debug serve

root@host:/home# ramalama --debug serve tiny --rag rag2
2025-11-10 22:15:52 - DEBUG - Checking if 8080 is available
DEBUG:ramalama:Checking if 8080 is available
2025-11-10 22:15:52 - DEBUG - run_cmd: npu-smi info
DEBUG:ramalama:run_cmd: npu-smi info
2025-11-10 22:15:52 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:52 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:52 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:52 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:52 - DEBUG - run_cmd: mthreads-gmi
DEBUG:ramalama:run_cmd: mthreads-gmi
2025-11-10 22:15:52 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:52 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:52 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:52 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:52 - DEBUG - run_cmd: podman inspect quay.io/ramalama/ramalama:0.14
DEBUG:ramalama:run_cmd: podman inspect quay.io/ramalama/ramalama:0.14
2025-11-10 22:15:52 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:52 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:52 - DEBUG - Ignore all: True
DEBUG:ramalama:Ignore all: True
2025-11-10 22:15:52 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:52 - DEBUG - Checking if 8081 is available
DEBUG:ramalama:Checking if 8081 is available
2025-11-10 22:15:52 - DEBUG - Checking if 8085 is available
DEBUG:ramalama:Checking if 8085 is available
2025-11-10 22:15:52 - DEBUG - Checking if 8082 is available
DEBUG:ramalama:Checking if 8082 is available
2025-11-10 22:15:52 - DEBUG - run_cmd: npu-smi info
DEBUG:ramalama:run_cmd: npu-smi info
2025-11-10 22:15:52 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:52 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:52 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:52 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:52 - DEBUG - run_cmd: mthreads-gmi
DEBUG:ramalama:run_cmd: mthreads-gmi
2025-11-10 22:15:52 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:52 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:52 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:52 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:52 - DEBUG - run_cmd: podman inspect quay.io/ramalama/ramalama:0.14
DEBUG:ramalama:run_cmd: podman inspect quay.io/ramalama/ramalama:0.14
2025-11-10 22:15:52 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:52 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:52 - DEBUG - Ignore all: True
DEBUG:ramalama:Ignore all: True
2025-11-10 22:15:52 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:52 - DEBUG - run_cmd: npu-smi info
DEBUG:ramalama:run_cmd: npu-smi info
2025-11-10 22:15:52 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:52 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:52 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:52 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:52 - DEBUG - run_cmd: mthreads-gmi
DEBUG:ramalama:run_cmd: mthreads-gmi
2025-11-10 22:15:52 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:52 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:52 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:52 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:52 - DEBUG - run_cmd: podman inspect quay.io/ramalama/ramalama:0.14
DEBUG:ramalama:run_cmd: podman inspect quay.io/ramalama/ramalama:0.14
2025-11-10 22:15:52 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:52 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:52 - DEBUG - Ignore all: True
DEBUG:ramalama:Ignore all: True
2025-11-10 22:15:52 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:52 - DEBUG - run_cmd: podman image inspect rag2
DEBUG:ramalama:run_cmd: podman image inspect rag2
2025-11-10 22:15:52 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:52 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:52 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:52 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:53 - DEBUG - Command finished with return code: 0
DEBUG:ramalama:Command finished with return code: 0
2025-11-10 22:15:53 - DEBUG - run_cmd: npu-smi info
DEBUG:ramalama:run_cmd: npu-smi info
2025-11-10 22:15:53 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:53 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:53 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:53 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:53 - DEBUG - run_cmd: mthreads-gmi
DEBUG:ramalama:run_cmd: mthreads-gmi
2025-11-10 22:15:53 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:53 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:53 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:53 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:53 - DEBUG - run_cmd: podman inspect quay.io/ramalama/ramalama:0.14
DEBUG:ramalama:run_cmd: podman inspect quay.io/ramalama/ramalama:0.14
2025-11-10 22:15:53 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:53 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:53 - DEBUG - Ignore all: True
DEBUG:ramalama:Ignore all: True
2025-11-10 22:15:53 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:53 - DEBUG - run_cmd: npu-smi info
DEBUG:ramalama:run_cmd: npu-smi info
2025-11-10 22:15:53 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:53 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:53 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:53 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:53 - DEBUG - run_cmd: mthreads-gmi
DEBUG:ramalama:run_cmd: mthreads-gmi
2025-11-10 22:15:53 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:53 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:53 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:53 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:53 - DEBUG - exec_cmd: podman run --rm --label ai.ramalama.model=hf://TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF --label ai.ramalama.engine=podman --label ai.ramalama.runtime=llama.cpp --label ai.ramalama.port=8082 --label ai.ramalama.command=serve --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --pull newer -d --device /dev/dri -p 8082:8082 --label ai.ramalama --name ramalama_3WtMydOWDs --env=HOME=/tmp --init --mount=type=bind,src=/var/lib/ramalama/store/huggingface/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/blobs/sha256-030a469a63576d59f601ef5608846b7718eaa884dd820e9aa7493efec1788afa,destination=/mnt/models/tinyllama-1.1b-chat-v1.0.Q2_K.gguf,ro --mount=type=bind,src=/var/lib/ramalama/store/huggingface/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/blobs/sha256-587cb980af76fdc7e52369fd0b9d926dff266976b6f8ac631e358fecc49ff8cf,destination=/mnt/models/config.json,ro --mount=type=bind,src=/var/lib/ramalama/store/huggingface/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/blobs/sha256-66291cf0045c2425a3a667cf3cbb7af2b11f09e025c02f97245323ab79119362,destination=/mnt/models/chat_template_extracted,ro quay.io/ramalama/ramalama:latest llama-server --host 0.0.0.0 --port 8082 --model /mnt/models/tinyllama-1.1b-chat-v1.0.Q2_K.gguf --jinja --no-warmup --alias TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF --temp 0.8 --cache-reuse 256 -v --flash-attn on -ngl 999 --threads 4 --log-colors on
DEBUG:ramalama:exec_cmd: podman run --rm --label ai.ramalama.model=hf://TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF --label ai.ramalama.engine=podman --label ai.ramalama.runtime=llama.cpp --label ai.ramalama.port=8082 --label ai.ramalama.command=serve --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --pull newer -d --device /dev/dri -p 8082:8082 --label ai.ramalama --name ramalama_3WtMydOWDs --env=HOME=/tmp --init --mount=type=bind,src=/var/lib/ramalama/store/huggingface/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/blobs/sha256-030a469a63576d59f601ef5608846b7718eaa884dd820e9aa7493efec1788afa,destination=/mnt/models/tinyllama-1.1b-chat-v1.0.Q2_K.gguf,ro --mount=type=bind,src=/var/lib/ramalama/store/huggingface/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/blobs/sha256-587cb980af76fdc7e52369fd0b9d926dff266976b6f8ac631e358fecc49ff8cf,destination=/mnt/models/config.json,ro --mount=type=bind,src=/var/lib/ramalama/store/huggingface/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/blobs/sha256-66291cf0045c2425a3a667cf3cbb7af2b11f09e025c02f97245323ab79119362,destination=/mnt/models/chat_template_extracted,ro quay.io/ramalama/ramalama:latest llama-server --host 0.0.0.0 --port 8082 --model /mnt/models/tinyllama-1.1b-chat-v1.0.Q2_K.gguf --jinja --no-warmup --alias TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF --temp 0.8 --cache-reuse 256 -v --flash-attn on -ngl 999 --threads 4 --log-colors on
f663d636c754a28e5d3387047683342f28b010eba4500fa02e68f52f3cd35032
2025-11-10 22:15:54 - DEBUG - run_cmd: npu-smi info
DEBUG:ramalama:run_cmd: npu-smi info
2025-11-10 22:15:54 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:54 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:54 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:54 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:54 - DEBUG - run_cmd: mthreads-gmi
DEBUG:ramalama:run_cmd: mthreads-gmi
2025-11-10 22:15:54 - DEBUG - Working directory: None
DEBUG:ramalama:Working directory: None
2025-11-10 22:15:54 - DEBUG - Ignore stderr: False
DEBUG:ramalama:Ignore stderr: False
2025-11-10 22:15:54 - DEBUG - Ignore all: False
DEBUG:ramalama:Ignore all: False
2025-11-10 22:15:54 - DEBUG - env: None
DEBUG:ramalama:env: None
2025-11-10 22:15:54 - DEBUG - exec_cmd: podman run --rm --label ai.ramalama.model=rag2 --label ai.ramalama.engine=podman --label ai.ramalama.runtime=llama.cpp --label ai.ramalama.port=8080 --label "ai.ramalama.command=serve --rag" --label ai.ramalama.rag.image=rag2 --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --pull newer --device /dev/dri -p 8080:8080 --mount=type=image,source=rag2,destination=/rag,rw=true --label ai.ramalama --name ramalama_PalNPPekcN --env=HOME=/tmp --init quay.io/ramalama/ramalama-rag:latest rag_framework --debug serve --port 8080 --model-host host.containers.internal --model-port 8082 /rag/vector.db
DEBUG:ramalama:exec_cmd: podman run --rm --label ai.ramalama.model=rag2 --label ai.ramalama.engine=podman --label ai.ramalama.runtime=llama.cpp --label ai.ramalama.port=8080 --label "ai.ramalama.command=serve --rag" --label ai.ramalama.rag.image=rag2 --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --pull newer --device /dev/dri -p 8080:8080 --mount=type=image,source=rag2,destination=/rag,rw=true --label ai.ramalama --name ramalama_PalNPPekcN --env=HOME=/tmp --init quay.io/ramalama/ramalama-rag:latest rag_framework --debug serve --port 8080 --model-host host.containers.internal --model-port 8082 /rag/vector.db
2025-11-10 21:15:58,564 asyncio DEBUG: Using selector: EpollSelector
INFO:     Started server process [7]
INFO:     Waiting for application startup.
ERROR:    Traceback (most recent call last):
  File "/opt/venv/lib64/python3.13/site-packages/starlette/routing.py", line 694, in lifespan
    async with self.lifespan_context(app) as maybe_state:
               ~~~~~~~~~~~~~~~~~~~~~^^^^^
  File "/usr/lib64/python3.13/contextlib.py", line 214, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/bin/rag_framework", line 396, in lifespan
    await wait_for_llama_server(args.model_host, args.model_port)
  File "/usr/bin/rag_framework", line 139, in wait_for_llama_server
    sys.exit(1)
    ~~~~~~~~^^^
SystemExit: 1

ERROR:    Application startup failed. Exiting.
No server responding at host.containers.internal:8082, retrying for up to 10 seconds...
Error: llama-server at host.containers.internal:8082 did not become ready after 10 seconds.

federicofortini avatar Nov 10 '25 21:11 federicofortini

This error also leaves a stale container running.
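
Until that is fixed, the leftover container can be removed manually, for example:

podman ps --filter label=ai.ramalama
podman rm -f ramalama_3WtMydOWDs   # container name from the debug output above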

federicofortini avatar Nov 10 '25 21:11 federicofortini

I guess that the error could be related to the fact that the first container is not named host.containers.internal:

podman ps
CONTAINER ID  IMAGE                             COMMAND               CREATED         STATUS         PORTS                   NAMES
89d066be844a  quay.io/ramalama/ramalama:latest  llama-server --ho...  13 minutes ago  Up 13 minutes  0.0.0.0:8082->8082/tcp  ramalama_3WtMydOWDs

federicofortini avatar Nov 10 '25 21:11 federicofortini

@bmahabirbu PTAL

rhatdan avatar Nov 11 '25 13:11 rhatdan

I guess that the error could be related to the fact that the first container is not named host.containers.internal:

host.containers.internal will be the fully qualified domain name of the host this is running on.
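
One way to narrow this down is to check whether the llama-server port is reachable, first from the host and then via that hostname from inside a container (the second command assumes an image that ships curl, which the ramalama images may not):

# on the host: is llama-server listening on the published port?
curl http://127.0.0.1:8082/health

# from a container on the same engine: does host.containers.internal resolve and answer?
podman run --rm <image-with-curl> curl -s http://host.containers.internal:8082/health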

Is the issue only when using rag? I.e. does ramalama --debug serve tiny run successfully?

olliewalsh avatar Nov 12 '25 11:11 olliewalsh

Sure, the command without --rag runs fine. Do you need anything else from my side? More tests?

federicofortini avatar Nov 12 '25 12:11 federicofortini

See https://github.com/containers/ramalama/issues/2079

Serve with RAG currently does not work, but will be working soon!

bmahabirbu avatar Nov 17 '25 20:11 bmahabirbu

@federicofortini Could you retry with the 0.15.0 release? Also, note that you don't need the --generate argument if you want to start a local server.
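
That is, for a local server something like the following should be enough:

ramalama serve --rag rag_oci_image tiny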

mikebonnet avatar Dec 12 '25 17:12 mikebonnet

Also, it looks like you may be running ramalama as root, which is unnecessary. Could you try running it as a non-root user?

mikebonnet avatar Dec 12 '25 17:12 mikebonnet

Hello @mikebonnet, I'm sorry to report that the outcome is still the same.

ramalama version
ramalama version 0.15.0

Output:

ramalama --debug serve --rag localhost/rag2 tiny
2025-12-14 14:45:02 - DEBUG - Checking if 8080 is available
2025-12-14 14:45:02 - DEBUG - run_cmd: npu-smi info
2025-12-14 14:45:02 - DEBUG - Working directory: None
2025-12-14 14:45:02 - DEBUG - Ignore stderr: False
2025-12-14 14:45:02 - DEBUG - Ignore all: False
2025-12-14 14:45:02 - DEBUG - env: None
2025-12-14 14:45:02 - DEBUG - run_cmd: mthreads-gmi
2025-12-14 14:45:02 - DEBUG - Working directory: None
2025-12-14 14:45:02 - DEBUG - Ignore stderr: False
2025-12-14 14:45:02 - DEBUG - Ignore all: False
2025-12-14 14:45:02 - DEBUG - env: None
2025-12-14 14:45:02 - DEBUG - run_cmd: podman inspect quay.io/ramalama/ramalama:0.15
2025-12-14 14:45:02 - DEBUG - Working directory: None
2025-12-14 14:45:02 - DEBUG - Ignore stderr: False
2025-12-14 14:45:02 - DEBUG - Ignore all: True
2025-12-14 14:45:02 - DEBUG - env: None
2025-12-14 14:45:02 - DEBUG - Checking if 8081 is available
2025-12-14 14:45:02 - DEBUG - Checking if 8109 is available
2025-12-14 14:45:02 - DEBUG - run_cmd: npu-smi info
2025-12-14 14:45:02 - DEBUG - Working directory: None
2025-12-14 14:45:02 - DEBUG - Ignore stderr: False
2025-12-14 14:45:02 - DEBUG - Ignore all: False
2025-12-14 14:45:02 - DEBUG - env: None
2025-12-14 14:45:02 - DEBUG - run_cmd: mthreads-gmi
2025-12-14 14:45:02 - DEBUG - Working directory: None
2025-12-14 14:45:02 - DEBUG - Ignore stderr: False
2025-12-14 14:45:02 - DEBUG - Ignore all: False
2025-12-14 14:45:02 - DEBUG - env: None
2025-12-14 14:45:02 - DEBUG - run_cmd: podman inspect quay.io/ramalama/ramalama:0.15
2025-12-14 14:45:02 - DEBUG - Working directory: None
2025-12-14 14:45:02 - DEBUG - Ignore stderr: False
2025-12-14 14:45:02 - DEBUG - Ignore all: True
2025-12-14 14:45:02 - DEBUG - env: None
2025-12-14 14:45:02 - DEBUG - run_cmd: npu-smi info
2025-12-14 14:45:02 - DEBUG - Working directory: None
2025-12-14 14:45:02 - DEBUG - Ignore stderr: False
2025-12-14 14:45:02 - DEBUG - Ignore all: False
2025-12-14 14:45:02 - DEBUG - env: None
2025-12-14 14:45:02 - DEBUG - run_cmd: mthreads-gmi
2025-12-14 14:45:02 - DEBUG - Working directory: None
2025-12-14 14:45:02 - DEBUG - Ignore stderr: False
2025-12-14 14:45:02 - DEBUG - Ignore all: False
2025-12-14 14:45:02 - DEBUG - env: None
2025-12-14 14:45:02 - DEBUG - run_cmd: podman inspect quay.io/ramalama/ramalama:0.15
2025-12-14 14:45:02 - DEBUG - Working directory: None
2025-12-14 14:45:02 - DEBUG - Ignore stderr: False
2025-12-14 14:45:02 - DEBUG - Ignore all: True
2025-12-14 14:45:02 - DEBUG - env: None
2025-12-14 14:45:02 - DEBUG - run_cmd: podman image inspect localhost/rag2
2025-12-14 14:45:02 - DEBUG - Working directory: None
2025-12-14 14:45:02 - DEBUG - Ignore stderr: False
2025-12-14 14:45:02 - DEBUG - Ignore all: False
2025-12-14 14:45:02 - DEBUG - env: None
2025-12-14 14:45:03 - DEBUG - Command finished with return code: 0
2025-12-14 14:45:03 - DEBUG - run_cmd: npu-smi info
2025-12-14 14:45:03 - DEBUG - Working directory: None
2025-12-14 14:45:03 - DEBUG - Ignore stderr: False
2025-12-14 14:45:03 - DEBUG - Ignore all: False
2025-12-14 14:45:03 - DEBUG - env: None
2025-12-14 14:45:03 - DEBUG - run_cmd: mthreads-gmi
2025-12-14 14:45:03 - DEBUG - Working directory: None
2025-12-14 14:45:03 - DEBUG - Ignore stderr: False
2025-12-14 14:45:03 - DEBUG - Ignore all: False
2025-12-14 14:45:03 - DEBUG - env: None
2025-12-14 14:45:03 - DEBUG - run_cmd: podman inspect quay.io/ramalama/ramalama:0.15
2025-12-14 14:45:03 - DEBUG - Working directory: None
2025-12-14 14:45:03 - DEBUG - Ignore stderr: False
2025-12-14 14:45:03 - DEBUG - Ignore all: True
2025-12-14 14:45:03 - DEBUG - env: None
2025-12-14 14:45:03 - DEBUG - run_cmd: npu-smi info
2025-12-14 14:45:03 - DEBUG - Working directory: None
2025-12-14 14:45:03 - DEBUG - Ignore stderr: False
2025-12-14 14:45:03 - DEBUG - Ignore all: False
2025-12-14 14:45:03 - DEBUG - env: None
2025-12-14 14:45:03 - DEBUG - run_cmd: mthreads-gmi
2025-12-14 14:45:03 - DEBUG - Working directory: None
2025-12-14 14:45:03 - DEBUG - Ignore stderr: False
2025-12-14 14:45:03 - DEBUG - Ignore all: False
2025-12-14 14:45:03 - DEBUG - env: None
2025-12-14 14:45:03 - DEBUG - exec_cmd: podman run --rm --label ai.ramalama.model=hf://TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF --label ai.ramalama.engine=podman --label ai.ramalama.runtime=llama.cpp --label ai.ramalama.port=8109 --label ai.ramalama.command=serve --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --pull newer -d --device /dev/dri -p 8109:8109 --label ai.ramalama --name ramalama_kvhjnbOAPN --env=HOME=/tmp --init --mount=type=bind,src=/var/lib/ramalama/store/huggingface/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/blobs/sha256-030a469a63576d59f601ef5608846b7718eaa884dd820e9aa7493efec1788afa,destination=/mnt/models/tinyllama-1.1b-chat-v1.0.Q2_K.gguf,ro --mount=type=bind,src=/var/lib/ramalama/store/huggingface/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/blobs/sha256-587cb980af76fdc7e52369fd0b9d926dff266976b6f8ac631e358fecc49ff8cf,destination=/mnt/models/config.json,ro --mount=type=bind,src=/var/lib/ramalama/store/huggingface/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/blobs/sha256-66291cf0045c2425a3a667cf3cbb7af2b11f09e025c02f97245323ab79119362,destination=/mnt/models/chat_template_extracted,ro quay.io/ramalama/ramalama:latest llama-server --host 0.0.0.0 --port 8109 --model /mnt/models/tinyllama-1.1b-chat-v1.0.Q2_K.gguf --jinja --no-warmup --alias TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF --temp 0.8 --cache-reuse 256 -v --flash-attn on -ngl 999 --threads 4 --log-colors on
a00da65aa4b19f3ac11d4c377e05923e28d54a12c55b5a7bce6f48217cf79711
2025-12-14 14:45:04 - DEBUG - run_cmd: npu-smi info
2025-12-14 14:45:04 - DEBUG - Working directory: None
2025-12-14 14:45:04 - DEBUG - Ignore stderr: False
2025-12-14 14:45:04 - DEBUG - Ignore all: False
2025-12-14 14:45:04 - DEBUG - env: None
2025-12-14 14:45:04 - DEBUG - run_cmd: mthreads-gmi
2025-12-14 14:45:04 - DEBUG - Working directory: None
2025-12-14 14:45:04 - DEBUG - Ignore stderr: False
2025-12-14 14:45:04 - DEBUG - Ignore all: False
2025-12-14 14:45:04 - DEBUG - env: None
2025-12-14 14:45:04 - DEBUG - exec_cmd: podman run --rm --label ai.ramalama.model=localhost/rag2 --label ai.ramalama.engine=podman --label ai.ramalama.runtime=llama.cpp --label ai.ramalama.port=8080 --label "ai.ramalama.command=serve --rag" --label ai.ramalama.rag.image=localhost/rag2 --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --pull newer --device /dev/dri -p 8080:8080 --mount=type=image,source=localhost/rag2,destination=/rag,rw=true --label ai.ramalama --name ramalama_qp1b0Htw73 --env=HOME=/tmp --init quay.io/ramalama/ramalama-rag:latest rag_framework --debug serve --port 8080 --model-host host.containers.internal --model-port 8109 /rag/vector.db
2025-12-14 13:45:07,625 asyncio DEBUG: Using selector: EpollSelector
INFO:     Started server process [7]
INFO:     Waiting for application startup.
ERROR:    Traceback (most recent call last):
  File "/opt/venv/lib64/python3.13/site-packages/starlette/routing.py", line 694, in lifespan
    async with self.lifespan_context(app) as maybe_state:
               ~~~~~~~~~~~~~~~~~~~~~^^^^^
  File "/usr/lib64/python3.13/contextlib.py", line 214, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/bin/rag_framework", line 411, in lifespan
    await wait_for_llama_server(args.model_host, args.model_port, total_timeout=120)
  File "/usr/bin/rag_framework", line 144, in wait_for_llama_server
    raise TimeoutError(f"LLaMA server at {host}:{port} did not become ready after {total_timeout} seconds.")
TimeoutError: LLaMA server at host.containers.internal:8109 did not become ready after 120 seconds.

ERROR:    Application startup failed. Exiting.
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...
Error connecting to host.containers.internal:8109: , retrying...

federicofortini avatar Dec 14 '25 13:12 federicofortini

Hmm, this is interesting. Try it without the localhost/ prefix and with the tag, like rag2:latest!

Also see if regular ramalama serve is working.
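
In other words, something like:

ramalama --debug serve --rag rag2:latest tiny   # image reference without the localhost/ prefix
ramalama --debug serve tiny                     # plain serve without RAG, as a baseline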

bmahabirbu avatar Dec 14 '25 23:12 bmahabirbu

Are you using any GPU?

bmahabirbu avatar Dec 14 '25 23:12 bmahabirbu

@bmahabirbu if it says "MLX" it is likely on a Mac device

TomLucidor avatar Dec 19 '25 05:12 TomLucidor

@bmahabirbu sorry for the delay. Yes, regular ramalama serve works flawlessly. And no, I'm not using any GPU; I work entirely on CPU. My server is a standard x86 Intel N100.

I also tried without localhost, and with the tag, but the result is the same. I don't think it's a GPU/CPU problem, but a network one. The first container spawned is the one that the second one should be able to reach, but its name is not correct. Maybe we can work with podman's --name parameter? I'll do some tests when I have more spare time.
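
As a rough sketch of that idea (a manual workaround, not something ramalama does today): put both containers on a user-defined podman network, so the RAG container can reach llama-server by container name instead of host.containers.internal. The rag_framework flags and image mount below are copied from the debug output above; the model bind-mounts are omitted for brevity.

podman network create ramalama-net

# llama-server container, reachable as "llama" on the shared network
# (add the --mount=type=bind options for the model files from the debug output above)
podman run -d --rm --name llama --network ramalama-net -p 8082:8082 \
    quay.io/ramalama/ramalama:latest \
    llama-server --host 0.0.0.0 --port 8082 --model /mnt/models/tinyllama-1.1b-chat-v1.0.Q2_K.gguf

# RAG container pointed at the llama-server container by name
podman run --rm --network ramalama-net -p 8080:8080 \
    --mount=type=image,source=rag2,destination=/rag,rw=true \
    quay.io/ramalama/ramalama-rag:latest \
    rag_framework serve --port 8080 --model-host llama --model-port 8082 /rag/vector.db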

Federico

federicofortini avatar Dec 19 '25 17:12 federicofortini

host.containers.internal is a special hostname used by podman to allow access to the host, or to other rootless containers running on the same machine; it's described in the podman-run manpage.

I'm surprised by the ports it's choosing for you; when I run it, it consistently chooses ports 8080 and 8081. Do you have a lot of other local services running?
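
A quick way to see what is already bound on those ports, and whether the llama-server from the latest run is actually listening (container name taken from the debug output above):

# what is already listening on the host?
ss -tlnp

# state and logs of the llama-server container from the latest run
podman ps --filter label=ai.ramalama
podman logs ramalama_kvhjnbOAPN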

mikebonnet avatar Dec 19 '25 17:12 mikebonnet