Cannot run converted models (imported using convert command)
It does not seem possible to run models which were imported using the convert command:
$ ramalama run --ngl 0 oci://localhost/pllum8b-instr-q4km:latest
Loading modelgguf_init_from_file_impl: failed to read magic
llama_model_load: error loading model: llama_model_loader: failed to load model from /mnt/models/model.file
llama_model_load_from_file_impl: failed to load model
initialize_model: error: unable to load model from file: /mnt/models/model.file
The same error is reported regardless of whether the raw or the car option was used:
$ ramalama convert --type raw file:///home/dw/.ollama/models/blobs/sha256-19314ef0159c739868860d0ee15851e7b19a0433a94c1e0afa727a8a013bd0fd pllum8b-instr-q4km
Converting /home/dw/.local/share/ramalama/models/file/home/dw/.ollama/models/blobs/sha256-19314ef0159c739868860d0ee15851e7b19a0433a94c1e0afa727a8a013bd0fd to pllum8b-instr-q4km...
Building pllum8b-instr-q4km...
$ ramalama ls | grep pllum8b-instr-q4km
oci://localhost/pllum8b-instr-q4km:latest 19 seconds ago 790 B
$ ramalama --debug run --ngl 0 oci://localhost/pllum8b-instr-q4km:latest
run_cmd: podman image inspect localhost/pllum8b-instr-q4km:latest
Working directory: None
Ignore stderr: False
Ignore all: False
Command finished with return code: 0
run_cmd: podman inspect quay.io/ramalama/rocm:0.6
Working directory: None
Ignore stderr: False
Ignore all: True
Command finished with return code: 0
exec_cmd: podman run --rm -i --label ai.ramalama --name ramalama_kOB7kc5lx4 --env=HOME=/tmp --init --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --label ai.ramalama.model=oci://localhost/pllum8b-instr-q4km:latest --label ai.ramalama.engine=podman --label ai.ramalama.runtime=llama.cpp --label ai.ramalama.command=run --env LLAMA_PROMPT_PREFIX=🦭 > --pull=newer -t --device /dev/dri --device /dev/kfd -e HIP_VISIBLE_DEVICES=0 --network none --mount=type=image,src=localhost/pllum8b-instr-q4km:latest,destination=/mnt/models,subpath=/models quay.io/ramalama/rocm:0.6 llama-run -c 2048 --temp 0.8 -v --ngl 0 /mnt/models/model.file
Loading modelggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Ryzen Embedded R1505G with Radeon Vega Gfx, gfx902:xnack+ (0x902), VMM: no, Wave Size: 64
llama_model_load_from_file_impl: using device ROCm0 (AMD Ryzen Embedded R1505G with Radeon Vega Gfx) - 31055 MiB free
gguf_init_from_file_impl: failed to read magic
llama_model_load: error loading model: llama_model_loader: failed to load model from /mnt/models/model.file
llama_model_load_from_file_impl: failed to load model
initialize_model: error: unable to load model from file: /mnt/models/model.file
$ ramalama convert --type car file:///home/dw/.ollama/models/blobs/sha256-19314ef0159c739868860d0ee15851e7b19a0433a94c1e0afa727a8a013bd0fd pllum8b-instr-q4km
Converting /home/dw/.local/share/ramalama/models/file/home/dw/.ollama/models/blobs/sha256-19314ef0159c739868860d0ee15851e7b19a0433a94c1e0afa727a8a013bd0fd to pllum8b-instr-q4km...
Building pllum8b-instr-q4km...
$ ramalama ls | grep pllum8b-instr-q4km
oci://localhost/pllum8b-instr-q4km:latest 4 seconds ago 791 B
$ ramalama --debug run --ngl 0 oci://localhost/pllum8b-instr-q4km:latest
run_cmd: podman image inspect localhost/pllum8b-instr-q4km:latest
Working directory: None
Ignore stderr: False
Ignore all: False
Command finished with return code: 0
run_cmd: podman inspect quay.io/ramalama/rocm:0.6
Working directory: None
Ignore stderr: False
Ignore all: True
Command finished with return code: 0
exec_cmd: podman run --rm -i --label ai.ramalama --name ramalama_AoHxUZatGd --env=HOME=/tmp --init --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --label ai.ramalama.model=oci://localhost/pllum8b-instr-q4km:latest --label ai.ramalama.engine=podman --label ai.ramalama.runtime=llama.cpp --label ai.ramalama.command=run --env LLAMA_PROMPT_PREFIX=🦭 > --pull=newer -t --device /dev/dri --device /dev/kfd -e HIP_VISIBLE_DEVICES=0 --network none --mount=type=image,src=localhost/pllum8b-instr-q4km:latest,destination=/mnt/models,subpath=/models quay.io/ramalama/rocm:0.6 llama-run -c 2048 --temp 0.8 -v --ngl 0 /mnt/models/model.file
Loading modelggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Ryzen Embedded R1505G with Radeon Vega Gfx, gfx902:xnack+ (0x902), VMM: no, Wave Size: 64
llama_model_load_from_file_impl: using device ROCm0 (AMD Ryzen Embedded R1505G with Radeon Vega Gfx) - 31055 MiB free
gguf_init_from_file_impl: failed to read magic
llama_model_load: error loading model: llama_model_loader: failed to load model from /mnt/models/model.file
llama_model_load_from_file_impl: failed to load model
initialize_model: error: unable to load model from file: /mnt/models/model.file
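Since both runs fail at gguf_init_from_file_impl: failed to read magic, one quick sanity check (a sketch only, not output captured from the runs above) is to look at the first four bytes of the source blob; a valid GGUF model starts with the ASCII magic GGUF:
$ head -c 4 /home/dw/.ollama/models/blobs/sha256-19314ef0159c739868860d0ee15851e7b19a0433a94c1e0afa727a8a013bd0fd | od -c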
$ rpm -qv python3-ramalama
python3-ramalama-0.6.2-1.fc40.noarch
The input file for the convert command was obtained as described in https://github.com/containers/ramalama/issues/904.
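For reference, the blob path can also be read non-interactively; a sketch assuming the ollama CLI's show flags:
$ ollama show pllum8b-instr-q4km --modelfile | grep '^FROM'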
Can you run the ollama model successfully?
Yep. The model is working fine in ollama:
$ ollama run pllum8b-instr-q4km
>>> /show modelfile
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this, replace FROM with:
# FROM pllum8b-instr-q4km:latest
FROM /home/dw/.ollama/models/blobs/sha256-19314ef0159c739868860d0ee15851e7b19a0433a94c1e0afa727a8a013bd0fd
TEMPLATE {{ .Prompt }}
>>> Send a message (/? for help)
$ ll /home/dw/.ollama/models/blobs/sha256-19314ef0159c739868860d0ee15851e7b19a0433a94c1e0afa727a8a013bd0fd
-rw-r--r-- 1 dw dw 4920746656 Mar 3 10:53 /home/dw/.ollama/models/blobs/sha256-19314ef0159c739868860d0ee15851e7b19a0433a94c1e0afa727a8a013bd0fd
ollama run pllum8b-instr-q4km
pulling manifest
Error: pull model manifest: file does not exist
I converted this model from https://huggingface.co/CYFRAGOVPL/Llama-PLLuM-8B-instruct. Here are the steps (based on my notes):
$ git clone https://huggingface.co/CYFRAGOVPL/Llama-PLLuM-8B-instruct
$ git clone https://github.com/ggerganov/llama.cpp.git
# make sure you have the dependencies listed in: requirements/requirements-convert_hf_to_gguf.txt
$ ./llama.cpp/convert_hf_to_gguf.py ./Llama-PLLuM-8B-instruct --outfile Llama-PLLuM-8B-instruct.gguf
$ echo "FROM ./Llama-PLLuM-8B-instruct.gguf >Modelfile
$ ollama create pllum8b-instr-q4km -q Q4_K_M -f Modelfile
Are you still seeing this issue?
Yep, tested on:
$ ramalama version
ramalama version 0.7.2
@dwrobel Still seeing this issue?
Refreshed steps for reproducing on F41:
# Steps for F41:
sudo dnf install -y python3.10 git-lfs ramalama
git clone https://huggingface.co/CYFRAGOVPL/Llama-PLLuM-8B-instruct
(cd Llama-PLLuM-8B-instruct && git lfs pull)
git clone https://github.com/ggerganov/llama.cpp.git
virtualenv --python=/usr/bin/python3.10 venv/
source venv/bin/activate
python -m pip install -r ./llama.cpp/requirements/requirements-convert_hf_to_gguf.txt
./llama.cpp/convert_hf_to_gguf.py ./Llama-PLLuM-8B-instruct --outfile Llama-PLLuM-8B-instruct.gguf
echo "FROM ./Llama-PLLuM-8B-instruct.gguf" >Modelfile
ollama create pllum8b-instr-q4km -q Q4_K_M -f Modelfile
ollama run pllum8b-instr-q4km
>>> /show modelfile
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this, replace FROM with:
# FROM pllum8b-instr-q4km:latest
FROM /home/dw/.ollama/models/blobs/sha256-a7cacd6a6bb98fed3bc310a398ed34aa20c8d82408615ce099bbe55abd51e49f
TEMPLATE {{ .Prompt }}
>>>
ramalama convert --type raw file:/home/dw/.ollama/models/blobs/sha256-a7cacd6a6bb98fed3bc310a398ed34aa20c8d82408615ce099bbe55abd51e49f pllum8b-instr-q4km
An attempt to run the model with ramalama:
$ ramalama version
ramalama version 0.11.0
$ ramalama ls | grep pllum8b-instr-q4km
oci://localhost/pllum8b-instr-q4km:latest 15 minutes ago 719 B
$ ramalama --debug run --ngl 0 oci://localhost/pllum8b-instr-q4km:latest
2025-07-24 11:31:46 - DEBUG - run_cmd: nvidia-smi
2025-07-24 11:31:46 - DEBUG - Working directory: None
2025-07-24 11:31:46 - DEBUG - Ignore stderr: False
2025-07-24 11:31:46 - DEBUG - Ignore all: False
2025-07-24 11:31:46 - DEBUG - Command finished with return code: 0
2025-07-24 11:31:46 - DEBUG - run_cmd: podman inspect quay.io/ramalama/cuda:0.11
2025-07-24 11:31:46 - DEBUG - Working directory: None
2025-07-24 11:31:46 - DEBUG - Ignore stderr: False
2025-07-24 11:31:46 - DEBUG - Ignore all: True
2025-07-24 11:31:46 - DEBUG - Checking if 8080 is available
2025-07-24 11:31:46 - DEBUG - run_cmd: podman image inspect localhost/pllum8b-instr-q4km:latest
2025-07-24 11:31:46 - DEBUG - Working directory: None
2025-07-24 11:31:46 - DEBUG - Ignore stderr: False
2025-07-24 11:31:46 - DEBUG - Ignore all: False
2025-07-24 11:31:46 - DEBUG - Command finished with return code: 0
2025-07-24 11:31:46 - DEBUG - Checking if 8080 is available
Traceback (most recent call last):
File "/usr/bin/ramalama", line 8, in <module>
sys.exit(main())
~~~~^^
File "/usr/lib/python3.13/site-packages/ramalama/cli.py", line 1243, in main
args.func(args)
~~~~~~~~~^^^^^^
File "/usr/lib/python3.13/site-packages/ramalama/cli.py", line 985, in run_cli
model.serve(args, quiet=True) if args.rag else model.run(args)
~~~~~~~~~^^^^^^
File "/usr/lib/python3.13/site-packages/ramalama/model.py", line 358, in run
self._start_server(args)
~~~~~~~~~~~~~~~~~~^^^^^^
File "/usr/lib/python3.13/site-packages/ramalama/model.py", line 369, in _start_server
self.serve(args, True)
~~~~~~~~~~^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/ramalama/model.py", line 739, in serve
exec_args = self.build_exec_args_serve(args)
File "/usr/lib/python3.13/site-packages/ramalama/model.py", line 644, in build_exec_args_serve
exec_args = self.llama_serve(args)
File "/usr/lib/python3.13/site-packages/ramalama/model.py", line 587, in llama_serve
self._get_entry_model_path(args.container, args.generate, args.dryrun),
~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/ramalama/model.py", line 189, in _get_entry_model_path
raise NoRefFileFound(self.model)
ramalama.model.NoRefFileFound: No ref file or models found for 'localhost/pllum8b-instr-q4km:latest'. Please pull model.
Error: Failed to serve model pllum8b-instr-q4km, for ramalama run command
The size of the image (as reported by podman) is suspiciously small:
$ podman images | grep pllum8b-instr-q4km
localhost/pllum8b-instr-q4km latest 1de9bf8d3149 19 minutes ago 719 B
even though the conversion by ramalama took a long time (>30 min).
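One way to check what actually ended up in the image's /models layer is to mount the image and list it; a rough sketch assuming rootless podman (hence the podman unshare session around podman image mount):
$ podman unshare
# mnt=$(podman image mount localhost/pllum8b-instr-q4km:latest)
# ls -lhR "$mnt/models"
# podman image umount localhost/pllum8b-instr-q4km:latest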
Thanks, I hope to take a look at this one.
I experienced the same issue. Steps to reproduce:
- Convert and push an Ollama model to an OCI artefact. It completes without errors:
ramalama push ollama://smollm2:135m oci://ghcr.io/thomasvitale/ramalama/smollm2:135m
I can see the image has been created.
% podman images | grep ghcr.io/thomasvitale/ramalama/smollm2
ghcr.io/thomasvitale/ramalama/smollm2 135m bc9a2cf95f97 5 minutes ago 271 MB
- Run the newly containerised model.
ramalama run oci://ghcr.io/thomasvitale/ramalama/smollm2:135m
It fails with the following error:
Traceback (most recent call last):
File "/opt/homebrew/bin/ramalama", line 8, in <module>
sys.exit(main())
~~~~^^
File "/opt/homebrew/Cellar/ramalama/0.11.2/libexec/lib/python3.13/site-packages/ramalama/cli.py", line 1248, in main
args.func(args)
~~~~~~~~~^^^^^^
File "/opt/homebrew/Cellar/ramalama/0.11.2/libexec/lib/python3.13/site-packages/ramalama/cli.py", line 986, in run_cli
model.serve(args, quiet=True) if args.rag else model.run(args)
~~~~~~~~~^^^^^^
File "/opt/homebrew/Cellar/ramalama/0.11.2/libexec/lib/python3.13/site-packages/ramalama/model.py", line 361, in run
self._start_server(args)
~~~~~~~~~~~~~~~~~~^^^^^^
File "/opt/homebrew/Cellar/ramalama/0.11.2/libexec/lib/python3.13/site-packages/ramalama/model.py", line 372, in _start_server
self.serve(args, True)
~~~~~~~~~~^^^^^^^^^^^^
File "/opt/homebrew/Cellar/ramalama/0.11.2/libexec/lib/python3.13/site-packages/ramalama/model.py", line 746, in serve
exec_args = self.build_exec_args_serve(args)
File "/opt/homebrew/Cellar/ramalama/0.11.2/libexec/lib/python3.13/site-packages/ramalama/model.py", line 646, in build_exec_args_serve
exec_args = self.llama_serve(args)
File "/opt/homebrew/Cellar/ramalama/0.11.2/libexec/lib/python3.13/site-packages/ramalama/model.py", line 590, in llama_serve
self._get_entry_model_path(args.container, args.generate, args.dryrun),
~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/ramalama/0.11.2/libexec/lib/python3.13/site-packages/ramalama/model.py", line 189, in _get_entry_model_path
raise NoRefFileFound(self.model)
ramalama.model.NoRefFileFound: No ref file or models found for 'ghcr.io/thomasvitale/ramalama/smollm2:135m'. Please pull model.
I'm on macOS 15.5 and using Podman Desktop.
@engelmi The model_store has broken this support. Basically, we need to mount the image, not the path, when using OCI images.
Yes, this broke while refactoring the code that mounts all model files into the container - since converted OCI models don't have a ref file, it fails (previously, models from the local store were reused instead of the image).
Thanks for fixing it in #1802! @rhatdan (I didn't know about podman's mount type=image)
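For reference, the podman feature mentioned here is the image mount type, which the exec_cmd lines earlier in this issue already use; a minimal sketch of mounting the artifact's /models subtree into a container just to list its contents (image name and paths taken from the logs above; any local image providing ls would do):
$ podman run --rm --mount=type=image,src=localhost/pllum8b-instr-q4km:latest,destination=/mnt/models,subpath=/models quay.io/ramalama/rocm:0.6 ls -lh /mnt/models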
#1802 should have fixed this issue, so I am closing it. @dwrobel Please reopen if the issue persists.
It looks like the fix doesn't work (at least for me):
$ ramalama version
ramalama version 0.11.2
$ time ramalama convert --type raw file:///home/dw/.ollama/models/blobs/sha256-a7cacd6a6bb98fed3bc310a398ed34aa20c8d82408615ce099bbe55abd51e49f pllum8b-instr-q4km
Converting /home/dw/.local/share/ramalama/store to /home/dw/.local/share/ramalama/store ...
Building pllum8b-instr-q4km ...
d124364e8ddc2185c2b525229bf475d3fef9916ef3d79fddce3d6a6496ff6c21
real 9m22.792s
user 4m45.328s
sys 2m32.362s
$ ramalama ls | grep pllum8b-instr-q4km
oci://localhost/pllum8b-instr-q4km:latest 9 minutes ago 719 B
$ ramalama --debug run --ngl 0 oci://localhost/pllum8b-instr-q4km:latest
2025-08-12 08:40:39 - DEBUG - run_cmd: nvidia-smi
2025-08-12 08:40:39 - DEBUG - Working directory: None
2025-08-12 08:40:39 - DEBUG - Ignore stderr: False
2025-08-12 08:40:39 - DEBUG - Ignore all: False
2025-08-12 08:40:39 - DEBUG - Command finished with return code: 0
2025-08-12 08:40:39 - DEBUG - run_cmd: podman inspect quay.io/ramalama/cuda:0.11
2025-08-12 08:40:39 - DEBUG - Working directory: None
2025-08-12 08:40:39 - DEBUG - Ignore stderr: False
2025-08-12 08:40:39 - DEBUG - Ignore all: True
2025-08-12 08:40:39 - DEBUG - Checking if 8080 is available
2025-08-12 08:40:39 - DEBUG - run_cmd: podman image inspect localhost/pllum8b-instr-q4km:latest
2025-08-12 08:40:39 - DEBUG - Working directory: None
2025-08-12 08:40:39 - DEBUG - Ignore stderr: False
2025-08-12 08:40:39 - DEBUG - Ignore all: False
2025-08-12 08:40:39 - DEBUG - Command finished with return code: 0
2025-08-12 08:40:39 - DEBUG - Checking if 8080 is available
Traceback (most recent call last):
File "/usr/bin/ramalama", line 8, in <module>
sys.exit(main())
~~~~^^
File "/usr/lib/python3.13/site-packages/ramalama/cli.py", line 1248, in main
args.func(args)
~~~~~~~~~^^^^^^
File "/usr/lib/python3.13/site-packages/ramalama/cli.py", line 986, in run_cli
model.serve(args, quiet=True) if args.rag else model.run(args)
~~~~~~~~~^^^^^^
File "/usr/lib/python3.13/site-packages/ramalama/model.py", line 361, in run
self._start_server(args)
~~~~~~~~~~~~~~~~~~^^^^^^
File "/usr/lib/python3.13/site-packages/ramalama/model.py", line 372, in _start_server
self.serve(args, True)
~~~~~~~~~~^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/ramalama/model.py", line 746, in serve
exec_args = self.build_exec_args_serve(args)
File "/usr/lib/python3.13/site-packages/ramalama/model.py", line 646, in build_exec_args_serve
exec_args = self.llama_serve(args)
File "/usr/lib/python3.13/site-packages/ramalama/model.py", line 590, in llama_serve
self._get_entry_model_path(args.container, args.generate, args.dryrun),
~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/ramalama/model.py", line 189, in _get_entry_model_path
raise NoRefFileFound(self.model)
ramalama.model.NoRefFileFound: No ref file or models found for 'localhost/pllum8b-instr-q4km:latest'. Please pull model.
Error: Failed to serve model pllum8b-instr-q4km, for ramalama run command
Output of: podman image inspect localhost/pllum8b-instr-q4km:latest
$ podman image inspect localhost/pllum8b-instr-q4km:latest
[
{
"Id": "cb27b56d2a6b968424f2bbc2e0e13950a7642f3d92a9fab9385ab73a7317426c",
"Digest": "sha256:28579d4968e238fa1141a222c966656255dac4e46ce38356bf5a14a767663699",
"RepoTags": [],
"RepoDigests": [],
"Parent": "",
"Comment": "",
"Created": "2025-08-12T06:30:12.016550768Z",
"Config": {
"Env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
],
"WorkingDir": "/",
"Labels": {
"io.buildah.version": "1.40.1",
"org.containers.type": "ai.image.model.raw"
}
},
"Version": "",
"Author": "",
"Architecture": "amd64",
"Os": "linux",
"Size": 4920752110,
"VirtualSize": 4920752110,
"GraphDriver": {
"Name": "overlay",
"Data": {
"UpperDir": "/home/dw/.local/share/containers/storage/overlay/25df07b170c64df99f542b9009b1c03684c0bfda4263f20fb01dedda5bee0ef9/diff",
"WorkDir": "/home/dw/.local/share/containers/storage/overlay/25df07b170c64df99f542b9009b1c03684c0bfda4263f20fb01dedda5bee0ef9/work"
}
},
"RootFS": {
"Type": "layers",
"Layers": [
"sha256:25df07b170c64df99f542b9009b1c03684c0bfda4263f20fb01dedda5bee0ef9"
]
},
"Labels": {
"io.buildah.version": "1.40.1",
"org.containers.type": "ai.image.model.raw"
},
"Annotations": {
"org.opencontainers.image.base.digest": "",
"org.opencontainers.image.base.name": ""
},
"ManifestType": "application/vnd.oci.image.manifest.v1+json",
"User": "",
"History": [
{
"created": "2025-08-12T06:30:03.13544986Z",
"created_by": "/bin/sh -c #(nop) COPY dir:1a0633f8f95583902c852746b9c3fee3afeceb0fd839a3cab6ce0852db0688b9 in /models ",
"empty_layer": true
},
{
"created": "2025-08-12T06:30:11.988469005Z",
"created_by": "/bin/sh -c #(nop) COPY file:953d8e21e281c4642a8777cf5e6d1aab4a343615ea27e4562caecc6f074b1c21 in /models/sha256-a7cacd6a6bb98fed3bc310a398ed34aa20c8d82408615ce099bbe55abd51e49f/sha256-a7cacd6a6bb98fed3bc310a398ed34aa20c8d82408615ce099bbe55abd51e49f ",
"empty_layer": true
},
{
"created": "2025-08-12T06:30:19.51049439Z",
"created_by": "/bin/sh -c #(nop) LABEL org.containers.type=ai.image.model.raw"
}
],
"NamesHistory": [
"docker.io/library/ec94b94c6de8e98791441e73796b65959118ed7ebcba866d36873af7389ac797-tmp:latest"
]
}
]
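For a quicker look at just the artifact-type label from the inspect output above, podman's go-template formatting can be used; a one-liner sketch:
$ podman image inspect --format '{{ index .Labels "org.containers.type" }}' localhost/pllum8b-instr-q4km:latest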
@dwrobel Please reopen if the issue persists.
@engelmi BTW, I don't have permission to re-open.
$ ramalama version
ramalama version 0.11.2
@dwrobel It seems you are using v0.11.2 of ramalama. The fix, PR #1802, is not yet included in any release. I think the next release is planned for next week. Until then you would need to clone the ramalama repo and use its latest state on main.
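If it helps in the meantime, a rough sketch of trying the main branch without touching the distro package (assuming a standard pip install from the repository works; adjust as needed):
git clone https://github.com/containers/ramalama.git
cd ramalama
python3 -m venv ~/.venvs/ramalama && source ~/.venvs/ramalama/bin/activate
python3 -m pip install .
ramalama version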
@engelmi BTW, I don't have permission to re-open.
Oh, I didn't know. I'll reopen the issue. Please ping me here once you have verified whether or not the fix works.
It seems you are using v0.11.2 of ramalama. The fix, PR https://github.com/containers/ramalama/pull/1802, is not yet included in any release.
I simply assumed that if the issue was closed and I'm running the latest version available on koji (https://koji.fedoraproject.org/koji/packageinfo?packageID=42316, for F42), then the fix should be included.
I'll recheck it on >=v0.11.4 once it is available on koji.
We close issues when they are fixed upstream, not when the fix ships in the latest release. @engelmi did you find that this was not fixed in the main branch?
You are right, @rhatdan, it is fixed in the main branch. Closing again. @dwrobel Please ping here or create a new issue if the problem persists.