
Unsupported Architecture for Vision Model Conversion to GGUF in Ollama

Open NullEqualsZero opened this issue 1 year ago • 11 comments

Model: Llama-3.2-11B-Vision-Instruct-abliterated

Error Output:

transferring model data 100%  
converting model  
Error: unsupported architecture

Description:

I'm trying to use the model huihui-ai/Llama-3.2-11B-Vision-Instruct-abliterated from HuggingFace in Ollama, but I'm encountering an "unsupported architecture" error when attempting to import the model.

From what I understand, the issue is related to vision model support in Ollama. Llama Vision itself is supported, but importing this model fails because its architecture (Mllama) isn't recognized during the conversion process to GGUF.

Additionally, trying to manually convert the model to GGUF using llama.cpp doesn't work either, as the conversion script does not support the Mllama architecture required by this vision model.

Steps to Reproduce:

  1. Attempt to import the model into Ollama.
  2. The transfer process completes but fails during conversion, with the error: "unsupported architecture."
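
For reference, the import attempt boils down to something like the following (a minimal sketch; the local paths and the model tag are illustrative, not the exact commands used):

git clone https://huggingface.co/huihui-ai/Llama-3.2-11B-Vision-Instruct-abliterated
# Modelfile contains a single line pointing at the downloaded safetensors directory:
#   FROM ./Llama-3.2-11B-Vision-Instruct-abliterated
ollama create llama3.2-vision-abliterated -f Modelfile
# transferring model data 100%
# converting model
# Error: unsupported architecture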

Expected Behavior:

Support for importing and converting vision models like huihui-ai/Llama-3.2-11B-Vision-Instruct-abliterated into GGUF format, or at least a workaround for vision architectures beyond standard Llama Vision.

Additional Notes:

  • The model is in safetensors format from Hugging Face.
  • It appears Ollama tries to convert it back to GGUF, but this fails due to unsupported architecture.
  • Request: Please add support for vision models to handle Mllama architecture.

Any guidance or updates on expanding Ollama’s vision model support would be appreciated.

NullEqualsZero avatar Dec 02 '24 17:12 NullEqualsZero

I encountered the same issue with vision models not being converted correctly to GGUF. I tested with the following models:

https://huggingface.co/Guilherme34/Llama-3.2-11b-vision-uncensored
https://huggingface.co/sdasd112132/Vision-8B-MiniCPM-2_5-Uncensored-and-Detailed-4bit
https://huggingface.co/cognitivecomputations/dolphin-vision-72b

As you said, manually converting them to GGUF with the llama.cpp Python script convert_hf_to_gguf.py does not work either, so there seems to be no way to use these vision models with Ollama.
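
For illustration, the manual attempt amounts to something like this (paths are examples, and the exact error wording depends on the llama.cpp version):

python convert_hf_to_gguf.py ./Llama-3.2-11b-vision-uncensored --outfile llama-3.2-11b-vision.gguf --outtype f16
# fails because the MllamaForConditionalGeneration architecture is not registered in the script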

tokitoki22 avatar Dec 08 '24 05:12 tokitoki22


Yeah, sadly it seems that way.

NullEqualsZero avatar Dec 09 '24 20:12 NullEqualsZero

Thank you for reporting, we are working on updating this for ollama create. Sorry for the wait!

mchiang0610 avatar Dec 11 '24 16:12 mchiang0610

Absolutely no problem. However, I have seen this behavior with all safetensors imports that have a vision component, such as Phi Vision or Vision-8B-MiniCPM. If you could look into those as well, that would be amazing.

NullEqualsZero avatar Dec 17 '24 20:12 NullEqualsZero

I am also more than happy to help with testing. I am a programmer myself, so if you can point me to what I should try to modify or how I can best help, that would be awesome.

NullEqualsZero avatar Dec 17 '24 21:12 NullEqualsZero


Any update? I am trying to convert and merge the LLM GGUF and the vision GGUF.

gaussiangit avatar Dec 22 '24 10:12 gaussiangit

The model "Llama-3.2-11B-Vision-Instruct-abliterated" has been converted to GGUF format. The link is shared on Hugging Face. After downloading, please create a Modelfile and add it. https://huggingface.co/case01/Llama-3.2-11B-Vision-Instruct-abliterated-gguf/tree/main

For example:

FROM llama-3.2-11B-vision_Q8_0.gguf
FROM llama-3.2-11B-vision_f16_projector.gguf

TEMPLATE """{{- range $index, $_ := .Messages }}<|start_header_id|>{{ .Role }}<|end_header_id|>

{{ .Content }}
{{- if gt (len (slice $.Messages $index)) 1 }}<|eot_id|>
{{- else if ne .Role "assistant" }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ end }}
{{- end }}"""
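
A quick sketch of how this Modelfile could be used (the tag name below is just an example):

ollama create llama3.2-vision-abliterated:11b-instruct-q8_0 -f Modelfile
ollama run llama3.2-vision-abliterated:11b-instruct-q8_0 "Describe this image: ./example.jpg"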

sybaek-infinyx avatar Jan 17 '25 14:01 sybaek-infinyx

@sybaek-infinyx Can you share how you converted the model to gguf?

Namzakku avatar Jan 23 '25 15:01 Namzakku

@Namzakku I have referred to the following link for mllama conversion: https://github.com/danbev/llama.cpp/tree/vision-api-mllama-example/examples/simple-vision-mllama.

You need to edit the file at 'llama.cpp/gguf-py/gguf/constants.py' and change the tensor names used during conversion so they match what Ollama expects: https://ollama.com/library/llama3.2-vision/blobs/ece5e659647a.

However, it's important to note that 'ffn_down' and 'ffn_up' are reversed. This could be due to my lack of understanding of the structure, but if you proceed carefully, it should work fine.

I apologize for not being able to share my code, as it is too messy and includes hard-coded values.
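
Roughly, the flow described above looks like this (a sketch only; the constants.py edits are model-specific and the paths are illustrative):

git clone https://github.com/danbev/llama.cpp
cd llama.cpp
git switch vision-api-mllama-example
# edit gguf-py/gguf/constants.py so the mllama tensor names match the ollama blob linked above
python convert_hf_to_gguf.py /path/to/Llama-3.2-11B-Vision-Instruct-abliterated --outfile llama-3.2-11B-vision_Q8_0.gguf --outtype q8_0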

sybaek-infinyx avatar Jan 24 '25 00:01 sybaek-infinyx

The model "Llama-3.2-11B-Vision-Instruct-abliterated" has been converted to GGUF format. The link is shared on Hugging Face. After downloading, please create a Modelfile and add it. https://huggingface.co/case01/Llama-3.2-11B-Vision-Instruct-abliterated-gguf/tree/main

For example:

FROM llama-3.2-11B-vision_Q8_0.gguf FROM llama-3.2-11B-vision_f16_projector.gguf

TEMPLATE """{{- range $index, $_ := .Messages }}<|start_header_id|>{{ .Role }}<|end_header_id|>

{{ .Content }} {{- if gt (len (slice $.Messages $index)) 1 }}<|eot_id|> {{- else if ne .Role "assistant" }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ end }} {{- end }}"""

@sybaek-infinyx The model fails when an image is attached. The base model's quants (the official llama3.2-vision model) work, but your quants don't. I pulled everything with git lfs with no errors. Your quants work when no image is attached, but fail as soon as one is attached.

❯ ollama run llama3.2-vision-abliterated:11b-instruct-q8_0 --verbose "What is this image of? describe it. '/home/user/Pictures/test/test.jpg'"
Added image '/home/user/Pictures/test/test.jpg'
Error: POST predict: Post "http://127.0.0.1:42439/completion": EOF

❯ ollama show --modelfile llama3.2-vision-abliterated:11b-instruct-q8_0
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this, replace FROM with:
# FROM llama3.2-vision-abliterated:11b-instruct-q8_0

FROM /var/lib/ollama/blobs/sha256-8fe88f0c4b761f63834d8ae299980a010ce22b8343c7a89082f6f183540f7ba2
FROM /var/lib/ollama/blobs/sha256-3477e3169139f8b3288e862c1f7e6ccdbc2c692afe959e4500510168cc436aba
TEMPLATE """{{- range $index, $_ := .Messages }}<|start_header_id|>{{ .Role }}<|end_header_id|>

{{ .Content }}
{{- if gt (len (slice $.Messages $index)) 1 }}<|eot_id|>
{{- else if ne .Role "assistant" }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ end }}
{{- end }}"""
PARAMETER num_ctx 8192
PARAMETER num_predict 2048
PARAMETER temperature 0.6
PARAMETER top_p 0.9

The original Modelfile I used is the same, except these two lines read:

FROM llama-3.2-11B-vision_Q8_0.gguf
FROM llama-3.2-11B-vision_f16_projector.gguf

These quants are from https://huggingface.co/case01/Llama-3.2-11B-Vision-Instruct-abliterated-gguf/tree/main

My ollama version is 0.5.12. It has no issue with the llama3.2-vision quants from the Ollama team.

JamesClarke7283 avatar Feb 27 '25 10:02 JamesClarke7283

@JamesClarke7283

Hello. First of all, please understand that the text may not be smooth as I am using a translator.

Main point:

The model I shared is a project that I simply converted because I wanted to use it myself.

In conclusion, I tested it on version 0.5.12, and confirmed that it still works.

I also confirmed that running with ollama run --verbose and attaching photos after a regular run both work.

I am unable to reproduce the error, so I am not sure why it is not working.

I am sorry that I could not be of more help.

sybaek-infinyx avatar Feb 28 '25 00:02 sybaek-infinyx


Hi @sybaek-infinyx - could you please post your updated constants.py file? That would be amazing if you could. Thank you! We don't mind messy. :)

mikeknapp avatar Mar 11 '25 00:03 mikeknapp

@mikeknapp https://github.com/sybaek-infinyx/gguf_convert/tree/main I've shared the code on my GitHub. However, please note that the code is not very readable. I was initially reluctant to share it in this state, but I've decided to make it public in the hope that those with the necessary skills can utilize it effectively.

sybaek-infinyx avatar Mar 11 '25 03:03 sybaek-infinyx


Thank you so much!! This worked perfectly.

I really appreciate it 😃

For others playing along, here is what worked for me...

cd ~/3p
git clone https://github.com/danbev/llama.cpp  # Note: Not the official repo!
cd llama.cpp
git switch vision-api-mllama-example
#cmake -B build
#cmake --build build --config Release
conda create -n llama.cpp python=3.10
conda activate llama.cpp
pip install -r requirements.txt

cd ..
git clone https://github.com/sybaek-infinyx/gguf_convert

# Then copy the files from gguf_convert across into llama.cpp

cd llama.cpp
python ./convert_hf_to_gguf.py --verbose ~/src/output/model-v1 --outfile ~/src/output/model-v1/model-v1.gguf --outtype q8_0

If you get an error like so:

KeyError: "could not find any of: ['n_layers', 'num_hidden_layers', 'n_layer', 'num_layers'] in self.text_config"

you need to find a more complete version of the config.json for the model. (You might be converting a fine-tuned version whose config.json was trimmed.)
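
A quick way to check is to inspect the config (assuming jq is available; any JSON viewer works):

# list the keys present under text_config in the model's config.json
jq '.text_config // {} | keys' config.json
# if none of n_layers / num_hidden_layers / n_layer / num_layers appear,
# replace config.json with the complete one from the base model repo before converting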

mikeknapp avatar Mar 11 '25 04:03 mikeknapp

Actually, while the code runs and I get an output ... there is something off about the model. It seems very different from the non-GGUF version. @sybaek-infinyx Have you noticed this?

mikeknapp avatar Mar 12 '25 05:03 mikeknapp

@mikeknapp Please note that this code was created in January and is intended solely for the llama3.2-vision model. I haven't tested it with other models, so I can't provide information about the block_count error. I did, however, change the 'vision_output_dim' to '4096' at the bottom of the config.json file. I haven't tried any more GGUF conversions, and since this was just an experiment, I can't answer questions about specific bugs.
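
For anyone retracing this, the tweak described above amounts to something like the following (assuming the key sits at the top level of config.json as described; in stock HF configs it may live under vision_config instead):

jq '.vision_output_dim = 4096' config.json > config.json.tmp && mv config.json.tmp config.json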

sybaek-infinyx avatar Mar 12 '25 06:03 sybaek-infinyx

Thanks so much for sharing and working on this @sybaek-infinyx ! Kudos! The code appears to generate the gguf model file. What about the projector? How do we produce that for Ollama?

geekbass avatar Mar 17 '25 17:03 geekbass

@geekbass To clarify, my contribution primarily involved modifying a few model names and metadata. The core work was done in this branch: https://github.com/danbev/llama.cpp/tree/vision_2_mllama_support, and I only made minor adjustments.

Regarding the projector, it's simply separated and saved through an if statement. It might look a bit messy, but if you look at line 473 of the convert_hf_to_gguf.py script I shared, you'll see that it's just branching and saving it separately. Although it's named "projector," the process is essentially the same as converting a standard LLM model to GGUF.

sybaek-infinyx avatar Mar 18 '25 07:03 sybaek-infinyx

+1 Can we have a feature for Ollama to directly import fine-tuned Llama Vision models in safetensors format?

chigkim avatar Mar 18 '25 10:03 chigkim