
FR: Phi-3-vision-128k-instruct implementation

Open mirek190 opened this issue 1 year ago • 21 comments

That model is insane for its size ....

https://huggingface.co/microsoft/Phi-3-vision-128k-instruct

mirek190 avatar May 21 '24 20:05 mirek190

Is it natively supported once someone converts it to gguf?

simsi-andy avatar May 22 '24 15:05 simsi-andy

Is it natively supported once someone converts it to gguf?

Someone has to write the code to run such a model into llama.cpp. Then it would be a model you could convert to gguf. Until then, no.

4onen avatar May 25 '24 07:05 4onen

I'm patiently waiting for someone to do that ... 😭

mirek190 avatar May 25 '24 10:05 mirek190

I've tried to convert the Phi-3-vision-128k-instruct HF model to GGUF. But it looks like the current version of llama.cpp does not support the vision components (model.vision_embed_tokens, etc.) in Phi-3v. After I added "Phi3VForCausalLM" to convert-hf-to-gguf.py by copying the "Phi3ForCausalLM" handling, the run fails like below:

...
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type bos to 1
INFO:gguf.vocab:Setting special token type eos to 32000
INFO:gguf.vocab:Setting special token type unk to 0
INFO:gguf.vocab:Setting special token type pad to 32000
INFO:gguf.vocab:Setting add_bos_token to True
INFO:gguf.vocab:Setting add_eos_token to False
INFO:gguf.vocab:Setting chat_template to {% for message in messages %}{{'<|' + message['role'] + '|>' + ' ' + message['content'] + '<|end|> ' }}{% endfor %}{% if add_generation_prompt and messages[-1]['role'] != 'assistant' %}{{- '<|assistant|> ' -}}{% endif %}
INFO:hf-to-gguf:Exporting model to 'converted.bin'
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00002.safetensors'
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {3072, 32064}
...
...
File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 330, in write
    self.write_tensors()
File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 266, in write_tensors
    for new_name, data in ((n, d.squeeze().numpy()) for n, d in self.modify_tensors(data_torch, name, bid)):
File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 233, in modify_tensors
    return [(self.map_tensor_name(name), data_torch)]
File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 184, in map_tensor_name
    raise ValueError(f"Can not map tensor {name!r}")
ValueError: Can not map tensor 'model.vision_embed_tokens.glb_GN'

Tensor names like 'model.vision_embed_tokens.glb_GN' are not listed in the "TensorNameMap" of the tensor_mapping.py file. The additional tensors in Phi-3v can be found here: https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/tree/main?show_file_info=model.safetensors.index.json
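The failure mode can be illustrated with a minimal, self-contained sketch. The table and function below are illustrative stand-ins, not llama.cpp's actual TensorNameMap, but they show why conversion dies on the first vision tensor: the text-model mapping simply has no entry for the extra weights Phi-3-vision adds on top of the Phi-3 stack.

```python
# Illustrative sketch (not llama.cpp's real tables): the mapping only
# covers the Phi-3 text stack, so any vision tensor name falls through.
TENSOR_NAME_MAP = {
    "model.embed_tokens.weight": "token_embd.weight",
    "model.norm.weight": "output_norm.weight",
    # ... attention/FFN mappings for the text layers ...
}

def map_tensor_name(name: str) -> str:
    try:
        return TENSOR_NAME_MAP[name]
    except KeyError:
        raise ValueError(f"Can not map tensor {name!r}")

print(map_tensor_name("model.embed_tokens.weight"))  # token_embd.weight

try:
    map_tensor_name("model.vision_embed_tokens.glb_GN")
except ValueError as e:
    print(e)  # Can not map tensor 'model.vision_embed_tokens.glb_GN'
```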

Is it possible to make llama.cpp support multimodal models like LLaVA and Phi-3v?

HaoHoo avatar May 27 '24 12:05 HaoHoo

The model is very good for its size on OCR tasks; looking forward to using it in GGUF format.

DenisSergeevitch avatar May 27 '24 22:05 DenisSergeevitch

Hi @ggerganov, Phi-3 Vision is similar to LLaVA: it combines Phi-3 with the CLIP-ViT-Large-patch14-336 model. Is it possible to support converting it from HF to GGUF?

HaoHoo avatar May 30 '24 06:05 HaoHoo

Any update on the convert-hf-to-gguf issue on the Phi3-vision-small-128k model? Seems to be giving the same error as above:

ERROR:hf-to-gguf:Model Phi3VForCausalLM is not supported

anuran-roy avatar May 31 '24 18:05 anuran-roy

Any update on the convert-hf-to-gguf issue on the Phi3-vision-small-128k model? Seems to be giving the same error as above:

ERROR:hf-to-gguf:Model Phi3VForCausalLM is not supported

You can copy the "Phi3ForCausalLM" section and add it as "Phi3VForCausalLM" in that Python file. But Phi-3-vision-128k-instruct includes both a Phi3 and a CLIP model. The Phi3 part can be detected and converted, but the CLIP part can't be converted by the convert-hf-to-gguf.py code: the tensor mapping fails.
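The "copy the section" workaround can be sketched as below. This is a hedged, self-contained illustration of the pattern (the converter picks a handler class from the architecture string in config.json), with names invented for illustration, not the script's actual API. It shows why detection then passes while conversion still fails later on the CLIP tensors.

```python
# Illustrative registry sketch: registering the vision architecture string
# under the existing text-model handler makes detection succeed, but the
# handler still knows nothing about the CLIP tensors.
MODEL_REGISTRY: dict = {}

def register(*names):
    def wrap(cls):
        for n in names:
            MODEL_REGISTRY[n] = cls
        return cls
    return wrap

@register("Phi3ForCausalLM", "Phi3VForCausalLM")  # reuse the text handler
class Phi3Model:
    pass

print("Phi3VForCausalLM" in MODEL_REGISTRY)  # True
```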

HaoHoo avatar Jun 01 '24 03:06 HaoHoo

Any update on the convert-hf-to-gguf issue on the Phi3-vision-small-128k model? Seems to be giving the same error as above: ERROR:hf-to-gguf:Model Phi3VForCausalLM is not supported

You can copy the "Phi3ForCausalLM" section and add it as "Phi3VForCausalLM" in that Python file. But Phi-3-vision-128k-instruct includes both a Phi3 and a CLIP model. The Phi3 part can be detected and converted, but the CLIP part can't be converted by the convert-hf-to-gguf.py code: the tensor mapping fails.

I did exactly that, as mentioned in the messages above in this issue, and got the exact same problem. Are there any workarounds for this, e.g. if we can somehow decouple the two models?

anuran-roy avatar Jun 01 '24 19:06 anuran-roy

You can use examples/llava/llava-surgery-v2.py to separate out the CLIP part. I was able to modify it to do so successfully. I'm a bit stuck on the rest... the easiest way to do this, IMO, is to modify the code under examples/llava/ to accept the Phi-3 base model and this hacked-off CLIP encoder.
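The surgery step boils down to partitioning the checkpoint's tensors by name prefix. A rough, self-contained sketch (the helper is illustrative, not the actual llava-surgery-v2.py code; the prefix follows Phi-3-vision's tensor naming seen earlier in this thread):

```python
# Split a checkpoint dict into the vision/projector part and the
# language-model part, keyed purely on tensor name prefix.
def split_checkpoint(state_dict: dict) -> tuple:
    vision_prefix = "model.vision_embed_tokens."
    vision = {k: v for k, v in state_dict.items() if k.startswith(vision_prefix)}
    text = {k: v for k, v in state_dict.items() if k not in vision}
    return vision, text

ckpt = {
    "model.embed_tokens.weight": "...",
    "model.vision_embed_tokens.glb_GN": "...",
    "model.vision_embed_tokens.wte.weight": "...",
}
vision, text = split_checkpoint(ckpt)
print(len(vision), len(text))  # 2 1
```

Each half can then be handled separately: the text half by the normal Phi-3 conversion path, the vision half by a CLIP/projector export like LLaVA's.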

farris avatar Jun 02 '24 00:06 farris

https://github.com/ggerganov/llama.cpp/pull/7705 👁️

farris avatar Jun 03 '24 01:06 farris

Would it be possible to use a parameter in the GGUF header to tell it that the file contains two sets of tensor data?

I feel like for the typical user they will expect to use a single GGUF file.
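One way the single-file idea could work is to tag tensors by component so a loader can pull the LLM and the vision tower out of one container. To be clear, this is speculative: the keys and layout below are invented for illustration and are not a real GGUF feature.

```python
# Speculative sketch of a single container holding two tensor sets,
# distinguished by a per-component name prefix plus a metadata key.
container = {
    "metadata": {"general.components": ["llm", "vision"]},
    "tensors": {
        "llm.token_embd.weight": b"...",
        "vision.patch_embd.weight": b"...",
    },
}

def tensors_for(container: dict, component: str) -> dict:
    prefix = component + "."
    return {k: v for k, v in container["tensors"].items() if k.startswith(prefix)}

print(list(tensors_for(container, "vision")))  # ['vision.patch_embd.weight']
```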

BrainSlugs83 avatar Jun 27 '24 15:06 BrainSlugs83

bad bot

Aisuko avatar Aug 04 '24 04:08 Aisuko

sad but true

muzhig avatar Aug 04 '24 09:08 muzhig

New release of Phi-3.5-vision-instruct today: https://huggingface.co/microsoft/Phi-3.5-vision-instruct

(As well as a 16x3.8B MoE and an updated version of the basic Phi-3.5-mini)

coder543 avatar Aug 20 '24 22:08 coder543

+1 for support

stygmate avatar Aug 26 '24 21:08 stygmate

@coder543 And can it be converted to GGUF? And can the vision model be used?

Milor123 avatar Aug 27 '24 00:08 Milor123

@Milor123 Nope… that’s why this issue exists.

coder543 avatar Aug 27 '24 00:08 coder543

abetlen has already converted it and is working on an experimental branch: https://huggingface.co/abetlen/Phi-3.5-vision-instruct-gguf

simsi-andy avatar Aug 27 '24 04:08 simsi-andy

https://github.com/ggerganov/llama.cpp/pull/9209/

daboe01 avatar Aug 31 '24 14:08 daboe01

Is there code to use Phi-3.5-vision-instruct-gguf with an image locally via llama-cpp-python?

ayttop avatar Sep 02 '24 17:09 ayttop

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar Oct 17 '24 01:10 github-actions[bot]

Is the issue actually resolved, or did the bot just close it anyway?

Regards,

gcapnias avatar Oct 17 '24 14:10 gcapnias

Hi all, I was able to implement this model in llama.cpp and run it smoothly. Anybody who wants to try, please interact here: https://github.com/ggml-org/llama.cpp/discussions/17311

shadowmmu avatar Nov 30 '25 21:11 shadowmmu