Missing mmproj?

Open CHesketh76 opened this issue 10 months ago • 4 comments

After following the steps in the documentation:

-m
/path/to/my/weights/mistral-7b-Q8_0.gguf
--mmproj
mistral-7b-mmproj-Q8_0.gguf
--host
0.0.0.0
-ngl
9999
...

I get this message: mistral-7b-mmproj-Q8_0.gguf: No such file or directory

Is this normal? What is the mmproj file, and how do we make it?

CHesketh76 avatar Apr 05 '24 15:04 CHesketh76
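The error above is the usual operating-system message for a path that does not exist; a bare file name such as mistral-7b-mmproj-Q8_0.gguf is looked up relative to the current working directory. A minimal sketch of a pre-flight check, assuming the arguments are available as a Python list: find_missing_files and PATH_FLAGS are hypothetical names made up for illustration, not part of llamafile, and only the two path-taking flags quoted in this thread are covered.

```python
import os

# Flags whose following argument is a file path. Limited to the two flags
# quoted in this thread; this is an assumption, not a full list.
PATH_FLAGS = ("-m", "--mmproj")


def find_missing_files(args, base_dir="."):
    """Return (flag, path) pairs whose path is not an existing file."""
    missing = []
    for i, arg in enumerate(args[:-1]):
        if arg in PATH_FLAGS:
            path = args[i + 1]
            # Relative paths are resolved against base_dir, mirroring how the
            # OS resolves them against the current working directory.
            # os.path.join leaves absolute paths unchanged.
            if not os.path.isfile(os.path.join(base_dir, path)):
                missing.append((arg, path))
    return missing


if __name__ == "__main__":
    args = [
        "-m", "/path/to/my/weights/mistral-7b-Q8_0.gguf",
        "--mmproj", "mistral-7b-mmproj-Q8_0.gguf",
        "--host", "0.0.0.0",
        "-ngl", "9999",
    ]
    for flag, path in find_missing_files(args):
        print(f"{flag}: {path}: No such file or directory")
```

Running a check like this before starting the server shows which of the two paths is the one that cannot be found.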

Have a read of ~/git/llamafile/llama.cpp/server/README.md. Within this section:

  • --mmproj MMPROJ_FILE: Path to a multimodal projector file for LLaVA.

If I recall correctly, this is for stuff like processing/generating images, similar to how ChatGPT4 is able to handle images.

mofosyne avatar Apr 05 '24 16:04 mofosyne

Oh, so I can just remove that if I am not using a multimodal model? Also, could I combine the LLaVA mmproj with my Mistral model to create a multimodal model?

CHesketh76 avatar Apr 05 '24 17:04 CHesketh76

That bit... I'm unsure. Hope others can chime in on that.

mofosyne avatar Apr 05 '24 17:04 mofosyne

@CHesketh76 The mmproj file is optional; you only need it if you plan to process multimodal inputs (text + images).

I think it might technically be possible to combine a Mistral model with a LLaVA mmproj as long as the embedding dimensions match, but I think the output will be garbage. I'm not a computer vision expert, but AFAIK the mmproj layer basically projects "image tokens" and "text tokens" into the same latent space. If the mmproj layer and the model aren't trained together, then the projection will be useless.

"CLIP uses a ViT like transformer to get visual features and a causal language model to get the text features. Both the text and visual features are then projected to a latent space with identical dimension. The dot product between the projected image and text features is then used as a similar score." Source: https://huggingface.co/docs/transformers/en/model_doc/clip

k8si avatar Apr 05 '24 20:04 k8si
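To make the projection idea in the comment above concrete, here is a toy sketch in plain Python. Everything in it is an illustrative assumption: the 3-dimensional "image" features, the 2-dimensional "text" features, and both projection matrices are made-up numbers, and project and dot are hypothetical helpers rather than anything from LLaVA, CLIP, or llama.cpp. It only shows the mechanics: each modality is multiplied by its own matrix to land in a space of the same dimension, and a dot product then compares the two.

```python
def project(features, weights):
    """Multiply a feature vector by a projection matrix given as a list of rows."""
    if any(len(row) != len(features) for row in weights):
        raise ValueError("projection matrix does not match the feature dimension")
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def dot(a, b):
    """Dot product of two vectors that must live in the same space."""
    if len(a) != len(b):
        raise ValueError("vectors have different dimensions")
    return sum(x * y for x, y in zip(a, b))


# Toy inputs: made-up numbers, not real model activations.
image_features = [1.0, 0.0, 2.0]   # 3-dimensional "vision" features
text_features = [0.5, 1.5]         # 2-dimensional "text" features

# Toy projections into a shared 2-dimensional space.
image_projection = [[1.0, 0.0, 0.0],
                    [0.0, 1.0, 1.0]]   # maps 3 dimensions to 2
text_projection = [[1.0, 0.0],
                   [0.0, 1.0]]         # maps 2 dimensions to 2

projected_image = project(image_features, image_projection)
projected_text = project(text_features, text_projection)
similarity = dot(projected_image, projected_text)
print(projected_image, projected_text, similarity)
```

The dimension checks correspond to the "embedding dimensions match" condition: a mismatched projector fails outright. Matching shapes only guarantee that the arithmetic runs, though. Whether the resulting numbers mean anything depends on the projection having been trained together with the model, which is the concern raised above.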

You should only use the --mmproj flag if you're using a vision model like LLaVA 1.5 (https://huggingface.co/jartine/llava-v1.5-7B-GGUF). There are efforts afoot to make Mistral 7B multimodal, but it's still a work in progress. If you're using a model like normal Mistral, which doesn't ship with an mmproj file, then all you need to do is not pass the --mmproj flag.

jart avatar Apr 06 '24 05:04 jart
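For anyone generating the argument list from a script, the advice above amounts to adding --mmproj only when a projector file actually exists for the model. A minimal sketch, assuming the same flags quoted in the original post: build_args is a hypothetical helper invented for illustration, and the default host and layer count are simply the values from that post.

```python
def build_args(model_path, mmproj_path=None, host="0.0.0.0", gpu_layers=9999):
    """Build an argument list, adding --mmproj only for vision models."""
    args = ["-m", model_path]
    if mmproj_path is not None:
        # Only models that ship with a projector file (e.g. LLaVA) need this.
        args += ["--mmproj", mmproj_path]
    args += ["--host", host, "-ngl", str(gpu_layers)]
    return args


if __name__ == "__main__":
    # Text-only model: no --mmproj in the resulting list.
    print(build_args("/path/to/my/weights/mistral-7b-Q8_0.gguf"))
```

Passing mmproj_path for a text-only model would reproduce the situation in the original post, so the default of None is the safe choice.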