Attempt to add `mllama` support

## Motivation
This PR attempts to add mllama support from the Ollama GitHub repository to the examples of this repository.
All code changes are mainly derived from the llama patch, the operator patch, and the mllama implementation in the ollama repo.
## Goals
- [x] Mllama implementation (similar to `clip` in llava)
- [ ] Model converter of llama-3.2-vision to mllama
- [ ] Full mllama example and documentation (such as the llava example)
- [x] `unpad` operation support
- [x] Mllama model build and load in llama.cpp
## Current Status
There are still some issues with this implementation.
1. Model converter. The example model and projection are not on Hugging Face. Currently I use the `ollama` application to fetch the converted model for testing.
2. The `n_vocab` (`n_tokens` loaded from the model) mismatches the tensor dimensions. `n_tokens` is `128257`, while the dimension of `LLM_TENSOR_OUTPUT`, for example, is `128256`. It seems something is wrong in the converted model.
3. As mentioned in 2., some assertions fail when executing the mllama models. `ggml_backend_tensor_get_async` and `ggml_backend_tensor_get` fail the tensor-read out-of-bounds check.
Thank you for the PR!
There is currently work in progress to introduce a new vision API, and alongside this work there has been work on supporting mllama (Llama 3.2 Vision Instruct). Regarding the vocab issue, we've had a discussion about this matter which might be of interest.
Thanks for the information! I'll study this to improve.
May I ask whether mllama can be compiled in this PR? I didn't see the relevant CMakeLists changes.
Sorry, we are closing this PR since we have given up on supporting mllama.