Feature: Vision Language model in GGUF format for llama.cpp?
Reminder
- [X] I have searched the GitHub Discussions and Issues and have not found anything similar to this.
Motivation
LLaVA, BakLLaVA, and other vision language models already have quantized models in GGUF format that can run with llama.cpp.
It looks like there's great interest:
https://www.reddit.com/r/LocalLLaMA/comments/19d73zr/new_yi_vision_model_released_6b_and_34b_available/
Update: It looks like someone is working on it, but is having problems with extreme hallucination:
https://github.com/ggerganov/llama.cpp/pull/5093
Are you willing to submit a PR?
- [ ] I'm willing to submit a PR!
The hallucination is a problem with the model; the PR itself is fine, and you can already use it.
I've provided GGUF files (quantized vision stack and LLM) for both models on my HF profile: https://huggingface.co/cmp-nct
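For anyone who wants to try such a pair, here is a minimal sketch using llama-cpp-python. It assumes Yi-VL works with the standard LLaVA 1.5 chat handler (Yi-VL follows the LLaVA architecture, but its prompt template may differ); all filenames below are placeholders, not the actual files from the linked HF profile.

```python
# Minimal sketch: loading a LLaVA-style GGUF pair (quantized vision
# stack / mmproj + LLM) with llama-cpp-python. Filenames are placeholders.
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# The mmproj file holds the quantized CLIP vision encoder/projector.
chat_handler = Llava15ChatHandler(clip_model_path="yi-vl-mmproj.gguf")

llm = Llama(
    model_path="yi-vl-6b.Q4_K_M.gguf",  # placeholder filename
    chat_handler=chat_handler,
    n_ctx=2048,  # increased so the image embedding fits in context
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "file:///path/to/image.jpg"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
)
print(response["choices"][0]["message"]["content"])
```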
Overall I am not convinced by the performance of Yi-VL. It produced some good outputs, but it also showed severe problems. Hallucination is the most striking, but I've also seen it ignore the finetune: instead of emitting the stopword "###", it continues, writes another "Human:" question, and then answers it.
Also, more generally: why is a stopword needed at all? Why doesn't the finetune have a regular stop token configured?
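For context, the practical consequence is that the "###" marker has to be passed as a stop string at inference time instead of the model simply emitting an end-of-sequence token. A sketch of that workaround with llama-cpp-python (model filename and prompt wording are placeholders, following the "### Human:" turn format mentioned above):

```python
# Sketch of the stopword workaround: since the finetune lacks a proper
# stop token, "###" is supplied as a stop string at inference time.
from llama_cpp import Llama

llm = Llama(model_path="yi-vl-6b.Q4_K_M.gguf")  # placeholder filename

output = llm.create_completion(
    "### Human: What breed is the dog in the picture?\n### Assistant:",
    max_tokens=256,
    stop=["###"],  # truncate as soon as the model starts a new turn
)
print(output["choices"][0]["text"])
```

A configured stop token would make this unnecessary: generation would end on its own, with no risk of the model running on into a fabricated "Human:" turn.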