
Feature request: Vision language model in GGUF format for llama.cpp?

Open chigkim opened this issue 1 year ago • 3 comments

Reminder

  • [X] I have searched the GitHub Discussions and issues and have not found anything similar to this.

Motivation

LLaVA, BakLLaVA, and other vision language models have quantized GGUF versions that can run with llama.cpp.

It looks like there's a lot of interest:

https://www.reddit.com/r/LocalLLaMA/comments/19d73zr/new_yi_vision_model_released_6b_and_34b_available/

Update: It looks like someone is already working on it but is running into extreme hallucination.

https://github.com/ggerganov/llama.cpp/pull/5093

Are you willing to submit a PR?

  • [ ] I'm willing to submit a PR!

chigkim · Jan 23 '24 11:01

It looks like someone is already working on it but is running into extreme hallucination.

https://github.com/ggerganov/llama.cpp/pull/5093

chigkim · Jan 23 '24 16:01

The hallucination is a problem with the model itself; the PR is fine and you can already use it.

cmp-nct · Jan 23 '24 21:01

I've provided GGUF files (quantized vision stack and LLM) for both models on my HF profile: https://huggingface.co/cmp-nct
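
For anyone who wants to try them: llama.cpp's LLaVA pipeline loads the language-model GGUF and the vision stack (mmproj) GGUF separately. As an illustration, here's a minimal sketch using the llama-cpp-python bindings with their LLaVA-1.5 chat handler. The filenames are placeholders, and it's an assumption that the LLaVA-1.5 prompt template is close enough for Yi-VL's finetune, so treat this as a starting point rather than a known-good recipe:

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Vision stack (CLIP encoder + projector) in GGUF format; filename is a placeholder.
chat_handler = Llava15ChatHandler(clip_model_path="yi-vl-6b-mmproj-f16.gguf")

# Quantized language-model GGUF; filename is a placeholder.
llm = Llama(
    model_path="yi-vl-6b-q5_k.gguf",
    chat_handler=chat_handler,
    n_ctx=2048,       # large enough to hold the image embedding plus the prompt
    logits_all=True,  # the llava chat handler needs logits for every token
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "file:///tmp/photo.jpg"}},
                {"type": "text", "text": "Describe this image in one sentence."},
            ],
        }
    ],
)
print(response["choices"][0]["message"]["content"])
```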

Overall I'm not convinced by Yi-VL's performance. It produced some good outputs, but it also showed severe problems. Hallucination is the most striking, but I've also seen it ignore its finetuning: instead of emitting the "###" stopword, it keeps generating, writes another "Human:" question, and then answers it.

Also, more generally: why is a stopword needed at all? Why doesn't the finetune have a regular stop token configured?
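
In the meantime, callers have to cut generation off themselves with a stop string. Here's a sketch of that workaround using llama-cpp-python's `stop` parameter, reusing the `llm` object from the sketch above; the "### Human:" prompt template is my reading of the format described here, not a confirmed spec:

```python
# Work around the missing stop token: treat "###" as a stop string so
# generation halts before the model invents its own "Human:" turn.
output = llm.create_completion(
    prompt="### Human: What is shown in the image?\n### Assistant:",
    max_tokens=256,
    stop=["###"],
)
print(output["choices"][0]["text"])
```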

cmp-nct · Jan 26 '24 19:01