
Feature request: Vision language model in GGUF format for llama.cpp?

Open chigkim opened this issue 1 year ago • 3 comments

Reminder

  • [X] I have searched the GitHub Discussions and issues and have not found anything similar to this.

Motivation

LLaVA, BakLLaVA, and other vision language models have quantized GGUF versions that can run with llama.cpp.

It looks like there's a lot of interest:

https://www.reddit.com/r/LocalLLaMA/comments/19d73zr/new_yi_vision_model_released_6b_and_34b_available/

Update: It looks like someone is already working on it but is running into extreme hallucination.

https://github.com/ggerganov/llama.cpp/pull/5093

Are you willing to submit a PR?

  • [ ] I'm willing to submit a PR!

chigkim · Jan 23 '24 11:01

It looks like someone is already working on it but is running into extreme hallucination.

https://github.com/ggerganov/llama.cpp/pull/5093

chigkim · Jan 23 '24 16:01

The hallucination is a problem with the model itself; the PR is fine and you can already use it.

cmp-nct · Jan 23 '24 21:01

I've provided GGUF files (quantized vision stack and LLM) for both models on my HF profile: https://huggingface.co/cmp-nct
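
For anyone who wants to try them: llama.cpp's LLaVA pipeline loads the language-model GGUF and the vision stack (mmproj) GGUF separately. As an illustration, here's a minimal sketch using the llama-cpp-python bindings with their LLaVA-1.5 chat handler. The filenames are placeholders, and it's an assumption that the LLaVA-1.5 prompt template is close enough for Yi-VL's finetune, so treat this as a starting point rather than a known-good recipe:

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Vision stack (CLIP encoder + projector) in GGUF format; filename is a placeholder.
chat_handler = Llava15ChatHandler(clip_model_path="yi-vl-6b-mmproj-f16.gguf")

# Quantized language-model GGUF; filename is a placeholder.
llm = Llama(
    model_path="yi-vl-6b-q5_k.gguf",
    chat_handler=chat_handler,
    n_ctx=2048,       # large enough to hold the image embedding plus the prompt
    logits_all=True,  # the llava chat handler needs logits for every token
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "file:///tmp/photo.jpg"}},
                {"type": "text", "text": "Describe this image in one sentence."},
            ],
        }
    ],
)
print(response["choices"][0]["message"]["content"])
```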

Overall I'm not convinced by Yi-VL's performance. It produced some good outputs, but it also showed severe problems. Hallucination is the most striking, but I've also seen it ignore its finetuning: instead of emitting the "###" stopword, it keeps generating, writes another "Human:" question, and then answers it.

Also, more generally: why is a stopword needed at all? Why doesn't the finetune have a regular stop token configured?
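
In the meantime, callers have to cut generation off themselves with a stop string. Here's a sketch of that workaround using llama-cpp-python's `stop` parameter, reusing the `llm` object from the sketch above; the "### Human:" prompt template is my reading of the format described here, not a confirmed spec:

```python
# Work around the missing stop token: treat "###" as a stop string so
# generation halts before the model invents its own "Human:" turn.
output = llm.create_completion(
    prompt="### Human: What is shown in the image?\n### Assistant:",
    max_tokens=256,
    stop=["###"],
)
print(output["choices"][0]["text"])
```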

cmp-nct · Jan 26 '24 19:01