
AIDC-AI/Ovis2-16B and 32B variants quantized version needed

Greatz08 opened this issue 10 months ago · 10 comments

Is it possible to have 4-bit quantized variants of those models, so that we can load at least the 16B model easily under 8 GB of VRAM? It is performing great; I tested it through the Hugging Face Space. Kudos to the team for their wonderful work on the project.
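For rough sizing (my own back-of-the-envelope arithmetic, not a figure from the Ovis team): weight memory scales linearly with bits per weight, so a 16B-parameter model's weights alone come to roughly 7.5 GiB at 4-bit, before the KV cache and vision-tower activations are counted:

```python
def weight_mem_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB (weights only; excludes KV cache
    and activations, which add more on top)."""
    return n_params * bits_per_weight / 8 / 2**30

print(weight_mem_gib(16e9, 16))  # fp16:  ~29.8 GiB
print(weight_mem_gib(16e9, 4))   # 4-bit: ~7.45 GiB, already tight for 8 GB VRAM
```

So "under 8 GB" with a 4-bit 16B model is borderline at best; some layers or the vision encoder would likely need to stay in CPU RAM.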

I don't know whether quantization is easily achievable for multimodal models right now. As far as I can remember from some months ago, people ran into lots of issues running the MiniCPM-V multimodal models. Custom patches were implemented in llama.cpp at the time to load the GGUF-quantized model, but those patches were only merged into the official version after a significant delay (I don't know why, but that's all I can remember). Until then, people had to use a custom ollama build with those llama.cpp patches to run multimodal models. I don't know whether things are smoother for multimodal models now, but I'd love to hear more about it.

I used the PyTorch Q4 variant of MiniCPM-V at that time with the Xinference project, so even if GGUF quantization is not possible right now, a PyTorch Q4 variant would still make it possible to run at least a quantized 16B model with Xinference.
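In the meantime, one common workaround is on-the-fly 4-bit loading with bitsandbytes. This is only a sketch, assuming `transformers` and `bitsandbytes` are installed and that Ovis2's custom modeling code tolerates quantized loading, which is not guaranteed:

```python
# Sketch: on-the-fly 4-bit loading via bitsandbytes (not an official Ovis release).
# Assumes the model's custom code is compatible with quantized loading; untested here.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NormalFloat4, the usual default for LLMs
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "AIDC-AI/Ovis2-16B",
    quantization_config=bnb_config,
    trust_remote_code=True,               # Ovis ships custom modeling code
    device_map="auto",                    # spills layers to CPU if VRAM runs out
)
```

Unlike a prebuilt GPTQ or GGUF checkpoint, this still downloads the full-precision weights and quantizes them at load time, so disk and download cost are unchanged.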

Thank you very much for this impressive project.

Greatz08 avatar Feb 22 '25 12:02 Greatz08

Have you tried Ovis2-8B? To me, it seems better than 16B. Very similar responses and accuracy, but I prefer 8B because I feel like 16B tends to hallucinate a bit. Sometimes its output is completely off. 8B is a little less detailed in its responses, but more reliable in general. But I guess it depends on the use cases, etc. So give it a try!

kabal1000x avatar Feb 22 '25 15:02 kabal1000x

@kabal1000x Nice, I will try it this week :-) . I tried 16B through Hugging Face Spaces and it was pretty good in my opinion: it can reason and handle higher-resolution images well. It was also the first time I ever saw a local model process video, so it was a new and awesome experience for me personally, and I want to try it locally (I just need a better quantized variant so that my machine can handle it :-) )

Greatz08 avatar Feb 23 '25 03:02 Greatz08

On another note, could the model be quantized to GGUF format?

wsbagnsv1 avatar Feb 23 '25 14:02 wsbagnsv1

I just opened an issue about this. I've been trying to convert Ovis2 to GGUF, but without luck: https://github.com/AIDC-AI/Ovis/issues/60#issue-2875964058

Mixomo avatar Feb 24 '25 19:02 Mixomo

+1

FerLuisxd avatar Mar 09 '25 22:03 FerLuisxd

+1!

Mixomo avatar Mar 10 '25 21:03 Mixomo

Thank you for your interest in Ovis. We'll be releasing a GPTQ-quantized version of Ovis2 soon. Please stay tuned for updates!

runninglsy avatar Mar 18 '25 06:03 runninglsy

+1

yizhangliu avatar Mar 23 '25 00:03 yizhangliu

Please do a GGUF quantization! Judging from #60, it's not trivial.

Disonantemus avatar May 01 '25 01:05 Disonantemus

I'm not that well versed in the whole thing; I got some errors along the way and didn't pursue it further /:

wsbagnsv1 avatar May 02 '25 16:05 wsbagnsv1