Quantized versions of AIDC-AI/Ovis2-16B and 32B variants needed
Is it possible to have 4-bit quantized variants of these models, so that we can load at least the 16B model under 8 GB of VRAM? It performs great; I tested it through the Hugging Face Space. Kudos to the team for their wonderful work on the project.
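For reference, here's a minimal sketch of what I have in mind: on-the-fly 4-bit loading with transformers + bitsandbytes. Whether Ovis2's custom modeling code actually works with this path is an assumption on my part, not a tested recipe:

```python
# Minimal sketch: on-the-fly 4-bit loading with transformers + bitsandbytes.
# Assumption (untested): Ovis2's trust_remote_code modeling code is compatible
# with bitsandbytes quantization; the vision tower may need to stay in higher
# precision, and real VRAM use also depends on image and context size.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "AIDC-AI/Ovis2-16B",
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)
```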
I don't know whether quantization is easily achievable for multimodal models right now. As far as I can remember from a few months ago, people ran into lots of issues when running the MiniCPM-V multimodal models. Custom patches were implemented in llama.cpp at the time to load the GGUF-quantized model, but those patches were only merged into the official version after a significant delay (I don't know why; that's all I can remember). In the meantime, people had to use a custom Ollama build with those llama.cpp patches to run multimodal models. I don't know whether things are smoother for multimodal models now, but I would love to know more about it.
Back then I used the PyTorch Q4 variant of MiniCPM-V with the Xinference project, so even if GGUF quantization isn't possible right now, a PyTorch Q4 variant would still make it possible to run at least the quantized 16B model through Xinference.
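If it helps, this is roughly the Xinference flow I used for MiniCPM-V, adapted with an illustrative model name; Ovis2 would presumably need a custom model registration in Xinference first, so treat the details as assumptions:

```python
# Sketch of launching a 4-bit PyTorch model through Xinference's Python client.
# Assumes a local Xinference server is already running (e.g. `xinference-local`
# on the default port) and that the model exists in Xinference's registry;
# "Ovis2-16B" below is a hypothetical registry name.
from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(
    model_name="Ovis2-16B",   # hypothetical registry name
    model_format="pytorch",
    quantization="4-bit",
)
model = client.get_model(model_uid)
```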
Thank you very much for this impressive project.
Have you tried Ovis2-8B? To me, it seems better than 16B. Very similar responses and accuracy, but I prefer 8B because I feel like 16B tends to hallucinate a bit. Sometimes its output is completely off. 8B is a little less detailed in its responses, but more reliable in general. But I guess it depends on the use cases, etc. So give it a try!
@kabal1000x Nice, I'll try it this week :-) I tried 16B through the Hugging Face Space and it was pretty good in my opinion. It can think and can process higher-resolution images well. It was also the first time I ever saw a local model processing video, so it was a new and awesome experience for me personally, and I want to try it locally (I just need a better quantized variant of it so that my machine can handle it :-) )
On another note, could the model be quantized to GGUF format?
I just opened this issue. I've been trying to convert Ovis2 to GGUF but without luck: https://github.com/AIDC-AI/Ovis/issues/60#issue-2875964058
+1
+1!
Thank you for your interest in Ovis. We'll be releasing a GPTQ-quantized version of Ovis2 soon. Please stay tuned for updates!
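Once that lands, loading it should look like any other GPTQ checkpoint in transformers; here is a sketch with a placeholder repo id until the release is actually out:

```python
# Sketch of loading the GPTQ build once it's published; the repo id below is
# a placeholder, so check the AIDC-AI Hugging Face page for the actual name.
# Requires a GPTQ backend such as the auto-gptq package alongside transformers.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "AIDC-AI/Ovis2-16B-GPTQ-Int4",  # placeholder repo id
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="auto",
)
```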
+1
Please do a GGUF quantization! From #60, it seems it's not trivial.
I'm not that deep into the whole thing; I ran into some errors along the way and didn't pursue it further :/