How to use Gradio with a GGUF model?
I was able to install everything successfully on Windows, but I can't load a GGUF model. I entered "openbmb/MiniCPM-o-2_6-gguf" in model_server.py, but I get an error that no config.json was found. I'm really only interested in real-time voice chat, but I don't think the full non-GGUF model will run on my RTX 3060 with 12 GB. Does something have to be changed in the code, or how do you get GGUF models working with the bundled Gradio demo? The videos even show it running on an iPad, and that certainly isn't using the large model, right? Thanks in advance for any help.
You can try the int4 version instead; you only need to replace the model initialization with AutoGPTQForCausalLM.from_quantized in model_server.py.
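In case it helps, here is a minimal sketch of that swap, assuming the int4 checkpoint id is openbmb/MiniCPM-o-2_6-int4 and the auto_gptq package is installed; the exact kwargs should be matched to whatever model_server.py already passes:

```python
import torch
from auto_gptq import AutoGPTQForCausalLM

# Load the GPTQ int4 checkpoint instead of the full-precision model.
# The model id and kwargs below are assumptions; align them with the
# arguments already used in model_server.py.
model = AutoGPTQForCausalLM.from_quantized(
    "openbmb/MiniCPM-o-2_6-int4",  # assumed int4 checkpoint id
    torch_dtype=torch.bfloat16,
    device="cuda:0",
    trust_remote_code=True,
    disable_exllama=True,     # exllama kernels can be flaky on Windows
    disable_exllamav2=True,
)
model = model.eval()
```

The int4 weights should fit comfortably within 12 GB of VRAM, unlike the bf16 checkpoint.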
I still don't understand which code I need to change in model_server.py... is there a tutorial for dummies? I also keep getting an error about FlashAttention, which I did install after finally finding a version that works on my machine.
Same issue. Solved?
Try changing this line: https://github.com/OpenBMB/MiniCPM-o/blob/main/web_demos/minicpm-o_2.6/model_server.py#L96
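Roughly, the edit at that line would look like the following; the "before" call is paraphrased rather than copied from the repo, and the int4 model id is an assumption:

```python
import torch
from auto_gptq import AutoGPTQForCausalLM

# before (roughly): full-precision load, too large for a 12 GB card
# model = AutoModel.from_pretrained(model_path, trust_remote_code=True)

# after: load the GPTQ int4 checkpoint instead
model = AutoGPTQForCausalLM.from_quantized(
    "openbmb/MiniCPM-o-2_6-int4",  # assumed int4 checkpoint id
    torch_dtype=torch.bfloat16,
    device="cuda:0",
    trust_remote_code=True,
)
```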