Hongji Zhu
Our modified llama.cpp has not been merged into the official llama.cpp yet; please try this [PR](https://github.com/ggerganov/llama.cpp/pull/6919) instead.
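For reference, one way to try that branch locally is to fetch the PR head directly from GitHub (the local branch name `minicpmv-pr` below is arbitrary, and the build step may vary with your toolchain):

```shell
# Fetch and check out the llama.cpp PR branch, then build.
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
git fetch origin pull/6919/head:minicpmv-pr
git checkout minicpmv-pr
make
```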
When using the int4 model, you should remove `torch_dtype=torch.float16` from `AutoModelForCausalLM.from_pretrained()`. For faster inference, consider [vllm](https://github.com/vllm-project/vllm), which already supports running MiniCPM.
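For illustration, a minimal loading sketch is below; the repo id `openbmb/MiniCPM-Llama3-V-2_5-int4` and the `trust_remote_code` flag follow the usual Hugging Face pattern and are assumptions here, not a verified snippet:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id for the int4 checkpoint.
model_id = "openbmb/MiniCPM-Llama3-V-2_5-int4"

# Note: no torch_dtype=torch.float16 here -- the int4 checkpoint carries
# its own quantization config, and forcing fp16 conflicts with it.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model.eval()
```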
Could you share the code you ran and the input image?
Thanks for your feedback, fixed now.
MiniCPM-Llama3-V 2.5 needs at least 17 GB of GPU memory; an NVIDIA RTX 3090 (24 GB) is fine. The int4 version needs 9 GB of GPU memory.
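If you are unsure whether your card qualifies, a quick sketch to check total VRAM with PyTorch:

```python
import torch

# Print total memory of each visible GPU; the fp16 model needs ~17 GB,
# the int4 version ~9 GB (see above).
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GB")
```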
Thanks for your attention. Since the training code and data are deeply tied to our internal infrastructure, we do not intend to open source this part. Please refer to our...
Thanks for the feedback. Please provide your operating system, hardware, and the exact prompt so we can reproduce the issue.
Thanks for your attention. We will support deploying MiniCPM-V 2.5 with vllm soon.
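Once that support lands, usage should follow vllm's standard offline-inference API. The sketch below uses the text-only entry points; the repo id and eventual image-input interface are assumptions, not confirmed behavior:

```python
from vllm import LLM, SamplingParams

# Assumed repo id; image inputs will need whatever interface the upcoming
# vllm support exposes -- this sketch covers text prompts only.
llm = LLM(model="openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True)
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Describe what MiniCPM-V can do."], params)
print(outputs[0].outputs[0].text)
```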
Inference with MiniCPM-Llama3-V 2.5 in fp16 needs at least 16 GB of GPU memory; int4 needs 8 GB. Full-parameter training needs 8×A100 80GB GPUs. We will release LoRA fine-tuning code...