YuzaChongyi


If you want to use the int4 version for the web demo, you should change the [model initialization](https://github.com/OpenBMB/MiniCPM-o/blob/main/web_demos/minicpm-o_2.6/model_server.py#L96) to `AutoGPTQForCausalLM.from_quantized`:

```python
import torch
from auto_gptq import AutoGPTQForCausalLM

model = AutoGPTQForCausalLM.from_quantized(
    'openbmb/MiniCPM-o-2_6-int4',
    torch_dtype=torch.bfloat16,
    device="cuda:0",
    trust_remote_code=True,
    disable_exllama=True,
    disable_exllamav2=True,
)
```
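After loading, the quantized model can be used like the regular checkpoint. A minimal sketch is below; the tokenizer path and the `chat()` call follow the usual MiniCPM-V/MiniCPM-o model-card usage and are assumptions, not taken from model_server.py:

```python
from PIL import Image
from transformers import AutoTokenizer

# Assumed: the int4 repo ships the same tokenizer as the full model.
tokenizer = AutoTokenizer.from_pretrained(
    'openbmb/MiniCPM-o-2_6-int4', trust_remote_code=True
)

image = Image.open('example.jpg').convert('RGB')  # placeholder image
msgs = [{'role': 'user', 'content': [image, 'Describe this image.']}]

# chat() signature as in the MiniCPM model cards; verify against your version.
answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(answer)
```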

May I ask whether the results from Python inference are normal? Ollama may have some precision issues.

Perhaps you could try preserving the kv-cache to avoid re-running the prefill computation.

If it is a newly appended prompt, additional inference time will still be required for the new tokens, but this saves the prefill overhead of the few-shot prefix prompt; see the sketch below.
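A minimal sketch of the idea with a plain Hugging Face causal LM (the model name and prompts are placeholders, and MiniCPM-o's `chat()` API does not expose this directly, so this only illustrates the mechanism):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works for the illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# 1) Prefill the fixed few-shot prefix once and keep its kv-cache.
prefix_ids = tokenizer("few-shot examples ...", return_tensors="pt").input_ids
with torch.no_grad():
    prefix_cache = model(prefix_ids, use_cache=True).past_key_values

# 2) For a newly appended question, only the new tokens go through prefill;
#    the cached prefix keys/values are reused. (Recent transformers versions
#    may update the cache object in place, so re-prefill or copy it per request.)
question_ids = tokenizer(" new question ...", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(question_ids, past_key_values=prefix_cache, use_cache=True)
```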

You can try this [int4 version](https://huggingface.co/openbmb/MiniCPM-o-2_6-int4); you only need to replace the model initialization with `AutoGPTQForCausalLM.from_quantized` in model_server.py.

For MiniCPM-V 2.6, we have not conducted large-scale training of OCR capabilities in languages other than Chinese and English. Therefore, there is still significant room for improvement in OCR performance for those languages.

Hi, in the video-QA mode of MiniCPM-V 2.6, if the video is long enough, i.e. its duration in seconds exceeds MAX_NUM_FRAMES, frames are sampled uniformly from the video; see the sketch below.
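A sketch of this sampling scheme, along the lines of the `encode_video` helper in the MiniCPM-V 2.6 model card (assumes `decord` and Pillow are installed; `MAX_NUM_FRAMES = 64` is an assumed frame budget):

```python
from decord import VideoReader, cpu
from PIL import Image

MAX_NUM_FRAMES = 64  # assumed frame budget

def uniform_sample(seq, n):
    # Pick n indices spread evenly across the sequence.
    gap = len(seq) / n
    return [seq[int(i * gap + gap / 2)] for i in range(n)]

def encode_video(video_path):
    vr = VideoReader(video_path, ctx=cpu(0))
    # First sample roughly one frame per second.
    step = max(1, round(vr.get_avg_fps()))
    frame_idx = list(range(0, len(vr), step))
    # If the video exceeds the budget, fall back to uniform sampling.
    if len(frame_idx) > MAX_NUM_FRAMES:
        frame_idx = uniform_sample(frame_idx, MAX_NUM_FRAMES)
    frames = vr.get_batch(frame_idx).asnumpy()
    return [Image.fromarray(f.astype('uint8')) for f in frames]
```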

You can follow the steps in this [README](https://huggingface.co/openbmb/MiniCPM-o-2_6-int4#usage-of-minicpm-o-2_6-int4) to install AutoGPTQ and perform int4 quantized inference.

Thanks for using it. Could you share your test images and test code?

Currently, model inference has some built-in CUDA operations. You can refer to this [PR](https://huggingface.co/openbmb/MiniCPM-o-2_6/discussions/19) and try modifying the code accordingly first.