YuzaChongyi
Could you provide an example to illustrate what specific type of repetition you are referring to?
Thanks! We will look into the possibility of removing the dependency on flash attention.
If you are using the `model.chat` interface, the `msgs` you pass in need to include the conversation history. In streaming mode, the kv-cache is reused within a certain context length, which effectively already records the context.
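A minimal sketch of carrying history into `msgs`. The `build_msgs` helper is hypothetical (not part of the official API); it just shows the alternating user/assistant structure the history should have before the `model.chat(msgs=msgs, tokenizer=tokenizer)` call.

```python
def build_msgs(history, new_question):
    """Build the msgs list for model.chat, including prior turns.

    history: list of (user_text, assistant_text) pairs from earlier turns.
    new_question: the current user query, appended last.
    """
    msgs = []
    for user_text, assistant_text in history:
        msgs.append({"role": "user", "content": user_text})
        msgs.append({"role": "assistant", "content": assistant_text})
    msgs.append({"role": "user", "content": new_question})
    return msgs

# The resulting list is then passed as the msgs argument, e.g.:
# answer = model.chat(msgs=msgs, tokenizer=tokenizer)
```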
Hi, you can follow the steps in this [README](https://huggingface.co/openbmb/MiniCPM-o-2_6-int4#usage-of-minicpm-o-2_6-int4) to install AutoGPTQ and perform int4 quantized inference.
The model's download path must not contain any `.`. The `.` character will lead to failures during the dynamic importing process at runtime. For example, you can rename the folder/directory...
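For example, a rename could look like the following; the directory names here are only an illustration (using `/tmp` and a dotted folder name as stand-ins for your actual download path).

```shell
# The dynamic import fails when the model path contains ".",
# so rename the downloaded folder to a dot-free name.
rm -rf /tmp/MiniCPM-o-2_6          # make the example re-runnable
mkdir -p /tmp/MiniCPM-o-2.6        # stand-in for the downloaded folder
mv /tmp/MiniCPM-o-2.6 /tmp/MiniCPM-o-2_6
```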
Hi, you can update to the latest code in the huggingface repository.
Thank you for using the model! torch.compile may indeed offer some compilation optimization and speedup. We haven't actually tested it ourselves, so feel free to give it a try. We’d really appreciate...
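A minimal sketch of trying `torch.compile` (requires torch >= 2.0). The tiny `nn.Linear` below is a stand-in, not MiniCPM-o itself; as noted above, this path is untested by the maintainers.

```python
import torch
import torch.nn as nn

# Wrap any nn.Module with torch.compile; the first call triggers compilation,
# subsequent calls may run faster. The model here is a trivial stand-in.
model = nn.Linear(4, 2)
compiled = torch.compile(model)
out = compiled(torch.randn(3, 4))
```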
This is likely caused by an incomplete file download. Try clearing the cache and re-downloading the model files. This inference code has been fully tested with transformers==4.55.0.
Which fine-tuning code did you use for training? The current information is not enough to locate the issue; if it consistently occurs at the same stage, it looks more like a data problem.
Please make sure these dependencies are installed, or you can run `pip install vector_quantize_pytorch vocos`.