MiniCPM
MiniCPM4: Ultra-Efficient LLMs on End Devices, achieving 5+ speedup on typical end-side chips
I ran into this error while fine-tuning cpm3 with llamafactory:
[rank0]: Traceback (most recent call last):
[rank0]: File "/data2/liushuliang/MiniCPM/LLaMA-Factory/src/train.py", line 28, in <module>
[rank0]: main()
[rank0]: File "/data2/liushuliang/MiniCPM/LLaMA-Factory/src/train.py", line 19, in main
[rank0]: run_exp()
[rank0]: File "/data2/liushuliang/MiniCPM/LLaMA-Factory/src/llamafactory/train/tuner.py", line 50, in...
### Feature request It looks like function calling is currently supported with transformers
### Description The link to the V1.0 test data and scripts provided in the repository is dead ("https://cloud.tsinghua.edu.cn/f/71b5232264ae4833a4d0/?dl=1"). Will this data be updated? Thanks. ### Case Explanation _No response_
### Is there an existing issue? - [X] I have searched, and there is no existing issue. ### Describe the bug ...
### Description FlashAttention only supports Ampere GPUs or newer. ### Case Explanation Because of FlashAttention, inference cannot be run on a V100.
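One possible workaround for the V100 case is to choose the attention backend from the GPU's compute capability. The sketch below is a minimal illustration, not the repository's own logic: `pick_attn_implementation` is a hypothetical helper, and it assumes the model's loading code honours the `attn_implementation` argument of transformers' `from_pretrained`. Ampere corresponds to compute capability 8.0 and the V100 to 7.0.

```python
def pick_attn_implementation(compute_capability):
    """Return an attention backend name for a (major, minor) compute capability.

    Hypothetical helper: FlashAttention 2 needs Ampere (8.x) or newer,
    so older GPUs such as the V100 (7.0) fall back to plain "eager" attention.
    """
    major, _minor = compute_capability
    return "flash_attention_2" if major >= 8 else "eager"


# Example wiring (requires torch and transformers, so it is left as a comment):
#   import torch
#   from transformers import AutoModelForCausalLM
#   impl = pick_attn_implementation(torch.cuda.get_device_capability())
#   model = AutoModelForCausalLM.from_pretrained(
#       model_path, attn_implementation=impl, trust_remote_code=True)
```

Whether the fallback actually avoids the FlashAttention import depends on the model's remote code, so this should be checked against the specific checkpoint.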
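### Feature request The MiniCPM technical report states: "During the pre-training stage, only general, large-volume, coarse-quality pre-training data is used; in the annealing stage, a very broad range of high-quality knowledge and capability data, together with high-quality SFT data, is mixed into the pre-training data for annealing." Was the same training method used for MiniCPM3? Have you tried adding high-quality data during the Stable stage (or switching to Cosine learning-rate decay)?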
### Is there an existing issue? - [X] I have searched, and there is no existing issue. ### Describe the bug ...
In convert_hf_to_gguf.py, when converting a MiniCPM model, the class below overrides modify_tensors and only transforms q_proj.weight and k_proj.weight. Why is this transformation needed? Or, as the comment says, "HF models permute some of the tensors, so we need to undo that": where does the HF model do this permute? I haven't quite figured out the full story and would appreciate an explanation.

```
@Model.register("MiniCPMForCausalLM")
class MiniCPMModel(Model):
    model_arch = gguf.MODEL_ARCH.MINICPM

    def set_gguf_parameters(self):
        block_count = self.hparams["num_hidden_layers"]
        self.gguf_writer.add_context_length(self.hparams["max_position_embeddings"])
        self.gguf_writer.add_embedding_length(self.hparams["hidden_size"])
        self.gguf_writer.add_block_count(block_count)
        self.gguf_writer.add_feed_forward_length(self.hparams["intermediate_size"])
        ...
```
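On the permute question, a likely explanation (offered as an understanding, not a confirmed answer from the maintainers) is that it comes from the rotary-embedding convention. HF LLaMA-style models apply RoPE with a "rotate half" layout, where each head's query/key rows are ordered as all even rotary dimensions followed by all odd ones, while llama.cpp's default RoPE expects the interleaved layout. Only q_proj and k_proj are affected because RoPE is applied only to queries and keys. The pure-Python sketch below uses hypothetical helper names (`hf_permute`, `undo_permute`) and row indices in place of real weight rows to show that the two reorderings are inverses.

```python
def hf_permute(rows, n_heads):
    """Interleaved -> half-split order within each head (the HF-style layout)."""
    head_dim = len(rows) // n_heads
    out = []
    for h in range(n_heads):
        head = rows[h * head_dim:(h + 1) * head_dim]
        # Even-indexed rows first, then odd-indexed rows.
        out.extend(head[0::2] + head[1::2])
    return out


def undo_permute(rows, n_heads):
    """Half-split -> interleaved order within each head (what the converter restores)."""
    head_dim = len(rows) // n_heads
    half = head_dim // 2
    out = []
    for h in range(n_heads):
        head = rows[h * head_dim:(h + 1) * head_dim]
        # Re-interleave: take one row from each half in turn.
        for i in range(half):
            out.append(head[i])
            out.append(head[half + i])
    return out


rows = list(range(16))            # stand-ins for 16 weight rows, 2 heads of 8
permuted = hf_permute(rows, n_heads=2)
restored = undo_permute(permuted, n_heads=2)
print(permuted[:8])               # first head in half-split order
print(restored == rows)
```

In the real converter the same reordering is done with tensor reshape and axis-swap operations on the weight matrix; the list version here only illustrates the index pattern.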
### Is there an existing issue? - [X] I have searched, and there is no existing issue. ### Describe the bug ...
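### Feature request Hello, and thank you very much for your work; I am very interested in this model. I noticed that intermediate checkpoints were previously open-sourced for the 2B model. Are there plans to release the corresponding weights for MiniCPM3-4B?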