suhmily10

Results: 6 comments of suhmily10

May I ask which training framework you are using? Did you train MiniCPM4 with any parallel strategies such as tensor parallelism or sequence parallelism? Also, we open-sourced our training...

Sparse attention requires a group size of 16, so to use it with the 0.5B model you need to replicate q. Reference code: `query_layer_repeat = query_layer.repeat_interleave(repeat_times, dim=-2)` and `topk_attn_output = topk_attn_output.view(topk_attn_output.shape[0], topk_attn_output.shape[1]//repeat_times, repeat_times, -1).mean(dim=-2)`.
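For context, here is a minimal, self-contained sketch (not the MiniCPM repository code) of those two operations; the head counts and the `[num_tokens, num_heads, head_dim]` layout are assumptions chosen only to illustrate the shapes:

```python
import torch

num_tokens, head_dim = 8, 64
num_kv_heads = 2                 # assumed KV head count of a small model
num_q_heads = 8                  # assumed query head count -> group size 4
required_group_size = 16         # group size required by the sparse kernel
repeat_times = required_group_size // (num_q_heads // num_kv_heads)  # -> 4

query_layer = torch.randn(num_tokens, num_q_heads, head_dim)

# 1) Repeat each query head so every KV head sees a group of 16 query heads.
query_layer_repeat = query_layer.repeat_interleave(repeat_times, dim=-2)
assert query_layer_repeat.shape == (num_tokens, num_q_heads * repeat_times, head_dim)

# 2) After sparse attention, fold the repeated heads back by averaging them.
topk_attn_output = torch.randn(num_tokens, num_q_heads * repeat_times, head_dim)
topk_attn_output = topk_attn_output.view(
    topk_attn_output.shape[0],
    topk_attn_output.shape[1] // repeat_times,
    repeat_times,
    -1,
).mean(dim=-2)
assert topk_attn_output.shape == (num_tokens, num_q_heads, head_dim)
```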

You can add this in the sparse_forward function of MiniCPMInfLLMv2Attention: at the very beginning of the function add `query_layer = query_layer.repeat_interleave(repeat_times, dim=-2)`, and right before the function's return add `topk_attn_output = topk_attn_output.view(topk_attn_output.shape[0], topk_attn_output.shape[1]//repeat_times, repeat_times, -1).mean(dim=-2)`. That said, we do not recommend using sparse mode with the 0.5B model, and we have already removed this part from the model code on Hugging Face. Thank you very much for pointing this out.
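To make the placement concrete, here is a rough sketch of a sparse_forward-style method with the two lines in place; the signature and the `_topk_attention` helper are hypothetical stand-ins, not the actual MiniCPMInfLLMv2Attention implementation:

```python
def sparse_forward(self, query_layer, key_layer, value_layer, repeat_times):
    # At the very start: repeat the query heads to reach the required group size of 16.
    query_layer = query_layer.repeat_interleave(repeat_times, dim=-2)

    # ... existing top-k / sparse attention computation goes here;
    # _topk_attention is a placeholder for it, not a real method name ...
    topk_attn_output = self._topk_attention(query_layer, key_layer, value_layer)

    # Just before returning: average the repeated heads back to the original count.
    topk_attn_output = topk_attn_output.view(
        topk_attn_output.shape[0],
        topk_attn_output.shape[1] // repeat_times,
        repeat_times,
        -1,
    ).mean(dim=-2)
    return topk_attn_output
```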

Which vLLM version are you on? Could you try upgrading to 0.9.2 or later?
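If it helps, a quick way to confirm the installed version before and after upgrading (assuming vLLM is importable in the current environment):

```python
import vllm
print(vllm.__version__)  # the suggestion above applies to 0.9.2 and newer
```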