suhmily10

Results: 6 comments of suhmily10

May I ask which training framework you are using? Did you train MiniCPM4 with any parallel strategies such as tensor parallelism or sequence parallelism? Also, we open-sourced our training...

Sparse attention requires a group size of 16, so to use it with the 0.5B model you need to replicate q. Reference code: `query_layer_repeat = query_layer.repeat_interleave(repeat_times, dim=-2)` and `topk_attn_output = topk_attn_output.view(topk_attn_output.shape[0], topk_attn_output.shape[1]//repeat_times, repeat_times, -1).mean(dim=-2)`.
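For context, here is a minimal, self-contained sketch (not the MiniCPM repository code) of those two operations; the head counts and the `[num_tokens, num_heads, head_dim]` layout are assumptions chosen only to illustrate the shapes:

```python
import torch

num_tokens, head_dim = 8, 64
num_kv_heads = 2                 # assumed KV head count of a small model
num_q_heads = 8                  # assumed query head count -> group size 4
required_group_size = 16         # group size required by the sparse kernel
repeat_times = required_group_size // (num_q_heads // num_kv_heads)  # -> 4

query_layer = torch.randn(num_tokens, num_q_heads, head_dim)

# 1) Repeat each query head so every KV head sees a group of 16 query heads.
query_layer_repeat = query_layer.repeat_interleave(repeat_times, dim=-2)
assert query_layer_repeat.shape == (num_tokens, num_q_heads * repeat_times, head_dim)

# 2) After sparse attention, fold the repeated heads back by averaging them.
topk_attn_output = torch.randn(num_tokens, num_q_heads * repeat_times, head_dim)
topk_attn_output = topk_attn_output.view(
    topk_attn_output.shape[0],
    topk_attn_output.shape[1] // repeat_times,
    repeat_times,
    -1,
).mean(dim=-2)
assert topk_attn_output.shape == (num_tokens, num_q_heads, head_dim)
```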

You can add this in the sparse_forward function of MiniCPMInfLLMv2Attention: at the very beginning of the function add `query_layer = query_layer.repeat_interleave(repeat_times, dim=-2)`, and right before the function's return add `topk_attn_output = topk_attn_output.view(topk_attn_output.shape[0], topk_attn_output.shape[1]//repeat_times, repeat_times, -1).mean(dim=-2)`. That said, we do not recommend using sparse mode with the 0.5B model, and we have already removed this part from the model code on Hugging Face. Thank you very much for pointing this out.
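To make the placement concrete, here is a rough sketch of a sparse_forward-style method with the two lines in place; the signature and the `_topk_attention` helper are hypothetical stand-ins, not the actual MiniCPMInfLLMv2Attention implementation:

```python
def sparse_forward(self, query_layer, key_layer, value_layer, repeat_times):
    # At the very start: repeat the query heads to reach the required group size of 16.
    query_layer = query_layer.repeat_interleave(repeat_times, dim=-2)

    # ... existing top-k / sparse attention computation goes here;
    # _topk_attention is a placeholder for it, not a real method name ...
    topk_attn_output = self._topk_attention(query_layer, key_layer, value_layer)

    # Just before returning: average the repeated heads back to the original count.
    topk_attn_output = topk_attn_output.view(
        topk_attn_output.shape[0],
        topk_attn_output.shape[1] // repeat_times,
        repeat_times,
        -1,
    ).mean(dim=-2)
    return topk_attn_output
```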

Which vLLM version are you on? Could you try upgrading to 0.9.2 or later?
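If it helps, a quick way to confirm the installed version before and after upgrading (assuming vLLM is importable in the current environment):

```python
import vllm
print(vllm.__version__)  # the suggestion above applies to 0.9.2 and newer
```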