MiniCPM [Bad Case]: Error when enabling sparse attention with MiniCPM4-0.5B

Description

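Hi! I noticed that MiniCPM4-0.5B is documented as not supporting sparse attention. However, the modeling code does contain a sparse attention path, so I modified the setup to enable it, and it fails with this error: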

topk_idx[topk_idx >= q_idx[None, :, None]] = -1
RuntimeError: The size of tensor a (355) must match the size of tensor b (711) at non-singleton dimension 1

I haven't looked closely at the InfLLM implementation yet, so this is hard for me to debug. How can I fix it? Thanks a lot!

Case Explanation

No response

Oct 07 '25 13:10 junyou2001

Sparse attention requires a group size of 16, so to use it with the 0.5B model you need to replicate q. Reference code:
query_layer_repeat = query_layer.repeat_interleave(repeat_times, dim=-2)
topk_attn_output = topk_attn_output.view(topk_attn_output.shape[0], topk_attn_output.shape[1]//repeat_times, repeat_times, -1).mean(dim=-2)
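A minimal sketch of what the two reference lines do, assuming PyTorch and a (tokens, heads, head_dim) layout for both query_layer and topk_attn_output (the thread only states that dim=-2 is the replicated dimension). Reading "group size" as query heads per KV head is an assumption, and the helper names below (including repeat_times_for) are hypothetical, not part of the MiniCPM code. The head counts in the usage lines are placeholders, not MiniCPM4-0.5B's actual config.

```python
import torch


def repeat_times_for(num_heads: int, num_kv_heads: int, required_group_size: int = 16) -> int:
    # Hypothetical helper: how many copies of each query head are needed so that
    # (query heads per KV head) reaches the group size the sparse path expects.
    group_size = num_heads // num_kv_heads
    if required_group_size % group_size != 0:
        raise ValueError("required group size must be a multiple of the model's group size")
    return required_group_size // group_size


def expand_query_heads(query_layer: torch.Tensor, repeat_times: int) -> torch.Tensor:
    # Replicate each query head contiguously: [h0, h1] -> [h0, h0, h1, h1] for repeat_times=2.
    return query_layer.repeat_interleave(repeat_times, dim=-2)


def collapse_output_heads(attn_output: torch.Tensor, repeat_times: int) -> torch.Tensor:
    # Group each head's replicas together and average them back into one head.
    # The thread uses .view, which needs a contiguous tensor; .reshape also works otherwise.
    tokens, expanded_heads = attn_output.shape[0], attn_output.shape[1]
    return attn_output.reshape(tokens, expanded_heads // repeat_times, repeat_times, -1).mean(dim=-2)


if __name__ == "__main__":
    repeat_times = repeat_times_for(num_heads=16, num_kv_heads=2)  # placeholder head counts
    q = torch.randn(5, 16, 64)                                      # (tokens, heads, head_dim)
    q_rep = expand_query_heads(q, repeat_times)
    print(repeat_times, tuple(q_rep.shape))
```

Because the replicas of a head are identical, collapsing q_rep directly returns the original q, which is a quick way to check that the expand and collapse steps line up.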

Oct 15 '25 13:10 suhmily10

Where exactly should I add this?

Oct 17 '25 12:10 junyou2001

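You can add it in the sparse_forward function of MiniCPMInfLLMv2Attention. At the very start of the function, add query_layer = query_layer.repeat_interleave(repeat_times, dim=-2), and right before the function returns, add topk_attn_output = topk_attn_output.view(topk_attn_output.shape[0], topk_attn_output.shape[1]//repeat_times, repeat_times, -1).mean(dim=-2). That said, we don't recommend using sparse mode with the 0.5B model, and we have removed this part from the model code on Hugging Face. Thanks a lot for pointing this out.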
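To illustrate the placement described above (replicate at the start, average before returning), here is a sketch built around a hypothetical stand-in. The real sparse_forward in MiniCPMInfLLMv2Attention has a different signature and calls the InfLLM v2 sparse kernel; here ordinary causal dense attention (torch.nn.functional.scaled_dot_product_attention) replaces it so the example runs on its own. The (tokens, heads, head_dim) layout is again an assumption.

```python
import torch
import torch.nn.functional as F


def dense_attention_stand_in(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for the sparse kernel: plain causal GQA attention.
    # q: (tokens, q_heads, d); k, v: (tokens, kv_heads, d)
    group = q.shape[1] // k.shape[1]
    k = k.repeat_interleave(group, dim=1)  # query head h reads KV head h // group
    v = v.repeat_interleave(group, dim=1)
    # scaled_dot_product_attention expects (..., seq, d), so move heads to the front.
    out = F.scaled_dot_product_attention(
        q.transpose(0, 1), k.transpose(0, 1), v.transpose(0, 1), is_causal=True
    )
    return out.transpose(0, 1)  # back to (tokens, q_heads, d)


def sparse_forward_sketch(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, repeat_times: int) -> torch.Tensor:
    # Start of the function: replicate query heads to raise the group size.
    q = q.repeat_interleave(repeat_times, dim=-2)
    topk_attn_output = dense_attention_stand_in(q, k, v)
    # Right before returning: fold the replicated heads back by averaging.
    return topk_attn_output.reshape(
        topk_attn_output.shape[0], topk_attn_output.shape[1] // repeat_times, repeat_times, -1
    ).mean(dim=-2)


if __name__ == "__main__":
    torch.manual_seed(0)
    q = torch.randn(6, 16, 32)  # placeholder: 16 query heads
    k = torch.randn(6, 2, 32)   # placeholder: 2 KV heads (group size 8)
    v = torch.randn(6, 2, 32)
    out = sparse_forward_sketch(q, k, v, repeat_times=2)
    print(tuple(out.shape))
```

With this dense stand-in, every replica of a query head attends to the same KV head, so the averaged result matches running attention without replication; that equality is specific to the stand-in and is not claimed for the real sparse kernel. Note that repeat_interleave (not repeat, which tiles the heads) is what keeps each replica mapped to the original head's KV head and adjacent to its siblings, which the reshape-and-mean step relies on.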

Oct 20 '25 02:10 suhmily10