junyou2001

Results 1 comments of junyou2001

> Sparse attention requires a group size of 16. To use it with the 0.5b model, you need to replicate q. Reference code:
>
> `query_layer_repeat = query_layer.repeat_interleave(repeat_times, dim=-2)`
>
> `topk_attn_output = topk_attn_output.view(topk_attn_output.shape[0], topk_attn_output.shape[1]//repeat_times, repeat_times, -1).mean(dim=-2)`

Where exactly should this code be added?
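For anyone reading along, here is a minimal sketch of what the two quoted lines appear to do, so it is easier to see where they would sit relative to the sparse attention call. It assumes tensors laid out as `(tokens, heads, head_dim)`, which is what the `dim=-2` and the `view(...)` in the quote suggest. The helper names `repeat_query_heads` and `fold_repeated_heads`, the shapes, and the `repeat_times` value are hypothetical and chosen only for illustration; the sparse attention kernel itself is not shown.

```python
import torch


def repeat_query_heads(query_layer: torch.Tensor, repeat_times: int) -> torch.Tensor:
    """Duplicate every query head `repeat_times` times along the head axis.

    Assumed layout: (tokens, heads, head_dim). Copies of the same head end up
    next to each other, e.g. heads [h0, h1] -> [h0, h0, ..., h1, h1, ...].
    """
    return query_layer.repeat_interleave(repeat_times, dim=-2)


def fold_repeated_heads(topk_attn_output: torch.Tensor, repeat_times: int) -> torch.Tensor:
    """Average each run of `repeat_times` adjacent heads back into one head.

    Input:  (tokens, heads * repeat_times, head_dim)
    Output: (tokens, heads, head_dim)
    """
    return topk_attn_output.view(
        topk_attn_output.shape[0],
        topk_attn_output.shape[1] // repeat_times,
        repeat_times,
        -1,
    ).mean(dim=-2)


# Illustrative numbers only: 2 query heads repeated 8 times gives 16 heads.
tokens, heads, head_dim, repeat_times = 4, 2, 8, 8
query_layer = torch.randn(tokens, heads, head_dim)

# Step 1: widen q before it is handed to the sparse attention call.
query_layer_repeat = repeat_query_heads(query_layer, repeat_times)

# Step 2 (not shown): the sparse attention call would consume query_layer_repeat.
# The repeated query stands in for its output here, just to check the shapes.
topk_attn_output = query_layer_repeat

# Step 3: collapse the duplicated heads after the sparse attention call.
topk_attn_output = fold_repeated_heads(topk_attn_output, repeat_times)
```

Read this way, the first line would go just before q is passed into the sparse attention call and the second just after its output comes back, but the exact file and function depend on the repository's code, which the quote does not identify.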