[Bug]: Why doesn't Flash Attention2 need repeat_kv?

Open huyiwen opened this issue 1 year ago • 0 comments

Is there an existing issue?

[X] I have searched, and there is no existing issue.

Describe the bug

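While comparing the modeling_minicpm code with the code of other models, I noticed that forward in MiniCPMFlashAttention2 does not call repeat_kv, whereas MiniCPMAttention, MiniCPMSdpaAttention, Qwen2FlashAttention2, and others all do. This design appears to mirror the implementation in Llama. Is there a particular reason for designing it this way?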
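To make the question concrete, here is a minimal pure-Python sketch (not MiniCPM's code) of what repeat_kv achieves and how the same result can be obtained without it. It assumes that repeat_kv repeats each key/value head n_rep times consecutively along the head axis, so that query head h ends up paired with KV head h // n_rep. The helper names attend, attention_with_repeat, and attention_grouped are hypothetical and exist only for illustration. For context, the flash-attn library documents that its kernels accept K/V with fewer heads than Q (MQA/GQA), which may be why the expansion is skipped there, but that is an assumption and has not been confirmed for this repository.

```python
import math

def repeat_kv(kv_heads, n_rep):
    """Repeat each KV head n_rep times in a row (repeat-interleave along the head axis)."""
    return [head for head in kv_heads for _ in range(n_rep)]

def attend(q, k, v):
    """Softmax attention for one query vector q: [d] against k, v: [seq][d]."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, key)) for key in k]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * val[j] for w, val in zip(weights, v)) / total
            for j in range(len(v[0]))]

def attention_with_repeat(q_heads, k_heads, v_heads):
    """Materialize one K/V copy per query head, then attend head by head."""
    n_rep = len(q_heads) // len(k_heads)
    k_full = repeat_kv(k_heads, n_rep)
    v_full = repeat_kv(v_heads, n_rep)
    return [attend(q, k, v) for q, k, v in zip(q_heads, k_full, v_full)]

def attention_grouped(q_heads, k_heads, v_heads):
    """Skip the copy: query head h reads KV head h // n_rep directly."""
    n_rep = len(q_heads) // len(k_heads)
    return [attend(q, k_heads[h // n_rep], v_heads[h // n_rep])
            for h, q in enumerate(q_heads)]

# Toy setup: 4 query heads share 2 KV heads (n_rep = 2), seq length 3, head dim 2.
q_heads = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
k_heads = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.5], [-1.0, 0.0], [0.0, 2.0]],
]
v_heads = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[-1.0, 0.0], [0.0, 1.0], [2.0, 2.0]],
]

out_repeat = attention_with_repeat(q_heads, k_heads, v_heads)
out_grouped = attention_grouped(q_heads, k_heads, v_heads)
print(out_repeat == out_grouped)
```

Both paths compute the same attention output. The grouped version only avoids materializing the repeated K/V copies, which is the kind of work a GQA-aware kernel can do internally.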

To Reproduce

https://github.com/OpenBMB/MiniCPM/blob/main/model/modeling_minicpm.py

Expected behavior

No response

Screenshots

No response

Environment

- OS: [e.g. Ubuntu 20.04]
- Pytorch: [e.g. torch 2.0.0]
- CUDA: [e.g. CUDA 11.8]
- Device: [e.g. A10, RTX3090]

Additional context

No response

Aug 04 '24 08:08 huyiwen