Add sliding window attention to sdpa in mistral

Open ehuaa opened this issue 1 year ago • 7 comments

Feature request

https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/modeling_mistral.py#L1006-L1023

In the code linked above, the latest version of transformers cannot use the sliding window feature with SDPA in the Mistral model. I suspect the reason is the one noted in the code here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/modeling_mistral.py#L687-L688 The PyTorch issue referenced there causes bugs with a custom attn_mask such as a sliding window attention mask: https://github.com/pytorch/pytorch/issues/112577

That issue has been fixed since torch 2.2.0, which was released two weeks ago. Can you add this feature back to the SDPA kernel in Mistral?

Motivation

I cannot use sliding window with SDPA right now. My GPU is a V100, so I cannot use FlashAttention-2.

Your contribution

I think we can pass the sliding_window parameter to the _prepare_4d_causal_attention_mask_for_sdpa function.
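To illustrate the idea, here is a minimal sketch of a sliding window causal mask passed to PyTorch's `scaled_dot_product_attention` through `attn_mask`. This is not the transformers implementation: the helper names `sliding_window_causal_mask` and `sdpa_with_sliding_window` are hypothetical, and the window convention (each token attends to itself and the previous `window - 1` tokens) is an assumption that may differ from the exact boundary Mistral uses.

```python
import torch
import torch.nn.functional as F


def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Hypothetical helper: boolean (seq_len, seq_len) mask, True = may attend."""
    idx = torch.arange(seq_len)
    # offset[i, j] = query position i minus key position j
    offset = idx[:, None] - idx[None, :]
    # Causal (offset >= 0) and within the window (offset < window).
    # Assumed convention: the window counts the current token itself.
    return (offset >= 0) & (offset < window)


def sdpa_with_sliding_window(q, k, v, window: int) -> torch.Tensor:
    """Hypothetical helper: SDPA with a custom sliding window mask.

    q, k, v have shape (batch, heads, seq_len, head_dim).
    """
    mask = sliding_window_causal_mask(q.shape[-2], window).to(q.device)
    # A 2D boolean mask broadcasts over the batch and head dimensions.
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)


if __name__ == "__main__":
    torch.manual_seed(0)
    q, k, v = (torch.randn(1, 2, 6, 8) for _ in range(3))
    out = sdpa_with_sliding_window(q, k, v, window=3)
    print(out.shape)
```

With a window at least as long as the sequence, the mask reduces to the plain causal mask, so the result should match `is_causal=True`.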

ehuaa avatar Feb 12 '24 17:02 ehuaa

cc @fxmarty

amyeroberts avatar Feb 12 '24 17:02 amyeroberts

Hi, thank you for the suggestion, SDPA support for mistral was added by @ArthurZucker in https://github.com/huggingface/transformers/pull/28133, maybe he has more insight.

fxmarty avatar Feb 19 '24 09:02 fxmarty

I think it comes down to just adding sliding_window to the call for _prepare_4d_causal_attention_mask_for_sdpa yes. Would you like to open a PR?

ArthurZucker avatar Feb 20 '24 04:02 ArthurZucker

I think it comes down to just adding sliding_window to the call for _prepare_4d_causal_attention_mask_for_sdpa yes. Would you like to open a PR?

Sure, I'll open a PR later this week.

ehuaa avatar Feb 21 '24 02:02 ehuaa

Any plans for a PR?

cyr0930 avatar Mar 21 '24 08:03 cyr0930

#29407 should fix this issue

ArthurZucker avatar Mar 21 '24 08:03 ArthurZucker

@ArthurZucker Oh you are right. Thanks.

cyr0930 avatar Mar 22 '24 00:03 cyr0930

Fixed in https://github.com/huggingface/transformers/pull/30127

fxmarty avatar Apr 17 '24 09:04 fxmarty