Add sliding window attention to sdpa in mistral
Feature request
https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/modeling_mistral.py#L1006-L1023
In the code linked above, the latest version of transformers cannot use the sliding window feature in the Mistral model's SDPA path.
I suspect the reason is the one mentioned in this comment:
https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/modeling_mistral.py#L687-L688
This PyTorch issue causes problems when a custom attn_mask, such as a sliding window attention mask, is used:
https://github.com/pytorch/pytorch/issues/112577
This issue has been fixed in torch 2.2.0, which was released two weeks ago, so could you add this feature back to the SDPA kernel in Mistral?
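For illustration, here is a minimal, self-contained sketch (not code from the Mistral model) of the kind of call the SDPA path needs to support: torch.nn.functional.scaled_dot_product_attention with an explicit sliding-window mask instead of is_causal=True. The shapes and window size below are made up.

```python
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim, window = 1, 4, 16, 8, 4
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# Sliding-window causal pattern: position i may attend to j only if
# i - window < j <= i. Disallowed positions get -inf in an additive mask.
idx = torch.arange(seq_len)
allowed = (idx[None, :] <= idx[:, None]) & (idx[:, None] - idx[None, :] < window)
attn_mask = torch.zeros(seq_len, seq_len).masked_fill(~allowed, float("-inf"))

# Passing attn_mask (rather than is_causal=True) is what hits the
# pytorch/pytorch#112577 problem on older torch releases.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
print(out.shape)  # torch.Size([1, 4, 16, 8])
```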
Motivation
I cannot use sliding window attention with SDPA right now: my GPU is a V100, so I cannot use FlashAttention-2.
Your contribution
I think we can pass the sliding_window parameter to the _prepare_4d_causal_attention_mask_for_sdpa function.
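Something like the following sketch, assuming the current signature of that internal helper (the tensor shapes and the window size are only illustrative, not the actual Mistral code):

```python
import torch
from transformers.modeling_attn_mask_utils import (
    _prepare_4d_causal_attention_mask_for_sdpa,
)

batch_size, seq_length, hidden_size = 2, 16, 32
attention_mask = torch.ones(batch_size, seq_length, dtype=torch.long)
attention_mask[0, :2] = 0  # simulate left padding so an explicit 4D mask is returned
inputs_embeds = torch.randn(batch_size, seq_length, hidden_size)

# The helper already accepts sliding_window; the proposal is for the Mistral
# SDPA path to forward config.sliding_window here.
mask_4d = _prepare_4d_causal_attention_mask_for_sdpa(
    attention_mask,
    (batch_size, seq_length),
    inputs_embeds,
    past_key_values_length=0,
    sliding_window=8,
)
print(mask_4d.shape)  # (batch, 1, q_len, kv_len)
```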
cc @fxmarty
Hi, thank you for the suggestion. SDPA support for Mistral was added by @ArthurZucker in https://github.com/huggingface/transformers/pull/28133, so maybe he has more insight.
I think it comes down to just adding sliding_window to the call to _prepare_4d_causal_attention_mask_for_sdpa.
Yes. Would you like to open a PR?
Sure, and I'll open a PR later this week.
Any plan for the PR?
#29407 should fix this issue
@ArthurZucker Oh you are right. Thanks.
Fixed in https://github.com/huggingface/transformers/pull/30127