support scaled_dot_product_attention for swin
🚀 The feature
Support torch.nn.functional.scaled_dot_product_attention for shifted_window_attention in Swin Transformer: https://github.com/pytorch/vision/pull/8183
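A minimal sketch of the idea (the function name, tensor shapes, and mask handling below are assumptions for illustration, not the exact code from the linked PR):

```python
import torch
import torch.nn.functional as F

def shifted_window_attention_sdpa(q, k, v, relative_position_bias, attn_mask=None):
    # q, k, v: (batch * num_windows, num_heads, window_area, head_dim)
    # relative_position_bias: (num_heads, window_area, window_area)
    # attn_mask: optional (num_windows, window_area, window_area) mask for the shifted case
    bias = relative_position_bias.unsqueeze(0)  # (1, num_heads, area, area), broadcasts over batch
    if attn_mask is not None:
        num_windows = attn_mask.shape[0]
        batch = q.shape[0] // num_windows
        # fold the shift mask into the additive bias so a single tensor can be
        # passed through the attn_mask argument (assumes batch-major, window-minor layout)
        bias = bias + attn_mask.unsqueeze(1).repeat(batch, 1, 1, 1)
    # replaces the explicit softmax(q @ k.transpose(-2, -1) * scale + bias) @ v computation
    return F.scaled_dot_product_attention(q, k, v, attn_mask=bias)
```

Passing the combined bias through attn_mask keeps the math identical to the manual path, but it may limit which SDPA backends can be used (see the Limitation section below).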
Motivation, pitch
torch.nn.functional.scaled_dot_product_attention is much more efficient. With a large window size, we get a noticeable runtime benefit.
Limitation
Currently, torch.nn.functional.scaled_dot_product_attention does not support a Tensor-type attn_mask:
https://github.com/pytorch/pytorch/issues/116237
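One way to probe this limitation is to disable the math fallback and check whether any fused backend accepts a Tensor attn_mask. A hedged diagnostic sketch (the tensor shapes, dtype, and CUDA requirement are assumptions; the outcome depends on the PyTorch version and hardware):

```python
import torch
import torch.nn.functional as F

# With the math fallback disabled, SDPA raises a RuntimeError when no fused
# backend can handle the given inputs, e.g. a Tensor attn_mask.
q = k = v = torch.randn(8, 4, 49, 32, device="cuda", dtype=torch.float16)
mask = torch.zeros(8, 4, 49, 49, device="cuda", dtype=torch.float16)

with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_mem_efficient=True, enable_math=False):
    try:
        F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
        print("a fused backend accepted the Tensor attn_mask")
    except RuntimeError as err:
        print("no fused kernel for this input; only the math fallback would run:", err)
```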
Alternatives
No response
Additional context
No response
Thanks for the request and for the PR, @yokosyun. We'll try to keep an eye on https://github.com/pytorch/pytorch/issues/116237