support scaled_dot_product_attention for swin
🚀 The feature
Support torch.nn.functional.scaled_dot_product_attention for shifted_window_attention in Swin Transformer: https://github.com/pytorch/vision/pull/8183
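A minimal sketch of the idea (the function name, tensor shapes, and mask handling below are assumptions for illustration, not the exact code from the linked PR):

```python
import torch
import torch.nn.functional as F

def shifted_window_attention_sdpa(q, k, v, relative_position_bias, attn_mask=None):
    # q, k, v: (batch * num_windows, num_heads, window_area, head_dim)
    # relative_position_bias: (num_heads, window_area, window_area)
    # attn_mask: optional (num_windows, window_area, window_area) mask for the shifted case
    bias = relative_position_bias.unsqueeze(0)  # (1, num_heads, area, area), broadcasts over batch
    if attn_mask is not None:
        num_windows = attn_mask.shape[0]
        batch = q.shape[0] // num_windows
        # fold the shift mask into the additive bias so a single tensor can be
        # passed through the attn_mask argument (assumes batch-major, window-minor layout)
        bias = bias + attn_mask.unsqueeze(1).repeat(batch, 1, 1, 1)
    # replaces the explicit softmax(q @ k.transpose(-2, -1) * scale + bias) @ v computation
    return F.scaled_dot_product_attention(q, k, v, attn_mask=bias)
```

Passing the combined bias through attn_mask keeps the math identical to the manual path, but it may limit which SDPA backends can be used (see the Limitation section below).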
Motivation, pitch
torch.nn.functional.scaled_dot_product_attention is much more efficient. With a large window size, we get a noticeable runtime benefit.
Limitation
Currently, torch.nn.functional.scaled_dot_product_attention does not support a Tensor-type attn_mask:
https://github.com/pytorch/pytorch/issues/116237
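One way to probe this limitation is to disable the math fallback and check whether any fused backend accepts a Tensor attn_mask. A hedged diagnostic sketch (the tensor shapes, dtype, and CUDA requirement are assumptions; the outcome depends on the PyTorch version and hardware):

```python
import torch
import torch.nn.functional as F

# With the math fallback disabled, SDPA raises a RuntimeError when no fused
# backend can handle the given inputs, e.g. a Tensor attn_mask.
q = k = v = torch.randn(8, 4, 49, 32, device="cuda", dtype=torch.float16)
mask = torch.zeros(8, 4, 49, 49, device="cuda", dtype=torch.float16)

with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_mem_efficient=True, enable_math=False):
    try:
        F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
        print("a fused backend accepted the Tensor attn_mask")
    except RuntimeError as err:
        print("no fused kernel for this input; only the math fallback would run:", err)
```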
Alternatives
No response
Additional context
No response
Thanks for the request and for the PR, @yokosyun. We'll try to keep an eye on https://github.com/pytorch/pytorch/issues/116237