NaViT
Questions about the implementation of multi-head self-attention
Thank you for your excellent code; it has been a great help to me. However, in the paper each image attends only to its own patches during multi-head self-attention, whereas in your implementation it seems that all patches in the packed sequence attend to one another, and the different images are only distinguished at the attention pooling stage. Did I misunderstand something? Looking forward to your reply, thanks!
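For reference, here is a minimal sketch of the per-image (block-diagonal) attention masking described in the paper, where each patch may only attend to patches from the same source image. This is an illustrative example, not the repository's code; the `PackedSelfAttention` module and the `image_ids` tensor are hypothetical names introduced here for the sketch.

```python
import torch
from torch import nn

class PackedSelfAttention(nn.Module):
    """Multi-head self-attention over a packed sequence, where each patch
    may only attend to patches coming from the same source image."""

    def __init__(self, dim, heads=8):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.to_out = nn.Linear(dim, dim, bias=False)

    def forward(self, tokens, image_ids):
        # tokens:    (batch, seq, dim) - patches from several images packed into one sequence
        # image_ids: (batch, seq)      - integer id of the source image for each patch
        b, n, _ = tokens.shape
        q, k, v = self.to_qkv(tokens).chunk(3, dim=-1)
        q, k, v = (t.view(b, n, self.heads, -1).transpose(1, 2) for t in (q, k, v))

        sim = (q @ k.transpose(-2, -1)) * self.scale  # (b, heads, n, n)

        # block-diagonal mask: True where query and key belong to the same image
        same_image = image_ids.unsqueeze(-1) == image_ids.unsqueeze(-2)  # (b, n, n)
        sim = sim.masked_fill(~same_image.unsqueeze(1), float('-inf'))

        attn = sim.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.to_out(out)

# usage: three images packed into a single sequence of six patches
attn = PackedSelfAttention(dim=64, heads=4)
tokens = torch.randn(1, 6, 64)
image_ids = torch.tensor([[0, 0, 0, 1, 1, 2]])
out = attn(tokens, image_ids)  # (1, 6, 64)
```

With this kind of mask, patches from different images never exchange information inside the attention layers, so the separation happens before attention pooling rather than only at it.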