
Questions about the implementation of multi-head self-attention

Open kk3lamb opened this issue 10 months ago • 0 comments

Thank you for your excellent code; it has been a great help to me. However, in the paper each image attends only to its own patches during multi-head self-attention, whereas in your implementation it looks like all patches in the packed sequence attend to one another, and the distinction between images is only made later, in the attention pooling. Did I misunderstand something? Looking forward to your reply, thanks!
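For context, the mechanism the paper describes is a block-diagonal attention mask over the packed sequence, so that a patch can only attend to patches from the same image. Below is a minimal single-head sketch of that idea, not the repo's actual code; `packed_self_attention` and `image_ids` are illustrative names, and a real implementation would fold this mask into the multi-head attention scores before softmax.

```python
import torch

def packed_self_attention(tokens, image_ids):
    # tokens: (seq, dim) patch embeddings from several images packed into one sequence.
    # image_ids: (seq,) integer label of which packed image each patch belongs to.
    sim = tokens @ tokens.t() / tokens.shape[-1] ** 0.5

    # block-diagonal mask: True where query and key patches come from the same image
    mask = image_ids[:, None] == image_ids[None, :]

    # masked positions get -inf, so softmax assigns them exactly zero weight
    sim = sim.masked_fill(~mask, float('-inf'))
    attn = sim.softmax(dim=-1)
    return attn @ tokens, attn

tokens = torch.randn(6, 8)
image_ids = torch.tensor([0, 0, 0, 1, 1, 1])  # two images packed into one sequence
out, attn = packed_self_attention(tokens, image_ids)
```

With this mask in place, the cross-image blocks of `attn` are exactly zero, so each image effectively runs its own self-attention even though the patches share one sequence.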


kk3lamb, Apr 10 '24 15:04