NaViT
Questions about the implementation of multi-head self-attention
Thank you for your excellent code; it has been a great help to me. However, in the paper each image attends only to its own patches during multi-head self-attention, whereas in your implementation it seems that all patches in the packed sequence attend to one another, and the different images are only distinguished at the attention pooling stage. Did I misunderstand something? Looking forward to your reply, thanks!
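For reference, here is a minimal sketch of the per-image (block-diagonal) attention masking described in the paper, where each patch may only attend to patches from the same source image. This is an illustrative example, not the repository's code; the `PackedSelfAttention` module and the `image_ids` tensor are hypothetical names introduced here for the sketch.

```python
import torch
from torch import nn

class PackedSelfAttention(nn.Module):
    """Multi-head self-attention over a packed sequence, where each patch
    may only attend to patches coming from the same source image."""

    def __init__(self, dim, heads=8):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.to_out = nn.Linear(dim, dim, bias=False)

    def forward(self, tokens, image_ids):
        # tokens:    (batch, seq, dim) - patches from several images packed into one sequence
        # image_ids: (batch, seq)      - integer id of the source image for each patch
        b, n, _ = tokens.shape
        q, k, v = self.to_qkv(tokens).chunk(3, dim=-1)
        q, k, v = (t.view(b, n, self.heads, -1).transpose(1, 2) for t in (q, k, v))

        sim = (q @ k.transpose(-2, -1)) * self.scale  # (b, heads, n, n)

        # block-diagonal mask: True where query and key belong to the same image
        same_image = image_ids.unsqueeze(-1) == image_ids.unsqueeze(-2)  # (b, n, n)
        sim = sim.masked_fill(~same_image.unsqueeze(1), float('-inf'))

        attn = sim.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.to_out(out)

# usage: three images packed into a single sequence of six patches
attn = PackedSelfAttention(dim=64, heads=4)
tokens = torch.randn(1, 6, 64)
image_ids = torch.tensor([[0, 0, 0, 1, 1, 2]])
out = attn(tokens, image_ids)  # (1, 6, 64)
```

With this kind of mask, patches from different images never exchange information inside the attention layers, so the separation happens before attention pooling rather than only at it.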