toodle
Thank you for your reply, @Andrew-Qibin. > The attention weights are generated by a linear layer, but an unfold operation is applied to the value tensor. My concern is the attention...
Agree with @monney. Only the central pixel is considered in the attention weight generation process. How can the weights represent the importance of the neighboring context?
@monney Thanks. I agree with your opinion on Dynamic Conv, as discussed in https://github.com/sail-sg/volo/issues/5
@monney I understand that surrounding pixels can be aggregated by the `Unfold` operation. However, my focus is the **attention weights**, because the attention weights do not receive surrounding information. I admit it...
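
To make the point above concrete, here is a deliberately simplified 1D toy in plain Python. It is not VOLO's implementation: the names `center_only_weights`, `unfold_1d`, and `toy_outlook_1d` are hypothetical, features are scalars, the values are just the inputs themselves, and the fold step that sums overlapping windows is omitted. It only illustrates the asymmetry under discussion: the weights at a position are computed from that position alone, while the values they multiply are gathered from a window.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def center_only_weights(x, w_attn):
    """For each position, map its OWN scalar feature to K softmax weights.

    w_attn stands in for a linear layer (an assumption for illustration);
    no neighboring feature enters this computation.
    """
    return [softmax([w * xi for w in w_attn]) for xi in x]

def unfold_1d(v, k):
    """Gather a k-wide window around each position, zero-padded at the edges."""
    pad = k // 2
    padded = [0.0] * pad + list(v) + [0.0] * pad
    return [padded[i:i + k] for i in range(len(v))]

def toy_outlook_1d(x, w_attn):
    """Weight each unfolded window by the weights of its center position."""
    k = len(w_attn)
    weights = center_only_weights(x, w_attn)
    windows = unfold_1d(x, k)  # values are simply x in this toy
    return [sum(a * v for a, v in zip(ws, win))
            for ws, win in zip(weights, windows)]

if __name__ == "__main__":
    w = [0.5, -0.2, 0.1]
    x = [1.0, 2.0, 3.0, 4.0]
    x_perturbed = [1.0, 2.0, 10.0, 4.0]  # change only the neighbor of position 1

    # Weights at position 1 are identical: they never see the neighbor.
    print(center_only_weights(x, w)[1] == center_only_weights(x_perturbed, w)[1])
    # The output at position 1 differs: the neighbor enters through the values.
    print(toy_outlook_1d(x, w)[1] != toy_outlook_1d(x_perturbed, w)[1])
```

In this toy, changing a neighbor alters the output only through the unfolded values, never through the weights, which is the behavior being questioned in this thread.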
Thanks for your reply, but I don't think the difference is clear. > 1. The outlooker in VOLO is a new attention mechanism that aims at encoding fine-level token representations....