Some thoughts about volo

Open cfzd opened this issue 3 years ago • 0 comments

Thanks for your great work! After reading the paper, I have a question: can VOLO be thought of as a "pixel-wise conditional conv" network?

The reasons are:

1. Taken together, the weighted average and fold operations in Fig. 2 amount to a conv operation, except that the "conv kernel" is generated by the outlook attention rather than being a fixed learned weight.
2. The outlook attention, i.e. the linear C -> k**4 projection, can be viewed as generating a "conv kernel" for each of the HxW pixels.

Combining these two points, I think VOLO behaves much like a "pixel-wise conditional conv" network: the kernel applied at each pixel is conditioned on that pixel's own features.
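
To make the two points concrete, here is a minimal pure-Python sketch of this reading. It is not the official VOLO code: it assumes a single head, stride 1, a window of k = 3 and zero padding, and it skips the value projection and the attention scaling. The names `generate_kernels`, `outlook_aggregate` and `w_attn` are hypothetical, chosen only for illustration.

```python
import math
import random

K = 3          # window size k (assumed)
PAD = K // 2   # zero padding so every pixel has a full K x K window

# Offsets of the K*K positions in a window, relative to its centre.
OFFSETS = [(di, dj) for di in range(-PAD, PAD + 1) for dj in range(-PAD, PAD + 1)]


def softmax(xs):
    m = max(xs)
    es = [math.exp(v - m) for v in xs]
    total = sum(es)
    return [e / total for e in es]


def generate_kernels(x, w_attn):
    """Point 2: the per-pixel C -> K**4 projection.

    x[i][j] is a C-vector and w_attn is a (K**4 x C) weight matrix.
    Returns kernels[i][j][m][n]: for the window centred at (i, j), a
    softmax-normalised weight from source position n to target position m.
    """
    kk = K * K
    kernels = []
    for row in x:
        kernel_row = []
        for feat in row:
            logits = [sum(w * f for w, f in zip(w_row, feat)) for w_row in w_attn]
            kernel_row.append([softmax(logits[m * kk:(m + 1) * kk]) for m in range(kk)])
        kernels.append(kernel_row)
    return kernels


def outlook_aggregate(v, kernels):
    """Point 1: weighted average inside each window, then fold (sum overlaps)."""
    H, W, C = len(v), len(v[0]), len(v[0][0])
    out = [[[0.0] * C for _ in range(W)] for _ in range(H)]
    for i in range(H):
        for j in range(W):
            for m, (mi, mj) in enumerate(OFFSETS):
                ti, tj = i + mi, j + mj          # pixel the fold writes to
                if not (0 <= ti < H and 0 <= tj < W):
                    continue                     # fold drops the padded border
                for n, (ni, nj) in enumerate(OFFSETS):
                    si, sj = i + ni, j + nj      # pixel the value is read from
                    if 0 <= si < H and 0 <= sj < W:   # zero padding otherwise
                        wgt = kernels[i][j][m][n]
                        for c in range(C):
                            out[ti][tj][c] += wgt * v[si][sj][c]
    return out


# Tiny example: 5x5 map, 2 channels, random (hypothetical) projection weights.
random.seed(0)
H, W, C = 5, 5, 2
x = [[[random.uniform(-1, 1) for _ in range(C)] for _ in range(W)] for _ in range(H)]
w_attn = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(K ** 4)]
kernels = generate_kernels(x, w_attn)   # one K*K x K*K weight set per pixel
y = outlook_aggregate(x, kernels)       # values taken as x itself (simplification)
```

In this sketch each row `kernels[i][j][m]` acts as a k x k kernel applied at pixel (i, j), and it changes with the input features at that pixel. That is the sense in which the operation looks like a pixel-wise conditional conv. A fixed kernel shared across all pixels would reduce it to an ordinary convolution followed by the same fold.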

Jul 19 '21 12:07 cfzd