WeightNet
Is this actually attention over different channels for different kernels?
Hi, thanks for the great work on unifying SENet and CondConv. I just want to confirm whether my understanding is correct. In SENet, attention is computed across the C input channels, while in CondConv, attention is applied over the M kernels. So in WeightNet, are we generalizing this concept by computing attention over the C input channels for each of the M kernels, with the kernels created via nn.Conv2d instead of manually defined as nn.Parameter? And is the hyperparameter G introduced both to interpolate between these two cases and to control cross-channel interaction? That is, in SENet the channels do not interact with each other, while in CondConv all the channels are involved.
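To make the question concrete, here is a minimal PyTorch sketch of how I understand the weight-generating branch. The class name, the placement of the global pooling, and the defaults M=2 and G=2 are my own assumptions for illustration, not the authors' implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightNetSketch(nn.Module):
    """Sketch of my reading of WeightNet (not the official code).

    A grouped 1x1-conv "FC" maps a pooled C-dim vector to the full conv
    kernel; the group hyperparameter G controls how much cross-channel
    interaction this mapping has (SENet- vs CondConv-like behavior).
    """

    def __init__(self, inp, oup, ksize=3, stride=1, M=2, G=2):
        super().__init__()
        self.inp, self.oup = inp, oup
        self.ksize, self.stride, self.pad = ksize, stride, ksize // 2
        # First FC: pooled C-dim vector -> M*oup intermediate activation.
        self.fc1 = nn.Conv2d(inp, M * oup, 1, bias=True)
        self.sigmoid = nn.Sigmoid()
        # Grouped FC: (M*oup) -> oup*inp*k*k kernel entries, split into
        # G*oup groups; assumes M % G == 0 and (inp * k * k) % G == 0.
        self.fc2 = nn.Conv2d(M * oup, oup * inp * ksize * ksize, 1,
                             groups=G * oup, bias=False)

    def forward(self, x):
        b, c, h, w = x.shape
        gap = F.adaptive_avg_pool2d(x, 1)        # (b, c, 1, 1)
        theta = self.sigmoid(self.fc1(gap))      # (b, M*oup, 1, 1)
        weight = self.fc2(theta)                 # (b, oup*inp*k*k, 1, 1)
        # Per-sample kernels: fold the batch into the group dimension
        # of a single grouped convolution.
        weight = weight.reshape(b * self.oup, self.inp,
                                self.ksize, self.ksize)
        x = x.reshape(1, b * c, h, w)
        out = F.conv2d(x, weight, stride=self.stride,
                       padding=self.pad, groups=b)
        return out.reshape(b, self.oup, out.shape[2], out.shape[3])

# Quick shape check on random input.
m = WeightNetSketch(inp=64, oup=64, ksize=3)
y = m(torch.randn(4, 64, 32, 32))
print(y.shape)  # torch.Size([4, 64, 32, 32])
```

Is this roughly the intended design, i.e. with G=oup recovering a SENet-like per-channel scaling and a small G moving toward CondConv-like full mixing?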