TransFG
About the Part Selection Module
Thanks for your great work!
I have a question about how tokens with the maximum activation are selected in the Part Selection Module.
In Eq. 6, is a_l^i the attention score calculated separately between the class token and each of the other N tokens? If so, the dimension of a_l^i would be N, right?
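
To make the question concrete, here is a minimal NumPy sketch of my current understanding. It is not taken from the repository: the function name `select_part_tokens`, the input layout `(K, N+1, N+1)` per layer, and the multiplication order are all my own assumptions. It multiplies the per-layer attention maps, takes the class-token row for each of the K heads (dropping the class token's attention to itself), and picks the index of the largest value, so each head yields a score vector of length N.

```python
import numpy as np

def select_part_tokens(attn_layers):
    """Hypothetical helper illustrating the question, not the authors' code.

    attn_layers: list of arrays, one per layer, each of shape (K, N+1, N+1),
                 where K is the number of heads and index 0 is the class token.
    Returns (scores, indices): scores has shape (K, N), indices has shape (K,).
    """
    a_final = attn_layers[0]
    for a in attn_layers[1:]:
        # Batched matrix product over the head axis (assumed ordering).
        a_final = np.matmul(a, a_final)

    # Row 0 is the class token; columns 1..N are the other N tokens.
    scores = a_final[:, 0, 1:]            # shape (K, N)
    # +1 converts back to an index into the full (N+1)-token sequence.
    indices = scores.argmax(axis=1) + 1   # shape (K,)
    return scores, indices
```

If this reading is right, the class token's attention to itself is excluded and each head selects exactly one of the N patch tokens.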