tda4atd
Question on the number of heads used in the analysis
Thank you for sharing the code. I'm confused about a couple of points and would appreciate it if you could confirm whether my understanding is correct.
- Are you using the outputs of all heads for the analysis? The paper you mentioned, 'Roles and Utilization of Attention Heads in Transformer-based Neural Language Models', appears to use only selected heads, but your code seems to use all heads. Is this correct?
- After extracting all the features, are they concatenated and used as input to a single linear binary classifier? If they are concatenated, I would guess the resulting dimension is quite large.
Hello. Very sorry for the late response.
- Yes, that is correct.
- Yes, they are concatenated. But that's okay because we use regularization in our logistic regression, so it works even when the number of features is larger than the number of examples in the training set.
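To illustrate the setup described above, here is a minimal sketch of concatenating per-head feature blocks and fitting an L2-regularized logistic regression with more features than training examples. The layer/head counts and the number of features per head are purely illustrative, not the repository's actual dimensions, and the data is random noise:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical setup: 12 layers x 12 heads, 4 features per attention
# head (all of these numbers are illustrative, not from the repo).
n_layers, n_heads, n_feats_per_head = 12, 12, 4
n_train = 100  # fewer training examples than total features

# One feature block per head, concatenated into a single long vector
# per example: 12 * 12 * 4 = 576 features.
per_head = [rng.normal(size=(n_train, n_feats_per_head))
            for _ in range(n_layers * n_heads)]
X = np.concatenate(per_head, axis=1)
y = rng.integers(0, 2, size=n_train)

# L2 regularization keeps the problem well-posed even though
# n_features (576) > n_samples (100).
clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
clf.fit(X, y)

print(X.shape)          # (100, 576)
print(clf.coef_.shape)  # (1, 576)
```

The regularization strength (`C` in scikit-learn) would normally be tuned on held-out data; the default is used here only to keep the sketch short.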