[Summary] Add internals-based feature attribution methods
🚀 Feature Request
The following is a non-exhaustive list of attention-based feature attribution methods that could be added to the library:
Notes:
- Add the possibility of scaling attention weights by the norm of the corresponding value vectors, which has been shown to be effective for alignment and for encoder models (Ferrando and Costa-jussà '21, Treviso et al. '21); a minimal sketch of this scaling follows the list.
- The ALTI+ technique is an extension of the ALTI method by Ferrando et al. '22 (paper, code) to Encoder-Decoder architectures. It was recently used by the Facebook team to detect hallucinated toxicity by highlighting whether toxic keywords pay attention to the source (NLLB paper, Figure 31).
- Attention Flow is very expensive to compute, but it has proven SHAP guarantees for same-layer attribution, which is not the case for Rollout or other methods. Flow and Rollout should be implemented as propagation methods rather than stand-alone approaches, since they are used by most attention-based attribution techniques; see the rollout sketch after this list.
- GlobEnc corresponds roughly to Attention x Transformer Block Norm but ignores the FFN part, which in the latter is incorporated through a localized application of Integrated Gradients with 0-valued baselines (the authors' default); a rough sketch of such a localized IG pass also follows below.
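
To make the value-norm scaling concrete, here is a minimal PyTorch sketch. The function name and tensor layout are assumptions for illustration, not part of the inseq API, and it operates on raw per-head value vectors rather than the full value transformation used in the norm-based analysis literature:

```python
import torch

def value_norm_attention(attn_weights: torch.Tensor, values: torch.Tensor) -> torch.Tensor:
    """Scale attention weights by the norm of the value vectors they mix.

    attn_weights: (batch, heads, tgt_len, src_len) softmax attention weights.
    values:       (batch, heads, src_len, head_dim) per-head value vectors.
    Returns:      (batch, heads, tgt_len, src_len) norm-scaled scores ||a_ij * v_j||.
    """
    # Attention weights are non-negative, so ||a_ij * v_j|| = a_ij * ||v_j||.
    value_norms = values.norm(dim=-1)                # (batch, heads, src_len)
    return attn_weights * value_norms.unsqueeze(-2)  # broadcast over tgt_len
```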
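
For the propagation point, here is a minimal sketch of Attention Rollout, assuming square self-attention maps already averaged over heads (function name and input layout are hypothetical). The same accumulation loop could consume any per-layer relevance map, which is why Flow and Rollout fit better as propagation methods than as stand-alone approaches:

```python
import torch

def attention_rollout(per_layer_attn: list[torch.Tensor]) -> torch.Tensor:
    """Propagate per-layer attention maps from the bottom layer to the top.

    per_layer_attn: one (seq_len, seq_len) self-attention map per layer,
                    averaged over heads and ordered from bottom to top.
    Returns:        (seq_len, seq_len) rolled-out attribution map.
    """
    rollout = torch.eye(per_layer_attn[0].size(-1))
    for attn in per_layer_attn:
        # Mix in the residual connection, then renormalize rows to sum to 1.
        mixed = 0.5 * attn + 0.5 * torch.eye(attn.size(-1))
        mixed = mixed / mixed.sum(dim=-1, keepdim=True)
        rollout = mixed @ rollout
    return rollout
```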
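
Finally, a rough sketch of a localized Integrated Gradients pass over a single FFN block with a 0-valued baseline, in the spirit of the GlobEnc note above. Everything here is an assumption for illustration: the function name, the `ffn` module, and in particular the reduction of the vector-valued FFN output to a scalar via `sum`, which the original decomposition handles differently:

```python
import torch

def ffn_integrated_gradients(ffn: torch.nn.Module, x: torch.Tensor, steps: int = 32) -> torch.Tensor:
    """Integrated Gradients of a single FFN block w.r.t. its input, zero baseline.

    x:       (seq_len, d_model) input to the FFN block.
    Returns: (seq_len, d_model) IG attributions of the summed FFN output.
    """
    x = x.detach()
    total_grad = torch.zeros_like(x)
    # Average gradients along the straight path from the zero baseline to x.
    for alpha in torch.linspace(0.0, 1.0, steps):
        point = (alpha * x).requires_grad_(True)
        ffn(point).sum().backward()
        total_grad += point.grad
    return x * total_grad / steps
```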