
[Summary] Add internals-based feature attribution methods

Open gsarti opened this issue 2 years ago • 15 comments

🚀 Feature Request

The following is a non-exhaustive list of internals-based (mostly attention-derived) feature attribution methods that could be added to the library; a minimal extraction sketch for the simplest entries follows the table:

| Method name | Source | Code implementation | Status |
| --- | --- | --- | --- |
| Last-Layer Attention | Jain and Wallace '19 | successar/AttentionExplanation | |
| Aggregated Attention | Jain and Wallace '19 | successar/AttentionExplanation | |
| Attention Flow | Abnar and Zuidema '20 | samiraabnar/attention_flow | |
| Attention Rollout | Abnar and Zuidema '20 | samiraabnar/attention_flow | |
| Attention with Values Norm (Attn-N) | Kobayashi et al. '20 | gorokoba560/norm-analysis-of-transformer | |
| Attention with Residual Norm (AttnRes-N) | Kobayashi et al. '20 | gorokoba560/norm-analysis-of-transformer | |
| Attention with Attention Block Norm (AttnResLn-N or LnAttnRes-N) | Kobayashi et al. '21 | gorokoba560/norm-analysis-of-transformer | |
| Attention-driven Relevance Propagation | Chefer et al. '21 | hila-chefer/Transformer-MM-Explainability | |
| ALTI+ | Ferrando et al. '22 | mt-upc/transformer-contributions-nmt | |
| GlobEnc | Modarressi et al. '22 | mohsenfayyaz/globenc | |
| Attention with Attention Block + FFN Norm (AttnResLnFF-N or LnAttnResFF-N) | Kobayashi et al. '23 | - | |
| Attention x Transformer Block Norm | Kobayashi et al. '23 | - | |
| Logit | Ferrando et al. '23 | mt-upc/logit-explanations | |
| ALTI-Logit | Ferrando et al. '23 | mt-upc/logit-explanations | |
| DecompX | Modarressi et al. '23 | mohsenfayyaz/DecompX | |
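
For reference, here is a minimal sketch of the two simplest entries above (Last-Layer Attention and Aggregated Attention) built on raw Hugging Face `transformers` outputs rather than inseq's API; the model name is only an example, and any model exposing `output_attentions` would work the same way:

```python
# Hypothetical sketch: Last-Layer and Aggregated Attention from raw attention weights.
# Assumes a BERT-like encoder as an example; not inseq's API.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("Attention is not explanation.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one (batch, heads, seq, seq) tensor per layer
attentions = torch.stack(outputs.attentions)        # (layers, batch, heads, seq, seq)
last_layer_attention = attentions[-1].mean(dim=1)   # average heads of the last layer
aggregated_attention = attentions.mean(dim=(0, 2))  # average over layers and heads
```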

Notes:

  1. Add the possibility to scale attention weights by the norm of the corresponding value vectors, which was shown to be effective for alignment and for encoder models (Ferrando and Costa-jussà '21, Treviso et al. '21); a minimal sketch of this weighting is included after these notes.
  2. The ALTI+ technique extends the ALTI method of Ferrando et al. '22 (paper, code) to encoder-decoder architectures. It was recently used by the Facebook (Meta AI) team to detect hallucinated toxicity by highlighting how much attention toxic keywords pay to the source (NLLB paper, Figure 31).
  3. Attention Flow is computationally expensive but comes with proven SHAP guarantees for same-layer attribution, which is not the case for Rollout or other methods. Flow and Rollout should be implemented as propagation methods rather than stand-alone approaches, since they can be applied on top of most attention-based attributions (see the rollout sketch after these notes).
  4. GlobEnc corresponds roughly to Attention x Transformer Block Norm but ignores the FFN part, which in the latter is incorporated via a localized application of Integrated Gradients with 0-valued baselines (the authors' default).
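
A minimal sketch of the value-norm weighting from note 1, simplified with respect to Kobayashi et al. '20 (no output projection is folded in); the BERT-specific module paths and model name are assumptions and would differ per architecture:

```python
# Hypothetical sketch: scale attention weights by per-head value vector norms (note 1).
# Simplified w.r.t. Kobayashi et al. '20; module paths are BERT-specific assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

value_outputs = []  # per-layer value projections captured via forward hooks
for layer in model.encoder.layer:
    layer.attention.self.value.register_forward_hook(
        lambda _module, _inputs, output: value_outputs.append(output.detach())
    )

inputs = tokenizer("Attention is not explanation.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

n_heads = model.config.num_attention_heads
head_dim = model.config.hidden_size // n_heads

value_scaled_attentions = []
for attn, values in zip(outputs.attentions, value_outputs):
    # values: (batch, seq, hidden) -> per-head value norms of shape (batch, heads, seq)
    v = values.view(values.size(0), values.size(1), n_heads, head_dim)
    v_norm = v.norm(dim=-1).transpose(1, 2)
    # attn: (batch, heads, seq_q, seq_k); scale column j by ||v_j|| and sum over heads
    value_scaled_attentions.append((attn * v_norm.unsqueeze(2)).sum(dim=1))
```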
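
And a minimal sketch of Attention Rollout (Abnar and Zuidema '20) written as a propagation step over per-layer, head-averaged attention maps, as suggested in note 3; the function name is hypothetical:

```python
# Hypothetical sketch: Attention Rollout as a propagation step over per-layer
# attention maps (note 3), with a uniform residual term and row re-normalization.
import torch

def attention_rollout(attentions):
    """attentions: tuple of (batch, heads, seq, seq) tensors, one per layer.
    Returns a (batch, seq, seq) matrix of rolled-out attributions."""
    rollout = None
    for layer_attention in attentions:
        attn = layer_attention.mean(dim=1)            # average heads
        eye = torch.eye(attn.size(-1), device=attn.device)
        attn = 0.5 * attn + 0.5 * eye                 # account for the residual connection
        attn = attn / attn.sum(dim=-1, keepdim=True)  # re-normalize rows
        rollout = attn if rollout is None else attn @ rollout
    return rollout
```

With the tensors from the first sketch, `attention_rollout(outputs.attentions)` yields one (seq, seq) map per batch item; Attention Flow would instead run a max-flow computation over the same layered attention graph.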

gsarti · Nov 30 '21 19:11