
Slow `DiscretizedIntegratedGradientAttribution` method, also on GPU

MoritzLaurer opened this issue 2 years ago · 2 comments

🐛 Bug Report

Inference on a Google Colab GPU is very slow. There is no significant difference whether the model runs on CUDA or on CPU.

🔬 How To Reproduce

The following model.attribute(...) code takes around 33 to 47 seconds on both a Colab CPU and GPU. I tried passing the device to the model, and model.device confirms that it's running on cuda, but attributing just two sentences still takes very long. (I don't know the underlying attribution computations well enough to say whether this is expected or whether it should be faster. If it's always this slow, it seems practically infeasible to analyse larger corpora.)

import inseq
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"

print(inseq.list_feature_attribution_methods())
model = inseq.load_model("google/flan-t5-small", attribution_method="discretized_integrated_gradients", device=device)

model.to(device)

out = model.attribute(
    input_texts=["We were attacked by hackers. Was there a cyber attack?", "We were not attacked by hackers. Was there a cyber attack?"],
)

model.device

Environment

  • OS: linux, google colab
  • Python version: Python 3.8.10
  • Inseq version: 0.3.3

Expected behavior

Faster attribution when running on a GPU (CUDA)

(Thanks btw, for the fix for returning the per-token scores in a dictionary, the new method works well :) )

MoritzLaurer · Jan 20 '23

Hi @MoritzLaurer , thanks for your comment!

The slowness you report is most likely specific to the discretized_integrated_gradients method, since the current implementation builds its non-linear interpolation paths sequentially. We are tracking a batching bug for this method in issue #113 and are in touch with the authors.
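For intuition, here is a minimal sketch (not our actual implementation, and the update rule below is only a stand-in for DIG's anchor search over word embeddings) of why linear IG paths can be built in one batched tensor operation while point-by-point paths cannot:

import torch

def linear_path(baseline, inputs, steps=32):
    # Standard IG: each interpolation point is a fixed fraction of the way
    # from baseline to inputs, so the whole path is one batched tensor op.
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1, 1)
    return baseline + alphas * (inputs - baseline)  # (steps, seq_len, hidden)

def sequential_path(baseline, inputs, steps=32):
    # DIG-style non-linear paths: each point depends on the previous one
    # (in the real method, via a search over discrete word embeddings), so
    # this loop cannot be collapsed into a single batched operation.
    points = [baseline]
    for _ in range(steps - 1):
        prev = points[-1]
        points.append(prev + (inputs - prev) / steps)  # placeholder update, not DIG's rule
    return torch.stack(points)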

In the meantime, I suggest using the more common saliency or integrated_gradients approaches, which should be considerably faster on GPU. Bastings et al. (2022) show that Gradient L2 (the default output of saliency in Inseq since v0.3.3) performs well in terms of faithfulness on Transformer-based classifiers, so that could be a good starting point! Alternatively, attention attribution only requires forward passes, but it's less principled.
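For example, something along these lines should work, reusing your snippet with the attribution method swapped (available method names can be checked with inseq.list_feature_attribution_methods()):

import inseq
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Gradient L2 saliency; swap in "integrated_gradients" to compare methods.
model = inseq.load_model(
    "google/flan-t5-small",
    attribution_method="saliency",
    device=device,
)

out = model.attribute(
    input_texts=[
        "We were attacked by hackers. Was there a cyber attack?",
        "We were not attacked by hackers. Was there a cyber attack?",
    ],
)
out.show()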

Hope it helps!

gsarti · Jan 20 '23

Ok, thanks, I will try the other methods. (Good to know there might be a fix at some point; in my ad-hoc tests the discretized_integrated_gradients method seems to produce the most interpretable attributions.)

MoritzLaurer · Jan 22 '23