Different attributions for IntegratedGradients/LayerIntegratedGradients for BERT embeddings.
❓ Questions and Help
When running inference on a trained BertForSequenceClassification model (which has a BertModel as its base), I get slightly different results for:
- IntegratedGradients, with embeddings as input
- LayerIntegratedGradients, initialized for the model.bert.embeddings layer, with input ids as input
In the following, "ig" stands for IntegratedGradients and "lig" for LayerIntegratedGradients.
Forward functions:
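def ig_func(inputs):
    # inputs are token ids; used with LayerIntegratedGradients
    pred = model(inputs)
    return pred.logits

def ig_embed_func(inputs):
    # inputs are embeddings passed via inputs_embeds; used with IntegratedGradients
    pred = model(inputs_embeds=inputs)
    return pred.logits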
Attribution class:
from captum.attr import IntegratedGradients, LayerIntegratedGradients

lig = LayerIntegratedGradients(ig_func, model.bert.embeddings)
ig = IntegratedGradients(ig_embed_func)
Input representations:
input_ids = ...
ref_input_ids = token_reference.generate_reference(input_ids.size(-1), device=device).unsqueeze(0)
input_embeds = model.get_input_embeddings()(input_ids)
ref_input_embeds = model.get_input_embeddings()(ref_input_ids)
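One thing that may be worth checking here (an assumption, not a confirmed diagnosis): in the Hugging Face BERT implementation, model.get_input_embeddings() returns only the word-embedding table, whereas the model.bert.embeddings module also adds position and token-type embeddings and applies LayerNorm. If that applies to this setup, the two approaches interpolate between baseline and input in different spaces. The toy sketch below is plain Python with hypothetical names, and a tanh stands in for the extra processing in the embedding layer. It illustrates that integrating along the two different paths gives the same total attribution but a different split per dimension.

```python
import math

def integrated_gradients(f, baseline, inputs, n_steps=200, eps=1e-6):
    """Toy IG: midpoint Riemann sum with central-difference gradients."""
    attributions = [0.0] * len(inputs)
    for step in range(n_steps):
        alpha = (step + 0.5) / n_steps
        # Point on the straight line from baseline to inputs.
        point = [b + alpha * (x - b) for b, x in zip(baseline, inputs)]
        for i in range(len(inputs)):
            up = list(point)
            up[i] += eps
            down = list(point)
            down[i] -= eps
            grad_i = (f(up) - f(down)) / (2 * eps)
            attributions[i] += grad_i * (inputs[i] - baseline[i]) / n_steps
    return attributions

# Hypothetical stand-ins; none of these names come from BERT or Captum.
POSITION = [0.5, -0.5]  # plays the role of a position embedding

def embedding_layer(word_vec):
    # Adds the "position" term, then applies a nonlinearity (LayerNorm stand-in).
    return [math.tanh(w + p) for w, p in zip(word_vec, POSITION)]

def head(layer_out):
    # Everything after the embedding layer, reduced to one scalar output.
    return layer_out[0] * layer_out[1]

def full_model(word_vec):
    return head(embedding_layer(word_vec))

word_baseline, word_input = [0.0, 0.0], [2.0, 1.0]

# Analogue of IG on word embeddings: straight path in word-embedding space.
attr_word_space = integrated_gradients(full_model, word_baseline, word_input)

# Analogue of layer IG on the embedding layer: straight path between layer outputs.
attr_layer_space = integrated_gradients(
    head, embedding_layer(word_baseline), embedding_layer(word_input)
)

print("word space :", attr_word_space, "sum =", sum(attr_word_space))
print("layer space:", attr_layer_space, "sum =", sum(attr_layer_space))
```

If this is the cause, attaching the layer method to the word-embedding submodule rather than to the whole embeddings module would be the natural comparison to try.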
Calculation:
lig_attributions, lig_delta = lig.attribute(
    inputs=input_ids,
    baselines=ref_input_ids,
    additional_forward_args=None,
    return_convergence_delta=True,
    n_steps=50,
    target=0
)

# Separate delta names so the second call does not overwrite the first.
ig_attributions, ig_delta = ig.attribute(
    inputs=input_embeds,
    baselines=ref_input_embeds,
    additional_forward_args=None,
    return_convergence_delta=True,
    n_steps=50,
    target=0
)
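For readers unfamiliar with the n_steps and return_convergence_delta arguments, here is a minimal sketch of what they control. The toy function and all names are hypothetical. It uses a plain midpoint Riemann sum rather than whichever quadrature rule Captum applies, and it defines delta as the attribution sum minus the change in output, which may not match Captum's sign convention.

```python
def toy_forward(x):
    # Hypothetical scalar "logit" whose gradient is known in closed form.
    return x[0] ** 3 + x[0] * x[1]

def toy_gradient(x):
    return [3 * x[0] ** 2 + x[1], x[0]]

def ig_with_delta(inputs, baseline, n_steps):
    """Approximate IG on the straight path and report the approximation error."""
    attributions = [0.0] * len(inputs)
    for step in range(n_steps):
        alpha = (step + 0.5) / n_steps  # midpoint of the current sub-interval
        point = [b + alpha * (x - b) for x, b in zip(inputs, baseline)]
        grad = toy_gradient(point)
        for i in range(len(inputs)):
            attributions[i] += grad[i] * (inputs[i] - baseline[i]) / n_steps
    # Completeness: exact IG attributions sum to f(inputs) - f(baseline).
    delta = sum(attributions) - (toy_forward(inputs) - toy_forward(baseline))
    return attributions, delta

_, delta_5 = ig_with_delta([1.0, 2.0], [0.0, 0.0], n_steps=5)
_, delta_50 = ig_with_delta([1.0, 2.0], [0.0, 0.0], n_steps=50)
print("delta with 5 steps: ", delta_5)
print("delta with 50 steps:", delta_50)
```

A small delta only indicates that the integral along the chosen path was approximated well. It does not say whether two methods integrated along the same path.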
Summation:
ig_attributions_sum = ig_attributions.sum(dim=-1).squeeze(0)
lig_attributions_sum = lig_attributions.sum(dim=-1).squeeze(0)
Output:
>>> ig_attributions_sum[:10], lig_attributions_sum[:10]
(tensor([ 0.0755, -0.0366, -0.0526,  0.1775,  0.0118,  0.0393,  0.0243,  0.0654,
          0.0490,  0.0587], dtype=torch.float64, grad_fn=<SliceBackward0>),
 tensor([ 0.0757, -0.0333, -0.0524,  0.0858,  0.0139,  0.0132,  0.0586,  0.0767,
          0.0621,  0.0593], dtype=torch.float64))
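The outputs are similar but not identical. Some of the first ten values nearly match (0.0755 vs. 0.0757), while others differ considerably (0.1775 vs. 0.0858 at the fourth position).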
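To put a number on the gap, the per-token differences can be computed directly from the ten values printed above. This is a small plain-Python sketch; the variable names are illustrative, and the values are copied from the output rather than recomputed.

```python
# First ten per-token sums, copied from the printed tensors.
ig_first10 = [0.0755, -0.0366, -0.0526, 0.1775, 0.0118,
              0.0393, 0.0243, 0.0654, 0.0490, 0.0587]
lig_first10 = [0.0757, -0.0333, -0.0524, 0.0858, 0.0139,
               0.0132, 0.0586, 0.0767, 0.0621, 0.0593]

# Absolute difference per token position.
abs_diff = [abs(a - b) for a, b in zip(ig_first10, lig_first10)]

max_diff = max(abs_diff)
worst_index = abs_diff.index(max_diff)  # 0-based token position
mean_diff = sum(abs_diff) / len(abs_diff)

print(f"max |diff| = {max_diff:.4f} at position {worst_index}")
print(f"mean |diff| = {mean_diff:.4f}")
```

With the full tensors, torch.allclose(ig_attributions_sum, lig_attributions_sum, atol=...) gives the same kind of check in one call.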
To the best of my knowledge, the differences occur for any input and for every one of our other trained models.
Background: We came across this issue when comparing the IG attributions computed by the ferret tool (which uses IntegratedGradients) with those from the approach Captum proposes (using LayerIntegratedGradients).