Gabriele Sarti

46 comments by Gabriele Sarti

Hi @saxenarohit, in principle the Captum LRP implementation should be directly compatible with Inseq. However, the implementation is very model-specific, with some notable (and to my knowledge, presently unsolved)...
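For context, a minimal sketch of how Captum's LRP is typically instantiated on a toy model (illustrative only, not Inseq code); transformer architectures would additionally need custom propagation rules for modules Captum does not support out of the box, which is part of what makes the implementation so model-specific:

```python
import torch
import torch.nn as nn
from captum.attr import LRP

# Toy classifier built only from modules Captum's LRP supports out of the box
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
model.eval()

inputs = torch.randn(1, 8)
predicted_class = int(model(inputs).argmax(dim=-1))

# Relevance of each input feature for the predicted class; a transformer would
# need per-module propagation rules (e.g. for LayerNorm) before this call works
attributions = LRP(model).attribute(inputs, target=predicted_class)
```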

Hey @LuukSuurmeijer, thanks a lot for this PR! I had a look and added some very minor fixes (added a `Literal` type for the allowed precision strings, added a docstring...
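As an illustration of the first fix, something along these lines (the exact precision strings below are hypothetical, not necessarily the ones used in the PR):

```python
from typing import Literal, Optional

# Hypothetical values: the allowed precision strings in the actual PR may differ
PrecisionType = Literal["full", "half", "8bit", "4bit"]

def load_model(model_name: str, precision: Optional[PrecisionType] = "full"):
    """Load the model, casting its weights to the requested precision."""
    ...
```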

Thanks for the quick answer! Sounds good for `attention_mask`. Regarding the return dictionary, in principle, having the sequences would already mean enabling most gradient/occlusion-based methods. Attention attribution is actively being...
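Assuming the return dictionary in question is the one produced by 🤗 Transformers' `generate`, a sketch of the fields involved (field names follow that API):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

inputs = tokenizer(["A simple example"], return_tensors="pt")
out = model.generate(
    **inputs,
    return_dict_in_generate=True,  # dict-like output instead of a plain tensor of ids
    output_attentions=True,        # also expose attention weights for attention attribution
)
# out.sequences is what gradient/occlusion-based methods need as attribution targets
generated_ids = out.sequences
```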

Here is a functioning version @mrektor @michelecafagna26

```python
from captum.attr import InputXGradient
from transformers import pipeline

pipe = pipeline('text2text-generation', model='google/flan-t5-base', tokenizer='google/flan-t5-base', device='cuda')
input_ids = pipe.tokenizer(["A simple example"], return_tensors="pt", padding=True, truncation=True).input_ids.to('cuda')
...
```
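A possible continuation of the snippet (hypothetical, not part of the original comment), attributing the first generated token to the input embeddings:

```python
import torch

model = pipe.model
# Leaf copy of the input embeddings so gradients can be taken w.r.t. them
embeddings = model.get_input_embeddings()(input_ids).detach().requires_grad_(True)
decoder_input_ids = torch.full(
    (input_ids.shape[0], 1), model.config.decoder_start_token_id, device=input_ids.device
)

def forward_func(inputs_embeds):
    out = model(inputs_embeds=inputs_embeds, decoder_input_ids=decoder_input_ids)
    return out.logits[:, -1, :]  # logits for the first generated token

target_id = int(forward_func(embeddings).argmax(dim=-1))
attributions = InputXGradient(forward_func).attribute(embeddings, target=target_id)
```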

Hi @kayoyin @CoderPat, could you please let me know if you intend to fix the issue with the highlights? Thank you in advance!

@lsickert The attention-based feature attribution methods you mention involve the simplest case of taking the average attention weight for every token across all model layers, or the attention weight for...
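For illustration, the simplest averaging case could look roughly like this with 🤗 Transformers (a sketch, not Inseq's implementation):

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

inputs = tokenizer(["A simple example"], return_tensors="pt")
out = model.generate(**inputs, return_dict_in_generate=True, output_attentions=True)

# out.cross_attentions: one tuple per generated token, each holding one tensor
# per layer of shape (batch, heads, 1, source_len)
step_scores = []
for step in out.cross_attentions:
    layers = torch.stack(step)                               # (layers, batch, heads, 1, source_len)
    step_scores.append(layers.mean(dim=(0, 2)).squeeze(1))   # average over layers and heads
attention_scores = torch.stack(step_scores, dim=1)           # (batch, target_len, source_len)
```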

Good point, I'd say returning the attention scores by default shouldn't be a problem, and it's probably the easiest way to ensure compatibility with other methods without dramatic changes to...

Wherever possible we want to make methods customizable, but with sensible defaults for those not interested in fiddling with them. For attention, the ideal setting would probably be to use...
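Sketch of what such a customizable-with-defaults interface could look like (parameter names and defaults are illustrative, not Inseq's actual API):

```python
from typing import Optional, Sequence
import torch

def aggregate_attention(
    attentions: Sequence[torch.Tensor],      # one (batch, heads, tgt_len, src_len) tensor per layer
    layers: Optional[Sequence[int]] = None,  # default: all layers
    heads: Optional[Sequence[int]] = None,   # default: all heads
    aggregate_fn: str = "mean",              # default aggregation over the selected layers and heads
) -> torch.Tensor:
    stacked = torch.stack(list(attentions))  # (layers, batch, heads, tgt_len, src_len)
    if layers is not None:
        stacked = stacked[list(layers)]
    if heads is not None:
        stacked = stacked[:, :, list(heads)]
    if aggregate_fn == "max":
        return stacked.amax(dim=(0, 2))
    return stacked.mean(dim=(0, 2))
```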

The latter would probably make more sense, deeming relevant what was relevant for at least one head. I don't have an intuition for what to expect from the results though,...
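A toy illustration of the difference between the two head-aggregation choices (values are made up): the max keeps a token that only a single head attended to, while the mean dilutes it.

```python
import torch

head_scores = torch.tensor([[0.9, 0.05, 0.05],   # head 1 attends mostly to token 0
                            [0.1, 0.10, 0.80],   # head 2 attends mostly to token 2
                            [0.1, 0.10, 0.80]])  # head 3 attends mostly to token 2

print(head_scores.mean(dim=0))  # tensor([0.3667, 0.0833, 0.5500])
print(head_scores.amax(dim=0))  # tensor([0.9000, 0.1000, 0.8000])
```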

This is an important point. In the current gradient attribution approaches, if the user does not provide a target output we first generate the output sentence using whatever strategy is...
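Roughly, the flow described here (a sketch with hypothetical helper names, not Inseq's actual internals):

```python
from typing import Optional

def attribute(model, tokenizer, source: str, target: Optional[str] = None):
    if target is None:
        # No target provided: generate one first with the model's default decoding strategy
        input_ids = tokenizer(source, return_tensors="pt").input_ids
        generated_ids = model.generate(input_ids)
        target = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
    # Then run the chosen gradient attribution method on the (source, target) pair
    return run_attribution_method(model, tokenizer, source, target)  # hypothetical helper
```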