Ludwig Sickert

16 comments of Ludwig Sickert

Just want to add my two cents to this. After struggling to find the reason for this error within my own code, I finally noticed that one of my other...

@gsarti I was reading through the Jain and Wallace paper earlier and looking at their code, but I am not entirely sure what you mean by Last-Layer and Aggregated attention...
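For anyone reading along, here is how I currently read the two terms, as a minimal sketch (the model choice and the mean-over-heads step are my own assumptions, not necessarily what the paper does):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Attention is not explanation.", return_tensors="pt")
outputs = model(**inputs, output_attentions=True)

# outputs.attentions is a tuple with one tensor per layer,
# each shaped (batch, num_heads, seq_len, seq_len)
all_layers = torch.stack(outputs.attentions)  # (layers, batch, heads, seq, seq)

# "Last-Layer": keep only the final layer (here averaged over heads)
last_layer = all_layers[-1].mean(dim=1)       # (batch, seq_len, seq_len)

# "Aggregated": average over all layers (and heads)
aggregated = all_layers.mean(dim=(0, 2))      # (batch, seq_len, seq_len)
```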

@gsarti Thanks a lot, that explains it. I was not sure whether I was missing something from the Jain and Wallace paper, since they introduced their methods for Adversarial...

@gsarti I think this concerns all attention methods, so I wanted to get your opinion on this before implementing it further: to run the attention-based methods, we need the `output_attentions=True`...
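To make the question concrete, this is roughly what I have in mind (a sketch with an arbitrary example model, not the final implementation):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-de")

batch = tokenizer("Hello world", return_tensors="pt")

# Option 1: request attentions only for this generate() call
out = model.generate(
    **batch,
    output_attentions=True,
    return_dict_in_generate=True,
)

# Option 2: flip the flag on the config so every forward pass returns them
model.config.output_attentions = True
```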

Ok, then I will implement it like that for now.

@gsarti How would you want to deal with the information of multiple attention heads? I have seen several methods used across the different papers, either using the...
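For context, the two variants I have seen most often, sketched with dummy tensors (shapes follow the `transformers` convention):

```python
import torch

# Dummy attention weights for one layer: (batch, num_heads, seq_len, seq_len)
attn = torch.rand(1, 8, 10, 10)

# Variant 1: average over all heads
head_average = attn.mean(dim=1)  # (batch, seq_len, seq_len)

# Variant 2: let the user pick a single head index
single_head = attn[:, 3]         # (batch, seq_len, seq_len)
```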

Ok, thanks for the explanation. One follow-up question: How would you specify "max" in this context? Taking the head with the overall maximal attention values, or using the max values...
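In code, the two interpretations I can think of would look like this (the variable names are mine, just to make the question concrete):

```python
import torch

attn = torch.rand(1, 8, 10, 10)  # (batch, num_heads, seq_len, seq_len)

# Interpretation 1: select the single head with the largest total attention mass
best_head = attn.sum(dim=(-2, -1)).argmax(dim=-1)       # (batch,)
max_head = attn[torch.arange(attn.size(0)), best_head]  # (batch, seq_len, seq_len)

# Interpretation 2: take the element-wise maximum across heads
elementwise_max = attn.max(dim=1).values                # (batch, seq_len, seq_len)
```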

@gsarti Sorry for all the questions, but another issue came up: since we are using the `generate()` method, most models I have tested have a defined number...

Hmm, I am not sure I follow entirely. The main issue is that `transformers` gives me all attention scores for all steps. If I understand it correctly now...
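For reference, this is the structure I am seeing (a decoder-only example; encoder-decoder models expose `decoder_attentions` and `cross_attentions` instead):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

batch = tokenizer("Hello", return_tensors="pt")
out = model.generate(
    **batch,
    max_new_tokens=5,
    output_attentions=True,
    return_dict_in_generate=True,
)

# out.attentions has one entry per generation step; each entry is a tuple
# over layers of tensors shaped (batch, num_heads, generated_len, source_len)
step, layer = 2, -1
attn = out.attentions[step][layer]
```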

Further possible additions to Basic Attention methods:

- [x] rename `LastLayerAttention` to single-layer attention and make the layer configurable (last layer by default; see the sketch after this list)
- [x] allow users to choose a...
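A hypothetical sketch of the first item, with the layer index configurable and defaulting to the last layer (the function name is mine, not the final API):

```python
import torch

def single_layer_attention(attentions, layer: int = -1) -> torch.Tensor:
    """Head-averaged attention for a single, configurable layer.

    attentions: tuple of per-layer tensors, each (batch, num_heads, seq_len, seq_len).
    layer: which layer to use; defaults to -1, i.e. the last layer.
    """
    return attentions[layer].mean(dim=1)  # (batch, seq_len, seq_len)
```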