Refactor flash attention implementation in transformers
What does this PR do?
EDIT: scoped down to just the refactor for now.
This enables running transformers models with ragged (variable-length, unpadded) tensors.
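As a rough illustration of what "ragged" means here (a minimal sketch, not code from this PR; names like `packed_ids` are made up for the example), variable-length sequences are concatenated into one packed batch and indexed by cumulative sequence lengths, which is the layout flash attention's varlen kernels consume:

```python
import torch

# Three sequences of different lengths, no padding to max length.
seqs = [torch.arange(n) for n in (3, 5, 2)]

# Concatenate all tokens along a single dimension.
packed_ids = torch.cat(seqs)  # shape: (10,)

# cu_seqlens marks where each sequence starts/ends in the packed batch.
lengths = torch.tensor([len(s) for s in seqs])
cu_seqlens = torch.cat([torch.zeros(1, dtype=torch.int32),
                        lengths.cumsum(0).to(torch.int32)])
# -> tensor([0, 3, 8, 10], dtype=torch.int32)

# Per-token positions restart at 0 for every sequence.
position_ids = torch.cat([torch.arange(len(s)) for s in seqs])
```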
Another goal is to make it easy to re-define the ExtraKwargs TypedDict, so that projects building on top of transformers can extend it.
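As a sketch of that extension point (the field names below are assumptions for illustration, not the keys transformers actually defines), a downstream project would subclass the TypedDict to layer its own attention kwargs on top:

```python
from typing import TypedDict

import torch


class ExtraKwargs(TypedDict, total=False):
    # hypothetical keys a ragged / flash-attention path might accept
    cu_seqlens: torch.Tensor
    max_seqlen: int


class MyExtraKwargs(ExtraKwargs, total=False):
    # a project building on transformers adds its own kwargs here
    sliding_window: int


# Keys remain optional (total=False), so callers pass only what they need.
kwargs: MyExtraKwargs = {"max_seqlen": 512, "sliding_window": 128}
```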