Refactor flash attention implementation in transformers
What does this PR do?
EDIT: scoped down to just the refactor for now.
This enables running transformers models with ragged (variable-length, unpadded) tensors.
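As a rough illustration of what "ragged" means here (a minimal sketch, not code from this PR; names like `packed_ids` are made up for the example), variable-length sequences are concatenated into one packed batch and indexed by cumulative sequence lengths, which is the layout flash attention's varlen kernels consume:

```python
import torch

# Three sequences of different lengths, no padding to max length.
seqs = [torch.arange(n) for n in (3, 5, 2)]

# Concatenate all tokens along a single dimension.
packed_ids = torch.cat(seqs)  # shape: (10,)

# cu_seqlens marks where each sequence starts/ends in the packed batch.
lengths = torch.tensor([len(s) for s in seqs])
cu_seqlens = torch.cat([torch.zeros(1, dtype=torch.int32),
                        lengths.cumsum(0).to(torch.int32)])
# -> tensor([0, 3, 8, 10], dtype=torch.int32)

# Per-token positions restart at 0 for every sequence.
position_ids = torch.cat([torch.arange(len(s)) for s in seqs])
```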
Another goal is to make it easy to re-define the ExtraKwargs TypedDict, so that projects building on top of transformers can extend it.
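As a sketch of that extension point (the field names below are assumptions for illustration, not the keys transformers actually defines), a downstream project would subclass the TypedDict to layer its own attention kwargs on top:

```python
from typing import TypedDict

import torch


class ExtraKwargs(TypedDict, total=False):
    # hypothetical keys a ragged / flash-attention path might accept
    cu_seqlens: torch.Tensor
    max_seqlen: int


class MyExtraKwargs(ExtraKwargs, total=False):
    # a project building on transformers adds its own kwargs here
    sliding_window: int


# Keys remain optional (total=False), so callers pass only what they need.
kwargs: MyExtraKwargs = {"max_seqlen": 512, "sliding_window": 128}
```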