Arthur
Will make it for the next release, I hope!
Failing test is unrelated 😉
Mistral is already covered! For LongT5, if it is like T5 and uses an attention bias, that might not be supported.
Not sure anyone is working on that, but BERT is already so small that I doubt it will have much impact on perf!
FYI, going forward we should rather use https://github.com/huggingface/transformers/blob/416711c3ea88109cf25a9c5f85b4aeee2cb831b5/src/transformers/models/llama/modeling_llama.py#L1058, as it is more self-contained and easier to debug and maintain than the many paths in the attn_mask utils.
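For illustration, here is a minimal sketch of what a self-contained causal mask construction looks like. This is not the actual transformers implementation; the function name, signature, and shape handling are assumptions for the example:

```python
import torch

def make_causal_mask(seq_len: int, dtype: torch.dtype = torch.float32) -> torch.Tensor:
    """Build an additive causal mask: 0 where attention is allowed,
    the most negative finite value where it is not."""
    min_value = torch.finfo(dtype).min
    mask = torch.full((seq_len, seq_len), min_value, dtype=dtype)
    # Keep only the strictly upper triangle masked; the rest becomes 0.
    return torch.triu(mask, diagonal=1)

# Usage: add the mask to the attention scores before the softmax, e.g.
# scores = scores + make_causal_mask(scores.size(-1), scores.dtype)
```

Keeping this logic inline in the modeling file means there is a single code path to read when debugging, instead of the many branches in the shared mask utilities.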
cc @muellerzr or @pacman100
Gently pinging @muellerzr, as you self-assigned this!
Sure, could you make sure the CIs are green?
You can probably ignore it with `# doctest: +SKIP`.
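For context, `# doctest: +SKIP` is a standard directive from Python's `doctest` module that tells the runner to skip that example. A minimal illustration (the function here is made up, chosen because its output is non-deterministic and would otherwise fail the doctest):

```python
import datetime

def current_time_message() -> str:
    """Return a greeting that includes the current time.

    >>> current_time_message()  # doctest: +SKIP
    'Hello! It is 12:34:56'
    """
    return f"Hello! It is {datetime.datetime.now():%H:%M:%S}"
```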
Also cc @Rocketknight1