optimum
longT5 BetterTransformer implementation
Feature request
longT5 BetterTransformer implementation
Motivation
Encoder-decoder models trained on large contexts enable machine translation tasks.
Your contribution
I looked at the implementation of regular T5 and it doesn't look too complex. I tried to implement it myself but didn't succeed. If I can contribute, please let me know.
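For reference, this is roughly how BetterTransformer is applied to a model that is already supported (regular T5 here); the goal of this request would be for the same call to work with a LongT5 checkpoint, which currently is not in the list of supported architectures:

```python
# Minimal usage sketch: converting a supported seq2seq model with BetterTransformer.
# LongT5 checkpoints are not covered yet, so the same transform would fail for them.
from transformers import AutoModelForSeq2SeqLM
from optimum.bettertransformer import BetterTransformer

model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Swaps supported attention/encoder layers for their BetterTransformer equivalents.
model = BetterTransformer.transform(model)
```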
Thank you, Omri
Seconding this! Would be great.
Totally on board with this! Would love to see this feature added!
@fxmarty can we try to tackle this together?
Thanks in advance
Hi, for reference we are upstreaming SDPA in Transformers; it may be a better fit for longT5: https://github.com/huggingface/transformers/issues/28005
Leaving this open as we may leverage nested tensors for longt5 (which are not in Transformers).
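For illustration, once an SDPA implementation lands for the architecture, it would typically be enabled directly at load time in Transformers rather than through BetterTransformer. The checkpoint name below is just an example, and passing `attn_implementation="sdpa"` only works for architectures that already ship an SDPA attention class:

```python
# Hypothetical usage once LongT5 gains SDPA support in Transformers;
# today this raises an error for architectures without an SDPA implementation.
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/long-t5-tglobal-base",
    attn_implementation="sdpa",
)
```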
Hi @All. Is this still open, or will you be working on it?