Support for T5
Do you have plans to support encoder-decoder models like T5? It would be great to have T5 with flash attention 😃
What specific model would you like supported? We would only take this on if we saw sufficient interest (but in practice we see heavy movement towards decoder-only models).
Decoder-only models are great for generative use cases, but the T5 family is the workhorse for many discriminative tasks. For example, the flan-t5-base model had 2M downloads on Hugging Face in the last month. Support for flan-t5 would add huge value for the community.
It'd be great to have T5 models here as well.
I'm going to try to turn MaxText into an encoder-decoder framework anyway, so native support is of course also appreciated :)
https://github.com/p-doom/maxtext/tree/colab_temp
We finally got around to implementing encoder-decoder models in our maxtext fork. The synthetic data pipeline seems to work; we'll add support for the real data pipeline later today.
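
For anyone curious about the shape of the change, here is a minimal, illustrative Flax sketch of the core idea: a decoder block that adds cross-attention over encoder outputs. This is not the actual code from the fork linked above; the module name `EncoderDecoderBlock` and all parameters are hypothetical, chosen only to show the structure.

```python
# Hypothetical sketch, NOT the fork's actual implementation: a pre-LayerNorm
# decoder block extended with cross-attention to encoder outputs.
import jax
import jax.numpy as jnp
import flax.linen as nn


class EncoderDecoderBlock(nn.Module):
    """One decoder layer: causal self-attention, cross-attention, MLP."""
    num_heads: int
    model_dim: int

    @nn.compact
    def __call__(self, decoder_inputs, encoder_outputs, decoder_mask=None):
        # Causal self-attention over the decoder sequence.
        x = nn.SelfAttention(num_heads=self.num_heads)(
            nn.LayerNorm()(decoder_inputs), mask=decoder_mask)
        x = x + decoder_inputs
        # Cross-attention: decoder positions attend to encoder outputs.
        y = nn.MultiHeadDotProductAttention(num_heads=self.num_heads)(
            nn.LayerNorm()(x), encoder_outputs)
        y = y + x
        # Position-wise feed-forward network.
        h = nn.Dense(4 * self.model_dim)(nn.LayerNorm()(y))
        h = nn.Dense(self.model_dim)(nn.gelu(h))
        return h + y


# Quick shape check with dummy data.
block = EncoderDecoderBlock(num_heads=8, model_dim=512)
dec = jnp.zeros((2, 16, 512))  # (batch, target_len, model_dim)
enc = jnp.zeros((2, 32, 512))  # (batch, source_len, model_dim)
mask = nn.make_causal_mask(jnp.ones((2, 16)))
params = block.init(jax.random.PRNGKey(0), dec, enc, mask)
out = block.apply(params, dec, enc, mask)  # shape (2, 16, 512)
```

The encoder side is just the same block without the causal mask and cross-attention, so most of the work is in threading encoder outputs (and the corresponding padding masks) through the existing decoder stack.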
Okay, I was a bit too fast; I still have to fix a few things.