Support for T5

Open kishorenc opened this issue 11 months ago • 6 comments

Do you have plans to support encoder-decoder models like T5? It would be great to have T5 with flash attention 😃

kishorenc avatar Mar 27 '24 12:03 kishorenc

What specific model would you like supported? We would only take this on if we saw sufficient interest (but in practice we see heavy movement towards decoder-only models).

rwitten avatar Mar 27 '24 16:03 rwitten

Decoder-only models are great for generative use cases, but the T5 family is the workhorse for many discriminative tasks. For example, the flan-t5-base model had 2M downloads on Hugging Face in the last month. Support for flan-t5 would add huge value for the community.

kishorenc avatar Mar 27 '24 17:03 kishorenc

It'd be great to have T5 models here as well.

versae avatar Apr 08 '24 17:04 versae

I'm going to try to turn MaxText into an encoder-decoder framework anyway, so native support would of course also be appreciated :)

emergenz avatar Apr 12 '24 08:04 emergenz
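For context on what such a change involves: the core architectural difference from MaxText's decoder-only models is cross-attention, where decoder queries attend over the encoder's output states with no causal mask. A minimal JAX sketch of that step (shapes and names are illustrative assumptions, not MaxText's actual API):

```python
import jax
import jax.numpy as jnp

def cross_attention(q, k, v):
    """Single-head cross-attention: decoder queries over encoder states.

    q: [tgt_len, d]   decoder hidden states (queries)
    k, v: [src_len, d] encoder outputs (keys and values)
    Returns: [tgt_len, d]
    """
    # Scaled dot-product scores; unlike decoder self-attention,
    # no causal mask is applied -- every decoder position may see
    # the full encoded source sequence.
    scores = (q @ k.T) / jnp.sqrt(jnp.asarray(q.shape[-1], q.dtype))
    weights = jax.nn.softmax(scores, axis=-1)
    return weights @ v
```

In a T5-style block this sits between the decoder's causal self-attention and its MLP, which is the main structural addition a decoder-only codebase needs.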

https://github.com/p-doom/maxtext/tree/colab_temp

We finally got around to implementing encoder-decoder models in our MaxText fork. The synthetic data pipeline seems to work. I will add support for the real data pipeline later today.

emergenz avatar Jul 16 '24 16:07 emergenz

Okay, I was a bit too fast; I still have to fix a few things.

emergenz avatar Jul 16 '24 16:07 emergenz