DeepSpeed-MII

Support for FLAN-T5

Open jihan-yin opened this issue 3 years ago • 3 comments

I saw that T5 isn't in the list of supported Hugging Face Transformers models. Are there plans or an ETA for adding the T5 family? FLAN-T5 is a very strong LLM for zero-shot and few-shot instruction prompting. I am currently building a hacky implementation for hosting it with DeepSpeed-Inference, but having it natively supported in DeepSpeed-MII would be ideal.

jihan-yin avatar Nov 21 '22 22:11 jihan-yin

We do support the T5 family with DeepSpeed-Inference with a custom injection policy (see this DeepSpeed unit test). However, we have not yet brought this support into MII. It's on our radar to add this in the future. We are also open to outside contributions if you would like to submit a PR!
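For readers unfamiliar with injection policies, here is a minimal sketch of what this can look like, modeled on the T5 example in the DeepSpeed inference tutorial rather than on the unit test mentioned above. The helper name `build_t5_pipeline`, the constant `T5_POLICY_LAYERS`, and the `google/flan-t5-small` checkpoint are illustrative assumptions, and the `mp_size` argument reflects the API as documented at the time, so it may differ in newer DeepSpeed releases.

```python
import os

# Output projections inside each T5Block whose results must be all-reduced
# when the block is split across GPUs (names as used in the DeepSpeed
# inference tutorial's T5 example).
T5_POLICY_LAYERS = ("SelfAttention.o", "EncDecAttention.o", "DenseReluDense.wo")


def build_t5_pipeline(model_name="google/flan-t5-small", dtype=None):
    """Hypothetical helper: wrap a T5-family pipeline with DeepSpeed-Inference."""
    # Heavy imports are kept local so the module can be imported without a GPU stack.
    import torch
    import deepspeed
    from transformers import pipeline
    from transformers.models.t5.modeling_t5 import T5Block

    # The `deepspeed` launcher sets these; the defaults cover a single-GPU run.
    local_rank = int(os.getenv("LOCAL_RANK", "0"))
    world_size = int(os.getenv("WORLD_SIZE", "1"))

    pipe = pipeline("text2text-generation", model=model_name, device=local_rank)
    pipe.model = deepspeed.init_inference(
        pipe.model,
        mp_size=world_size,  # assumption: argument name as in the tutorial of that era
        dtype=dtype or torch.float,
        injection_policy={T5Block: T5_POLICY_LAYERS},
    )
    return pipe
```

A script that calls `build_t5_pipeline()` would typically be started with the `deepspeed` launcher (for example `deepspeed --num_gpus 2 script.py`), which sets `LOCAL_RANK` and `WORLD_SIZE` for each process.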

mrwyattii avatar Nov 21 '22 23:11 mrwyattii

Also keep an eye on this PR, which is currently a work in progress for better T5 support: https://github.com/microsoft/DeepSpeed/pull/2451

jeffra avatar Nov 21 '22 23:11 jeffra

Assuming that PR does get merged, would it also support Long T5?

mhillebrand avatar Jan 18 '24 05:01 mhillebrand