Could this tool apply to encoder-decoder models, like Flan-T5?
It's a great tool for showing how to build a ChatGPT-like model on top of a foundation model. I wonder whether it could support encoder-decoder models, like Flan-T5. Could we just directly load those models and run these scripts on them? Thanks!
Hard to say. When we changed the model to our pretrained GPT, it kept crashing during training. I guess there are a lot of things to tune in ds_utils.py to make it actually work for different models.
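One common reason a direct model swap crashes is the loader class: scripts written for decoder-only models typically use `AutoModelForCausalLM`, which cannot load an encoder-decoder checkpoint like Flan-T5; those need `AutoModelForSeq2SeqLM`. A minimal sketch of picking the right class from the Hugging Face config's `is_encoder_decoder` flag (the helper name is hypothetical, not from this repo):

```python
# Sketch: choose the appropriate transformers Auto class for a checkpoint.
# The helper name and selection logic are illustrative, not from the repo.

def pick_auto_class(is_encoder_decoder: bool) -> str:
    """Return the transformers Auto class name suited to the architecture."""
    if is_encoder_decoder:
        # e.g. Flan-T5, BART: separate encoder and decoder stacks
        return "AutoModelForSeq2SeqLM"
    # e.g. GPT/OPT-style decoder-only models
    return "AutoModelForCausalLM"

# Usage with transformers (requires the library and network access):
# from transformers import AutoConfig
# cfg = AutoConfig.from_pretrained("google/flan-t5-base")
# cls_name = pick_auto_class(cfg.is_encoder_decoder)
```

Even with the right loader, the training loop itself (loss computation, generation, and any DeepSpeed configuration in ds_utils.py) may still assume a decoder-only layout, so this alone is unlikely to be sufficient.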
What's the difference between your model and OPT-x? The architecture?
Our model is a GPT model.
There is an example of Flan-T5 in the repo: training/deepseed-flan-t5-summarization.ipynb