Could this tool apply to encoder-decoder models, like Flan-T5?
It's a great tool for showing how to build a ChatGPT-like model on top of a foundation model. I wonder whether it could support encoder-decoder models, like Flan-T5. Could we just directly load those models and run these scripts on them? Thanks!
Hard to say. When we changed the model to our pretrained GPT, it kept crashing during training. I guess there are a lot of things to tune in ds_utils.py to make it actually work for different models.
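One common reason a direct model swap crashes is the loader class: scripts written for decoder-only models typically use `AutoModelForCausalLM`, which cannot load an encoder-decoder checkpoint like Flan-T5; those need `AutoModelForSeq2SeqLM`. A minimal sketch of picking the right class from the Hugging Face config's `is_encoder_decoder` flag (the helper name is hypothetical, not from this repo):

```python
# Sketch: choose the appropriate transformers Auto class for a checkpoint.
# The helper name and selection logic are illustrative, not from the repo.

def pick_auto_class(is_encoder_decoder: bool) -> str:
    """Return the transformers Auto class name suited to the architecture."""
    if is_encoder_decoder:
        # e.g. Flan-T5, BART: separate encoder and decoder stacks
        return "AutoModelForSeq2SeqLM"
    # e.g. GPT/OPT-style decoder-only models
    return "AutoModelForCausalLM"

# Usage with transformers (requires the library and network access):
# from transformers import AutoConfig
# cfg = AutoConfig.from_pretrained("google/flan-t5-base")
# cls_name = pick_auto_class(cfg.is_encoder_decoder)
```

Even with the right loader, the training loop itself (loss computation, generation, and any DeepSpeed configuration in ds_utils.py) may still assume a decoder-only layout, so this alone is unlikely to be sufficient.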
What's the difference between your model and OPT-x? The architecture?
Our model is a GPT model.
There is an example of Flan-T5 in the repo: training/deepseed-flan-t5-summarization.ipynb