llm-foundry
Why are there chat and instruct models for 13B parameters?
It is quite interesting that there are both chat and instruct models built on the same architecture! Why can't we train one model that does both? Is that because 13B is relatively small, so we want the model to be more specialized? Would a bigger model (say, 33B) also be subject to the same design choice?
On the application end, since the 13B models are each specialized for a certain task, can I use multiple models together to perform a bigger task? Or is it better to use a single bigger model to handle it?
Thank you in advance!
The main reason we have separate models for Instruct and Chat is that the data requirements are different. In addition, commercially licensed data are available for Instruct but not for Chat, so we didn't want to mix them. We made the same choice for our 7B and 30B models.
To keep life simple, I would devote one model to a single task. If the task is challenging, you may need a bigger model.
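For what it's worth, here is a minimal sketch (not from the original thread) of the "one model per task" approach: an instruct checkpoint handles a one-shot step and a chat checkpoint handles the follow-up conversation. It assumes the Hugging Face `transformers` API and the `mosaicml/mpt-7b-instruct` / `mosaicml/mpt-7b-chat` checkpoints; the prompts and the task split are purely illustrative.

```python
# Sketch: route different parts of a larger task to specialized models.
# Model names, prompts, and the task split below are illustrative assumptions,
# not a prescribed workflow from this thread.
from transformers import AutoModelForCausalLM, AutoTokenizer


def load(name):
    # MPT checkpoints define custom model code, so trust_remote_code is required.
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name, trust_remote_code=True)
    return tok, model


def generate(tok, model, prompt, max_new_tokens=128):
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tok.decode(out[0], skip_special_tokens=True)


# Instruct model for a single-turn step (e.g. summarization)...
instruct_tok, instruct_model = load("mosaicml/mpt-7b-instruct")
summary = generate(instruct_tok, instruct_model, "Summarize the following report: ...")

# ...and the chat model for the multi-turn discussion of that result.
chat_tok, chat_model = load("mosaicml/mpt-7b-chat")
reply = generate(
    chat_tok,
    chat_model,
    f"User: Explain this summary in simpler terms.\n{summary}\nAssistant:",
)
```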