llm-foundry
Why are there chat and instruct models for 13B parameters?
It is quite interesting that there are both chat and instruct models built on the same architecture! Why can't we train one model that does both? Is that because 13B is relatively small, so we want the model to be more specialized? Would a bigger model (say, 33B) also be subject to the same design choice?
On the application end, since the 13B models are each specialized for a certain task, can I use multiple models together to perform a bigger task? Or is it better to use a single bigger model to handle it?
Thank you in advance!
The main reason we have separate models for Instruct and Chat is that the data requirements are different. In addition, commercially licensed data are available for Instruct but not for Chat, so we didn't want to mix them. We made the same choice for our 7B and 30B models.
To keep life simple, I would devote one model to a single task. If the task is challenging, you may need a bigger model.
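For what it's worth, here is a minimal sketch (not from the original thread) of the "one model per task" approach: an instruct checkpoint handles a one-shot step and a chat checkpoint handles the follow-up conversation. It assumes the Hugging Face `transformers` API and the `mosaicml/mpt-7b-instruct` / `mosaicml/mpt-7b-chat` checkpoints; the prompts and the task split are purely illustrative.

```python
# Sketch: route different parts of a larger task to specialized models.
# Model names, prompts, and the task split below are illustrative assumptions,
# not a prescribed workflow from this thread.
from transformers import AutoModelForCausalLM, AutoTokenizer


def load(name):
    # MPT checkpoints define custom model code, so trust_remote_code is required.
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name, trust_remote_code=True)
    return tok, model


def generate(tok, model, prompt, max_new_tokens=128):
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tok.decode(out[0], skip_special_tokens=True)


# Instruct model for a single-turn step (e.g. summarization)...
instruct_tok, instruct_model = load("mosaicml/mpt-7b-instruct")
summary = generate(instruct_tok, instruct_model, "Summarize the following report: ...")

# ...and the chat model for the multi-turn discussion of that result.
chat_tok, chat_model = load("mosaicml/mpt-7b-chat")
reply = generate(
    chat_tok,
    chat_model,
    f"User: Explain this summary in simpler terms.\n{summary}\nAssistant:",
)
```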