
Cohere command r

Open flaviusburca opened this issue 1 year ago • 1 comments

Is it possible to adapt this to Cohere Command R models?

flaviusburca, Jun 04 '24 08:06

Hi! If the model you mean is CohereForAI/c4ai-command-r-v01, we believe it's possible. It uses standard RoPE, and from a quick look at its implementation in Hugging Face's Transformers library, it is very similar to Llama's. You can use our Llama implementation as a reference when modifying Cohere's code.
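For readers porting the Llama patch, here is a minimal sketch of the position remapping that would need to be reproduced in Cohere's attention code. It is not the repository's implementation: the function name `self_extend_rel_pos` is hypothetical, and the boundary rule (exact positions for distances strictly below the neighbor window) is an assumption made for illustration.

```python
def self_extend_rel_pos(q_pos: int, k_pos: int, group_size: int, neighbor_window: int) -> int:
    """Relative position fed to RoPE for one (query, key) pair.

    Hypothetical helper for illustration only; the real patch works on
    whole position-id tensors inside the attention forward pass.
    """
    if k_pos > q_pos:
        raise ValueError("causal attention: the key must not come after the query")

    distance = q_pos - k_pos
    if distance < neighbor_window:
        # Neighbor attention: nearby tokens keep their exact relative position.
        return distance

    # Grouped attention: positions are floor-divided by the group size, and the
    # query side is shifted so the two regions join without a gap.
    shift = neighbor_window - neighbor_window // group_size
    return q_pos // group_size + shift - k_pos // group_size


# With group size 4 and neighbor window 8, a query at position 100 sees:
print(self_extend_rel_pos(100, 93, 4, 8))  # inside the window
print(self_extend_rel_pos(100, 92, 4, 8))  # first grouped position
print(self_extend_rel_pos(100, 0, 4, 8))   # farthest key
```

The point of the shift term is that the grouped region continues from where the neighbor window ends, so distant keys map to relative positions the model saw during pretraining.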

One thing that could matter: CohereForAI/c4ai-command-r-v01 uses a very large RoPE theta (8,000,000.0), much larger than that of other models. This may cause the empirical rule for selecting good hyperparameters (group size, neighbor window) to fail, so you may need to try several combinations to find one that works well.
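As a rough starting point for that search, the sketch below lists the (group size, neighbor window) pairs whose theoretical reach covers a target length, using the bound from the SelfExtend paper, (L - w) * G + w, where L is the pretraining context length. The helper names are hypothetical and the numbers in the usage example are placeholders, not Command R's actual settings. The bound only says how far a combination can reach; which one gives good quality still has to be checked empirically.

```python
from itertools import product


def max_extended_length(pretrain_len: int, group_size: int, neighbor_window: int) -> int:
    """Upper bound on context length: (L - w) * G + w."""
    return (pretrain_len - neighbor_window) * group_size + neighbor_window


def candidate_settings(pretrain_len, target_len, group_sizes, neighbor_windows):
    """Return (group_size, neighbor_window, reach) tuples that cover target_len."""
    candidates = []
    for group_size, window in product(group_sizes, neighbor_windows):
        if window >= pretrain_len:
            continue  # the neighbor window must fit inside the pretraining context
        reach = max_extended_length(pretrain_len, group_size, window)
        if reach >= target_len:
            candidates.append((group_size, window, reach))
    return candidates


# Placeholder values: a 4096-token pretraining context and a 16000-token target.
for group_size, window, reach in candidate_settings(4096, 16000, [4, 8], [1024, 2048]):
    print(f"group_size={group_size} neighbor_window={window} reach={reach}")
```

Each surviving pair would then be evaluated on a long-context task (for example passkey retrieval) to see which one holds up with the larger RoPE theta.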

Mooler0410, Jun 05 '24 18:06