Add Cohere's Command-R
https://txt.cohere.com/command-r/ https://huggingface.co/CohereForAI/c4ai-command-r-v01
I don't think the architecture needs any changes to support this
I thought the same about Gemma 😄.
This model requires custom modeling and tokenizer classes. So, it might not be that straightforward to implement.
Do you see any specific differences in the modeling?
I posted it without even looking at the code. I mean, why would anyone provide custom code if it were identical to what's already in transformers?
So, after a really quick scan of the modeling code, I found a couple of interesting details:
- They have a `logits_scale` that is applied to the `lm_head` output: https://huggingface.co/CohereForAI/c4ai-command-r-v01/blob/2a6d259c29bd319c3bdb8dd88b8d59b8c303c318/modeling_cohere.py#L1164
- The forward method in `CohereDecoderLayer` does things a bit differently: https://huggingface.co/CohereForAI/c4ai-command-r-v01/blob/2a6d259c29bd319c3bdb8dd88b8d59b8c303c318/modeling_cohere.py#L689-L709 It's definitely a `parallel_residual` + `shared_attention_norm` (see the sketch after this list), but none of this is mentioned in the config file. Since it's just a matter of setting the config properly, it shouldn't give us any problems.
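For reference, a minimal sketch of what those two details amount to. This is not the Cohere code; the module layout, argument names, and the `logits_scale` value here are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ParallelResidualBlock(nn.Module):
    """Sketch of a decoder layer with a parallel residual and a single
    pre-norm shared by both branches, in the spirit of the linked
    CohereDecoderLayer forward."""

    def __init__(self, hidden_size, attn, mlp):
        super().__init__()
        # one norm feeds both branches (shared_attention_norm)
        self.input_layernorm = nn.LayerNorm(hidden_size)
        self.attn = attn
        self.mlp = mlp

    def forward(self, hidden_states):
        residual = hidden_states
        normed = self.input_layernorm(hidden_states)
        # both branches read the same normed input and are added to the
        # residual in parallel (parallel_residual), not sequentially
        return residual + self.attn(normed) + self.mlp(normed)

def scaled_logits(hidden_states, lm_head, logits_scale):
    # the lm_head output is multiplied by a constant scalar before softmax
    return lm_head(hidden_states) * logits_scale

# toy usage with stand-in branches and an illustrative scale value
block = ParallelResidualBlock(64, attn=nn.Linear(64, 64), mlp=nn.Linear(64, 64))
h = block(torch.randn(2, 10, 64))
logits = scaled_logits(h, nn.Linear(64, 1000, bias=False), logits_scale=0.0625)
```

So on the inference side this is one extra multiply on the logits and a reordering of when the residual additions happen, both of which can be driven by config flags.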
There may be more that I've missed.
I think you've missed the `rotate_half` part, while the tokenizer is the same as llama's:
```python
import torch

def rotate_half(x):
    # Split into even- and odd-indexed channels, then rotate the
    # interleaved pairs: (x1, x2) -> (-x2, x1)
    x1 = x[..., ::2]
    x2 = x[..., 1::2]
    rot_x = torch.stack([-x2, x1], dim=-1).flatten(-2)
    return rot_x
```
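For comparison, the llama-style `rotate_half` in transformers splits the head dimension into two contiguous halves instead of interleaving even/odd channels, so the two rotations are not interchangeable; as I understand it, converting between the conventions needs a permutation of the Q/K projection weights:

```python
import torch

def rotate_half_llama(x):
    # llama-style: rotate the two contiguous halves of the head dim
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)
```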
https://github.com/ggerganov/llama.cpp/pull/6033#issuecomment-1993657166