Support for Cohere models

Open nyxkrage opened this issue 1 year ago • 1 comments
🚀 The feature, motivation and pitch

I would love to see support for the Cohere models (https://huggingface.co/CohereForAI/c4ai-command-r-08-2024 & https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024).
As far as I can tell, the FusedLinearCrossEntropy kernel should just need to support scaling the logits by the logit_scale from the config, though I'm unsure whether the rest of the kernels would work as is.

Thanks for the work
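To make the logit scaling mentioned above concrete, here is a minimal pure-Python sketch of a cross-entropy loss that multiplies the logits by a scale factor before the softmax. The function name `cross_entropy` and its `logit_scale` argument are hypothetical and chosen for illustration. This is not the Liger-Kernel API, and the scale value used below is an arbitrary example rather than the value from any Cohere config.

```python
import math

def cross_entropy(logits, target, logit_scale=1.0):
    """Cross-entropy for one token, with logits scaled before the softmax.

    Hypothetical helper for illustration only; `logit_scale` stands in for
    the value a model config would provide.
    """
    # Scale the raw logits first, as the issue suggests the kernel would need to.
    scaled = [z * logit_scale for z in logits]
    # Numerically stable log-sum-exp.
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scaled))
    # Negative log-probability of the target class.
    return log_sum_exp - scaled[target]

logits = [1.0, 2.0, 3.0]
unscaled_loss = cross_entropy(logits, target=2)
scaled_loss = cross_entropy(logits, target=2, logit_scale=0.5)  # arbitrary example scale
```

A scale below 1 flattens the distribution, so the loss for the same logits and target changes. A fused kernel would presumably have to apply the same factor in both the forward pass and the gradient.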

Alternatives

No response

Additional context

No response

nyxkrage avatar Sep 15 '24 11:09 nyxkrage

OK, after some experimentation and editing of the tests, the SwiGLU and LayerNorm kernels pass the correctness tests when compared against the reference implementations from the Cohere modeling code. RoPE is different with Cohere, though: the tests don't pass, but from the error it looks like the values are the same. ~~I assume it's something to do with how Cohere calculates RoPE in float32 and downcasts afterwards.~~ Seeing the comment on the rotate_half function in the Cohere modeling code that was just added, it seems obvious: Cohere slices by odds and evens rather than splitting in half.
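The two rotate_half conventions described above can be sketched on plain Python lists. The function names `rotate_half_split` and `rotate_half_interleaved` are hypothetical labels for this example. The real implementations operate on the last dimension of a tensor, and this sketch only shows how the pairing of elements differs.

```python
def rotate_half_split(x):
    """Split the vector into first and second halves, return [-second, first]."""
    half = len(x) // 2
    first, second = x[:half], x[half:]
    return [-v for v in second] + first

def rotate_half_interleaved(x):
    """Pair even- and odd-indexed elements, return [-odd, even] interleaved."""
    evens, odds = x[0::2], x[1::2]
    out = []
    for e, o in zip(evens, odds):
        out.extend([-o, e])
    return out

x = [1.0, 2.0, 3.0, 4.0]
print(rotate_half_split(x))        # [-3.0, -4.0, 1.0, 2.0]
print(rotate_half_interleaved(x))  # [-2.0, 1.0, -4.0, 3.0]
```

Because the two conventions pair different elements, a kernel written for the split-in-half layout would not match a reference that uses the even/odd layout, even if the cos and sin tables were identical.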

nyxkrage avatar Sep 15 '24 14:09 nyxkrage