[Proposal] Add Support for Yi-6B and Yi-34B

Open neelnanda-io opened this issue 1 year ago • 2 comments

Proposal

Yi-6B and Yi-34B are new models that make a plausible claim to be the current best open source models, beating Falcon 180B and LLaMA-2 70B on MMLU. It'd be great to support them! I'm particularly keen on the 6B one: there seem to be cool projects that are easier to do on the best 6B models around, though it may not add much beyond Mistral 7B.

I have not yet read the code, so I do not know what architectural quirks the models have.

You could use the LLaMA and Mistral PRs as models for what this should look like.

https://huggingface.co/01-ai/Yi-6B https://huggingface.co/01-ai/Yi-34B

neelnanda-io, Nov 14 '23 22:11

I hear claims that it's basically just the LLaMA architecture! This would make this super easy woot. https://huggingface.co/01-ai/Yi-34B/discussions/11

neelnanda-io, Nov 14 '23 23:11

Just noting here that Yi models (both 6B and 34B) use grouped-query attention (num_key_value_heads < num_attention_heads). Grouped-query attention is implemented in #443, so this integration should be straightforward once that PR is in.

andyrdt, Jan 10 '24 23:01
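
For anyone unfamiliar with the grouped-query attention point above, here is a minimal sketch of what num_key_value_heads < num_attention_heads means in practice: several consecutive query heads share one key/value head. The function names and the head counts in the usage note are hypothetical and chosen for illustration; this is not the code from #443, and it assumes the consecutive-grouping convention, which a real port should check against the actual weights.

```python
from typing import List, TypeVar

T = TypeVar("T")


def kv_head_for_query_head(q_head: int, n_heads: int, n_kv_heads: int) -> int:
    """Return the index of the key/value head used by query head `q_head`.

    Assumes consecutive query heads are grouped together, so each group
    of (n_heads // n_kv_heads) query heads shares one key/value head.
    """
    if n_kv_heads <= 0 or n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be a positive multiple of n_kv_heads")
    if not 0 <= q_head < n_heads:
        raise IndexError("q_head out of range")
    group_size = n_heads // n_kv_heads
    return q_head // group_size


def expand_kv_heads(kv_heads: List[T], n_heads: int) -> List[T]:
    """Repeat each key/value head so there is one entry per query head.

    `kv_heads` stands in for per-head key (or value) tensors; any objects
    work here because only the head-to-head mapping is illustrated.
    """
    n_kv_heads = len(kv_heads)
    return [
        kv_heads[kv_head_for_query_head(h, n_heads, n_kv_heads)]
        for h in range(n_heads)
    ]
```

With hypothetical values n_heads=8 and n_kv_heads=2, query heads 0 to 3 read from key/value head 0 and query heads 4 to 7 read from key/value head 1. When n_kv_heads equals n_heads this reduces to ordinary multi-head attention, which is why a LLaMA-style implementation only needs the extra expansion step.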