CTranslate2
Support for Zephyr and other "StableLmForCausalLM" models?
Any plans to support conversion of `StableLmForCausalLM` models? I've noticed that they're very good; for example, the new Zephyr model here:
https://huggingface.co/stabilityai/stablelm-zephyr-3b
Amazing performance for a 3B model, much better than Phi-2 IMHO. Support was added to Transformers in version 4.38.0:
https://github.com/huggingface/transformers/releases/tag/v4.38.0
Here's the link to a description of the model architecture to help:
https://huggingface.co/docs/transformers/v4.38.2/en/model_doc/stablelm
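In case it helps, here is a rough, untested sketch of what a loader might look like in `python/ctranslate2/converters/transformers.py`, modeled on the existing `LlamaLoader`. The `StableLmConfig` attribute names come from the Transformers docs linked above, but the spec options (especially the partial rotary dimension, LayerNorm vs. RMSNorm, and the optional QKV biases) are my assumptions and would need checking against the reference implementation:

```python
# Untested sketch, modeled on LlamaLoader in
# python/ctranslate2/converters/transformers.py.
# Vocabulary and generation-config handling are omitted.
from ctranslate2.converters.transformers import ModelLoader, register_loader
from ctranslate2.specs import common_spec, transformer_spec


@register_loader("StableLmConfig")
class StableLmLoader(ModelLoader):
    @property
    def architecture_name(self):
        return "StableLmForCausalLM"

    def get_model_spec(self, model):
        config = model.config
        head_dim = config.hidden_size // config.num_attention_heads
        spec = transformer_spec.TransformerDecoderModelSpec.from_config(
            config.num_hidden_layers,
            config.num_attention_heads,
            activation=common_spec.Activation.SWISH,  # hidden_act is "silu"
            pre_norm=True,
            ffn_glu=True,    # gated MLP, as in Llama/Mistral
            rms_norm=False,  # StableLM uses standard LayerNorm
            # Partial rotary embeddings (partial_rotary_factor, default 0.25).
            rotary_dim=int(config.partial_rotary_factor * head_dim),
            rotary_interleave=False,
            rotary_base=config.rope_theta,
            num_heads_kv=config.num_key_value_heads,
        )
        # Note: some StableLM 2 variants enable QKV biases (use_qkv_bias);
        # that case is not handled in this sketch.
        self.set_decoder(spec.decoder, model.model)
        self.set_linear(spec.decoder.projection, model.lm_head)
        return spec
```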
Here is yet another badass model @minhthuc2502. Would love to help create a converter but am not an expert. It's the 1.6B version of Zephyr:
https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b
It kicks ass for its size. The only other small model with a context size over 4,000 is Gemma, which, at least in my testing, royally sucks (referring to Gemma 2B, newest version 1.1 included).
Currently, the only reasonable option for building a chat application with ctranslate2 on a model smaller than 7B is Gemma. I say "reasonable" because the Phi converter is currently broken due to changes in Phi-2, and at any rate Phi-2 only has a context of 2048.
Zephyr 3B and Zephyr 1.6B are the best in their class, way better than Gemma 2B. Another viable option would be a converter for Qwen, which actually has a 0.5B model.
Here are tests for Gemma and others on a basic RAG question. Gemma 2B only got half the question right no matter how many beams I used. However, even the Zephyr 1.6B model gave a 100% correct answer at a beam size of 1.
In short, Gemma 2B is fast but sucks, while Zephyr is only slightly slower and is absolutely awesome.
NOTE: The models in the legend with "ct2" in their name are obviously ctranslate2 models. The other models were tested using transformers along with bitsandbytes (4-bit), just FYI.
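For anyone who wants to reproduce the CT2 side of these tests once a converter exists, the generation loop is just the standard `ctranslate2.Generator` API. The model directory and prompt here are hypothetical placeholders:

```python
import ctranslate2
import transformers

# Hypothetical path: assumes the model was already converted to "zephyr-ct2".
generator = ctranslate2.Generator("zephyr-ct2", device="cuda")
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "stabilityai/stablelm-2-zephyr-1_6b"
)

# Build the chat prompt and tokenize it into string tokens for CTranslate2.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Answer the question using the context above."}],
    tokenize=False,
    add_generation_prompt=True,
)
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

# beam_size=1 matches the Zephyr 1.6B run described above.
results = generator.generate_batch(
    [tokens], beam_size=1, max_length=512, include_prompt_in_result=False
)
print(tokenizer.decode(results[0].sequences_ids[0]))
```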
Lastly, llama.cpp already supports Zephyr, Qwen, and others, but I'd rather not switch because of the additional dependency... Let me know @minhthuc2502 if you'll reconsider making this a higher priority. I know you're busy... thanks dude.
To maybe save you a few minutes, I've gathered the following information for whoever takes this on:
- The `config.json` states that the architecture is `StableLmForCausalLM`.
- I think this is it: https://huggingface.co/docs/transformers/v4.40.0/en/model_doc/stablelm
- Additional info: https://stability.wandb.io/stability-llm/stable-lm/reports/StableLM-3B-4E1T--VmlldzoyMjU4?accessToken=u3zujipenkx5g7rtcj9qojjgxpconyjktjkli2po09nffrffdhhchq045vp0wyfo
Based on this, hopefully it wouldn't be too complicated to create a converter for it...
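If a loader along those lines gets merged, the conversion itself should just be the standard converter API, something like the following (hypothetical until support actually lands):

```python
from ctranslate2.converters import TransformersConverter

# Hypothetical: only works once a StableLM loader is registered in CTranslate2.
converter = TransformersConverter("stabilityai/stablelm-2-zephyr-1_6b")
converter.convert("stablelm-2-zephyr-1_6b-ct2", quantization="int8")
```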