gpt-fast
Does it support the reasoning acceleration of Qwen-14B?
Qwen-14B: https://github.com/QwenLM/Qwen
It's similar to the llama architecture, so it should be easy to modify model.py
to support it.
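A minimal sketch of what that modification might look like: gpt-fast's model.py keeps a `transformer_configs` dict of named hyperparameter sets, so supporting a new llama-style model is mostly a matter of registering its dimensions. The entry name and the parameter values below are taken from Qwen-14B's published config and should be verified against the actual checkpoint before use; this is an illustration, not a tested patch.

```python
# Hypothetical config entry for Qwen-14B, in the style of gpt-fast's
# transformer_configs in model.py. Values are from the published
# Qwen-14B config (assumption -- verify against the checkpoint).
qwen_14b_config = dict(
    n_layer=40,                # transformer blocks
    n_head=40,                 # attention heads
    dim=5120,                  # hidden size
    intermediate_size=13696,   # FFN inner dimension
    vocab_size=152064,         # Qwen tokenizer vocabulary size
)

# In gpt-fast this would be registered roughly as:
# transformer_configs["Qwen-14B"] = qwen_14b_config
```

Beyond the config, the main porting work is usually remapping the checkpoint's weight names to gpt-fast's layer names (as the convert scripts in the repo do for llama).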
I have tested it with Qwen-1.8B on an RTX 2080, and inference is about twice as fast as the original (~100 tok/s vs 50 tok/s), which is fascinating. Since the Qwen series shares the same architecture, it should also work for Qwen-14B.
Thank you, I will give it a try.
@dashi6174 https://github.com/DongqiShen/qwen-fast