gpt-fast
Does it support the reasoning acceleration of Qwen-14B?
Qwen-14B: https://github.com/QwenLM/Qwen
It's similar to the llama architecture, so it should be easy to modify model.py
to support it.
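A minimal sketch of what that modification might look like: gpt-fast's model.py keeps a `transformer_configs` dict of named hyperparameter sets, so supporting a new llama-style model is mostly a matter of registering its dimensions. The entry name and the parameter values below are taken from Qwen-14B's published config and should be verified against the actual checkpoint before use; this is an illustration, not a tested patch.

```python
# Hypothetical config entry for Qwen-14B, in the style of gpt-fast's
# transformer_configs in model.py. Values are from the published
# Qwen-14B config (assumption -- verify against the checkpoint).
qwen_14b_config = dict(
    n_layer=40,                # transformer blocks
    n_head=40,                 # attention heads
    dim=5120,                  # hidden size
    intermediate_size=13696,   # FFN inner dimension
    vocab_size=152064,         # Qwen tokenizer vocabulary size
)

# In gpt-fast this would be registered roughly as:
# transformer_configs["Qwen-14B"] = qwen_14b_config
```

Beyond the config, the main porting work is usually remapping the checkpoint's weight names to gpt-fast's layer names (as the convert scripts in the repo do for llama).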
I have tested it with Qwen-1.8B on an RTX 2080, and inference is about twice as fast as the original (~100 tok/s vs 50 tok/s), which is fascinating. Since the Qwen series shares the same architecture, it should also work for Qwen-14B.
Thank you, I will give it a try.
@dashi6174 https://github.com/DongqiShen/qwen-fast