
QUESTION: How do I use both GPTQ and vLLM with the Qwen-14B model?

Open · tms2003 opened this issue 1 year ago · 5 comments

vLLM already supports GPTQ quantization of Qwen-14B 4-bit (in a quantization branch), so how do I use them together in Xinference?

tms2003 · Nov 29 '23

We need to test it; some adaptation work may be required.

aresnow1 · Nov 30 '23

It seems vLLM's support for GPTQ is still in progress (https://github.com/vllm-project/vllm/pull/1580). How did you use vLLM with GPTQ?

aresnow1 · Nov 30 '23

@aresnow1 The vLLM branch code can be found at https://github.com/chu-tianxiang/vllm-gptq/. It is a fork of vLLM, and its changes were submitted upstream (the PR above), so it is effectively the official implementation; however, due to many code conflicts it was not merged into the main branch. After testing, its code is usable.

tms2003 · Nov 30 '23
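
For context, loading a GPTQ-quantized Qwen checkpoint through vLLM's Python API typically looks like the sketch below. It assumes the fork keeps upstream vLLM's `LLM` entry point and accepts a `quantization="gptq"` argument; `Qwen/Qwen-14B-Chat-Int4` is the published 4-bit checkpoint.

```python
# Sketch: run a GPTQ-quantized Qwen-14B directly with vLLM.
# Assumes a vLLM build (e.g. the chu-tianxiang/vllm-gptq fork above)
# whose LLM constructor accepts quantization="gptq".
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen-14B-Chat-Int4",  # published 4-bit GPTQ checkpoint
    quantization="gptq",              # use the GPTQ kernels
    trust_remote_code=True,           # Qwen ships custom modeling code
)

outputs = llm.generate(
    ["What is GPTQ quantization?"],
    SamplingParams(temperature=0.7, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```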

It needs some changes to support passing the quantization method; I'll create a PR to support this later.

aresnow1 · Nov 30 '23
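
Once such a PR lands, launching a GPTQ model through Xinference should look roughly like the sketch below. The call follows Xinference's existing `launch_model` client API; whether `model_format="gptq"` is accepted for qwen-chat is precisely what the change would add, so treat that argument as an assumption.

```python
# Sketch: launch a GPTQ-quantized Qwen-14B through Xinference's client.
# model_format="gptq" is the assumed new option this PR would enable;
# the other parameters follow Xinference's existing launch_model API.
from xinference.client import Client

client = Client("http://127.0.0.1:9997")  # default local endpoint

model_uid = client.launch_model(
    model_name="qwen-chat",
    size_in_billions=14,
    model_format="gptq",   # assumed: select the GPTQ checkpoint
    quantization="Int4",   # 4-bit GPTQ weights
)

model = client.get_model(model_uid)
print(model.chat("Briefly explain GPTQ quantization."))
```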

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] · Aug 08 '24

This issue was closed because it has been inactive for 5 days since being marked as stale.

github-actions[bot] · Aug 13 '24