hardfish82
+1. The full error output is as follows: WARNING 06-07 15:49:56 config.py:208] gptq quantization is not fully optimized yet. The speed can be slower than non-quantized models. 2024-06-07 15:49:58,873 INFO worker.py:1724 -- Started a local Ray instance....
Tested with transformers==4.41.0, vllm==0.4.0.post1, torch==2.1.2: Qwen2-72B-Instruct-AWQ loads successfully, but Qwen2-57B-A14B-Instruct-GPTQ-Int4 still fails to load.
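For anyone trying to reproduce this, here is a minimal sketch of loading both models through vLLM's offline `LLM` API. The `Qwen/` Hugging Face repo IDs, the `tensor_parallel_size` value, and the explicit `quantization` flags are my assumptions, not taken from the comment above; adjust them to your setup.

```python
from vllm import LLM, SamplingParams

# Reported to work with transformers==4.41.0, vllm==0.4.0.post1, torch==2.1.2.
# tensor_parallel_size=4 is an assumption; set it to your actual GPU count.
llm_awq = LLM(
    model="Qwen/Qwen2-72B-Instruct-AWQ",
    quantization="awq",
    tensor_parallel_size=4,
)

# Reported to fail with the same versions: the GPTQ-Int4 MoE model only
# emits the "gptq quantization is not fully optimized yet" warning and
# then errors out during initialization.
llm_gptq = LLM(
    model="Qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4",
    quantization="gptq",
    tensor_parallel_size=4,
)

# Quick smoke test on the model that does load.
outputs = llm_awq.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```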
Agreed, same here.