Zhangquan Chen
I ran into the same problem. Has it been resolved?
Same question here. How can I run inference in this setup?
> Machine: A800, vLLM 0.5.0. The prompt is "开始" ("start"), output max_tokens = 2048, temperature = 0.7.
>
> vLLM loads Qwen2-72B-Instruct-gptq-int4, and I use vLLM's benchmark script for concurrency testing (https://github.com/vllm-project/vllm/blob/main/benchmarks/benchmark_serving.py). Whether the concurrency limit is 1 or 10, the output always repeats.
>
> I also tested with unlimited concurrency, and the output repeats there as well.

I ran into the same thing. Has a fix been found?
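For reference, the reported sampling configuration can be reproduced with vLLM's offline `LLM` API. This is a minimal sketch, assuming the Hugging Face repo id `Qwen/Qwen2-72B-Instruct-GPTQ-Int4`; the `repetition_penalty` value is a hypothetical mitigation that people commonly try for repetitive output, not something from the original report.

```python
from vllm import LLM, SamplingParams

# Assumed repo id; substitute the local path or repo actually used.
llm = LLM(
    model="Qwen/Qwen2-72B-Instruct-GPTQ-Int4",
    quantization="gptq",
)

params = SamplingParams(
    temperature=0.7,         # as reported in the thread
    max_tokens=2048,         # as reported in the thread
    repetition_penalty=1.1,  # hypothetical value, not part of the original setup
)

# "开始" is the prompt mentioned in the report.
outputs = llm.generate(["开始"], params)
print(outputs[0].outputs[0].text)
```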
ok, thanks so much
Hi, sorry to bother you. We still haven't seen the correct links. Thank you!
Got it! Thank you so much!
In the 'trainer_utils.py' file, I replaced the loading of adapter weights with loading the merged model directly, and then modified the code to use the merged model's path. Then it...
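A minimal sketch of what that substitution might look like, assuming a transformers/PEFT-style loading flow; the exact code in `trainer_utils.py` differs by project, and the paths below are placeholders:

```python
from transformers import AutoModelForCausalLM

# Before (base model plus adapter weights, PEFT-style):
# from peft import PeftModel
# model = AutoModelForCausalLM.from_pretrained("/path/to/base-model")
# model = PeftModel.from_pretrained(model, "/path/to/adapter")

# After (load the already-merged model directly):
merged_model_path = "/path/to/merged-model"  # placeholder path
model = AutoModelForCausalLM.from_pretrained(merged_model_path)
```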