Edisonwei54
> Yes, this will be fixed later; you can track it [here](https://github.com/eosphoros-ai/DB-GPT-Web/issues/74)

Has this been fixed yet? I am using the latest version and still run into the same problem.
> API update [4afecd1](https://github.com/RVC-Boss/GPT-SoVITS/commit/4afecd1950845a974350cd2d8dc8dcf12398fba9)

I am testing the API's streaming output now. It still looks as though the data is only returned in streamed form after model inference has finished, rather than being streamed while inference is running.
> Split the text and return the segments as a stream

But in my tests, the streamed segments are still only returned after inference on all of the split segments has finished. I did see the playback time increasing during playback, though.
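One way to check whether the server truly streams during inference is to time when each response chunk arrives. A minimal sketch, assuming a local api.py server on port 9880 that accepts `text`/`text_language` query parameters and a `stream` flag (the URL, port, and parameter names are assumptions; adjust them to your api.py version):

```python
import time
import requests

# Hypothetical local GPT-SoVITS api.py endpoint; adjust URL/params to your setup.
URL = "http://127.0.0.1:9880"
params = {
    "text": "这是一个流式输出的测试。",
    "text_language": "zh",
    "stream": "true",  # assumption: your api.py version exposes a stream flag
}

start = time.time()
with requests.get(URL, params=params, stream=True) as resp:
    resp.raise_for_status()
    for i, chunk in enumerate(resp.iter_content(chunk_size=4096)):
        # If chunks only start arriving near the end, the server buffers the
        # whole inference and only then streams the response back.
        print(f"chunk {i}: {len(chunk)} bytes at t={time.time() - start:.2f}s")
```

If the first chunk only shows up after roughly the full synthesis time, the server is buffering; with true segment-by-segment streaming the first chunk should arrive well before inference on the later segments completes.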
@WoosukKwon How can I solve this problem?
> The current punica kernel can't process `h_out=3424`; you can set `--tensor-parallel-size 2` to avoid this error

Thanks, it works now, but I still want to use all...
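For reference, a minimal sketch of the suggested workaround using vLLM's offline API (the model name and adapter path are placeholders, and this needs two GPUs):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Placeholder model and adapter paths; substitute your own.
llm = LLM(
    model="Qwen/Qwen1.5-14B-Chat",
    enable_lora=True,
    # Tensor parallelism splits the output dimension across GPUs, so the
    # per-GPU slice stays within what the punica kernel can handle.
    tensor_parallel_size=2,
)

outputs = llm.generate(
    ["Hello, who are you?"],
    SamplingParams(max_tokens=64),
    lora_request=LoRARequest("my-adapter", 1, "/path/to/lora-adapter"),
)
print(outputs[0].outputs[0].text)
```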
```
Traceback (most recent call last):
  File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/lora/worker_manager.py", line 150, in _load_lora
    lora = self._lora_model_cls.from_local_checkpoint(
  File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/lora/models.py", line 246, in from_local_checkpoint
    return cls.from_lora_tensors(
  File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/lora/models.py", line 150, in from_lora_tensors
    module_name, is_lora_a...
```
@WoosukKwon @zhuohan123 https://github.com/vllm-project/vllm/pull/3177 I see this PR already adds LoRA support for Qwen2. Why does it still not work? Is the problem with the LoRA adapter itself?
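Since the failure happens in `from_lora_tensors` while parsing tensor names, it may help to inspect which module names the adapter checkpoint actually contains. A minimal diagnostic sketch, assuming the adapter was saved as `adapter_model.safetensors` (a common PEFT default; the traceback does not confirm the file name):

```python
from safetensors import safe_open

# Hypothetical adapter path; point this at your LoRA checkpoint.
ADAPTER = "/path/to/lora-adapter/adapter_model.safetensors"

with safe_open(ADAPTER, framework="pt") as f:
    for name in f.keys():
        # vLLM parses names like
        # base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight;
        # tensors for modules it does not expect (e.g. embeddings or
        # unsupported projections) can break the parsing in from_lora_tensors.
        print(name)
```

If the listing contains modules outside the target set the model's LoRA support covers, retraining the adapter restricted to the supported projections may avoid the error.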
Same problem here. Is there any way to fix it?
Same problem here. Is there a solution?
@Jintao-Huang @tastelikefeet