lxb0425 issues

Results 18 issues of


                                            lxb0425

国产GPU得支持

请问下支持哪些国产GPU本地化部署啊

关于微调后qwen2-72B-instruct-int4-gptq后的压测和长文本测试

你好我正在使用8张4090做qwen2-72B-instruct-int4-gptq的并发压测和长文本我使用的是vllm部署命令如下 chat-10 是微调后再量化填充后的的版本 python -m vllm.entrypoints.openai.api_server --model /workspace/chat-2.0 --host 0.0.0.0 --port 7864 --tensor-parallel-size 8 --max-model-len 30000 --served-model-name chat-v2.0 --gpu-memory-utilization 0.9 conf.json的yarn配置和不配置都试过了 1 文本输入8000个字单独1个线程没问题响应得36s左右，几个线程就垮了是方式不对吗还是其他原因或者有什么工具可以让我测试吗...

api的问题

有没有api接入到其他系统 webui.py app.py 启动的是界面有没有接口传入图片文本生成视频回来的那种

[Bug]: 启动之后用了一段时间显存越占越多

### Your current environment 2*A100 配置启动项 python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 7864 --max-model-len 8000 --served-model-name chat-v2.0 --model /workspace/sdata/checkpoint-140-merged --enforce-eager --tensor-parallel-size 2 --gpu-memory-utilization 0.95 ### Model Input Dumps ![image](https://github.com/user-attachments/assets/d34eac81-82ed-4760-9fa7-5967d4b8d00e)...

bug

[Question]: 2*A100部署qwen2.5-72B-instruct的问题

### Has this been raised before? - [X] I have checked [the GitHub README](https://github.com/QwenLM/Qwen2.5). - [X] I have checked [the Qwen documentation](https://qwen.readthedocs.io) and cannot find an answer there. - [X]...

麦克风说话的音频的保存问题

![image](https://github.com/user-attachments/assets/89324356-4e5c-426a-bd79-53a0f0a222d8) 使用的0.1.10 启动的是bash run_server_2pass.sh ![image](https://github.com/user-attachments/assets/1e10957b-da16-49bc-af34-7cf4799ad5d9) ![image](https://github.com/user-attachments/assets/8ef2069e-8929-476a-81a9-f5749535c6b0) docker里修改了websocket-server-2pass 文件杀掉funasr-wss-server-2pass 重新启动为啥没有生效啊也尝试重启容器依然没生效重新编译但是build下面的bin 只生成了几个文件都没有生成funasr-wss-server-2pass 文件中间error了 ![image](https://github.com/user-attachments/assets/b6fc381a-6414-4c59-80dc-f39ed106dd90)

question

[Performance]: VLLM 请求数量过多时太慢

### Your current environment ```text The output of `python collect_env.py` ``` ### How would you like to use vllm 我正在使用一张A100 部署的72B量化模型这是启动脚本 python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --max-model-len 9000 --served-model-name...

performance

跑着跑着报错了

**例行检查** [//]: # '方框内填 x 表示打钩' - [ ] 我已确认目前没有类似 issue - [ ] 我已完整查看过项目 README，以及[项目文档](https://doc.tryfastgpt.ai/docs/intro/) - [ ] 我使用了自己的 key，并确认我的 key 是可正常使用的 - [ ] 我理解并愿意跟进此 issue，协助测试和提供反馈 - [x]...

bug