Chinese-LLaMA-Alpaca

Running the 13B quantized model on 4 cores / 32 GB RAM takes 20 s+ per response

Open alanbeen opened this issue 1 year ago • 7 comments

The launch command is python3 server.py --model 13B --cpu --listen --api --chat. What kind of hardware would it take to speed this up?

alanbeen avatar May 17 '23 06:05 alanbeen

You can drop --cpu and use GPU inference. For the exact steps, please refer to the webui docs.

iMountTai avatar May 17 '23 06:05 iMountTai
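For reference, the suggestion above amounts to relaunching with the same flags minus --cpu. This is only a sketch of the quoted command; check your text-generation-webui version's docs for any additional GPU-related flags before relying on it.

```shell
# Same command as in the original post, with --cpu removed so
# text-generation-webui runs inference on the GPU instead of the CPU.
python3 server.py --model 13B --listen --api --chat
```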

Only a high-end GPU with plenty of VRAM can run it at all. Be thankful if 7B responds in a reasonable time, let alone 13B.

yaleimeng avatar May 17 '23 06:05 yaleimeng

Only a high-end GPU with plenty of VRAM can run it at all. Be thankful if 7B responds in a reasonable time, let alone 13B.

Do you know how fast an A10 with 24 GB of VRAM would respond? I'm planning to buy a better server.

alanbeen avatar May 17 '23 06:05 alanbeen

You can drop --cpu and use GPU inference. For the exact steps, please refer to the webui docs.

Mine is a CentOS Alibaba Cloud server without a GPU. Have you tried this on a GPU? How much GPU would it take to run faster?

alanbeen avatar May 17 '23 06:05 alanbeen

I run 7B inference on a single 24 GB 3090; most responses take 1-3 seconds, with the occasional longer one. That's an acceptable range for single-user use.

yaleimeng avatar May 17 '23 06:05 yaleimeng

Is there a cost-effective option for running this on an ARM architecture?

1979435 avatar May 17 '23 06:05 1979435

I run 7B inference on a single 24 GB 3090; most responses take 1-3 seconds, with the occasional longer one. That's an acceptable range for single-user use.

So for 13B, normal inference would be about half that speed, I suppose?

alanbeen avatar May 17 '23 07:05 alanbeen
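The "half the speed" guess above can be made concrete with a rough back-of-envelope. This assumes per-response latency scales roughly linearly with parameter count, which is a simplification (memory bandwidth, quantization, and batch size all matter), but it gives a ballpark from the 7B numbers reported earlier in the thread.

```python
def scaled_latency(latency_7b_s: float, params_b: float) -> float:
    """Estimate response latency for a model with `params_b` billion
    parameters, given a measured 7B latency, assuming latency scales
    linearly with parameter count (a simplifying assumption)."""
    return latency_7b_s * (params_b / 7.0)

# Using the 1-3 s range reported above for 7B on a 24 GB 3090:
low = scaled_latency(1.0, 13.0)   # ~1.9 s
high = scaled_latency(3.0, 13.0)  # ~5.6 s
print(f"13B estimate: {low:.1f}-{high:.1f} s")
```

Under this assumption, 13B is about 13/7 ≈ 1.9x slower than 7B, which is roughly consistent with the "half the speed" intuition.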

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.

github-actions[bot] avatar May 24 '23 22:05 github-actions[bot]

Hi, is fine-tuning possible with 24 GB of VRAM?

notsb avatar May 27 '23 15:05 notsb

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.

github-actions[bot] avatar Jun 03 '23 22:06 github-actions[bot]

Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.

github-actions[bot] avatar Jun 06 '23 22:06 github-actions[bot]