Chinese-LLaMA-Alpaca

Running the 13B quantized model on 4 cores / 32 GB RAM takes 20 s+ per response

Open alanbeen opened this issue 1 year ago • 7 comments

The launch command is python3 server.py --model 13B --cpu --listen --api --chat. What kind of hardware would it take to speed this up?

alanbeen avatar May 17 '23 06:05 alanbeen

You can drop --cpu and use GPU inference. For the exact steps, please refer to the webui docs.

iMountTai avatar May 17 '23 06:05 iMountTai
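For reference, the suggestion above amounts to relaunching with the same flags minus --cpu. This is only a sketch of the quoted command; check your text-generation-webui version's docs for any additional GPU-related flags before relying on it.

```shell
# Same command as in the original post, with --cpu removed so
# text-generation-webui runs inference on the GPU instead of the CPU.
python3 server.py --model 13B --listen --api --chat
```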

Only a high-end GPU with plenty of VRAM can run it at all. Be thankful if 7B responds in a reasonable time, let alone 13B.

yaleimeng avatar May 17 '23 06:05 yaleimeng

Only a high-end GPU with plenty of VRAM can run it at all. Be thankful if 7B responds in a reasonable time, let alone 13B.

Do you know how fast an A10 with 24 GB of VRAM would respond? I'm planning to buy a better server.

alanbeen avatar May 17 '23 06:05 alanbeen

You can drop --cpu and use GPU inference. For the exact steps, please refer to the webui docs.

Mine is a CentOS Alibaba Cloud server without a GPU. Have you tried this on a GPU? How much GPU would it take to run faster?

alanbeen avatar May 17 '23 06:05 alanbeen

I run 7B inference on a single 24 GB 3090; most responses take 1-3 seconds, with the occasional longer one. That's an acceptable range for single-user use.

yaleimeng avatar May 17 '23 06:05 yaleimeng

Is there a cost-effective option for running this on an ARM architecture?

1979435 avatar May 17 '23 06:05 1979435

I run 7B inference on a single 24 GB 3090; most responses take 1-3 seconds, with the occasional longer one. That's an acceptable range for single-user use.

So for 13B, normal inference would be about half that speed, I suppose?

alanbeen avatar May 17 '23 07:05 alanbeen
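The "half the speed" guess above can be made concrete with a rough back-of-envelope. This assumes per-response latency scales roughly linearly with parameter count, which is a simplification (memory bandwidth, quantization, and batch size all matter), but it gives a ballpark from the 7B numbers reported earlier in the thread.

```python
def scaled_latency(latency_7b_s: float, params_b: float) -> float:
    """Estimate response latency for a model with `params_b` billion
    parameters, given a measured 7B latency, assuming latency scales
    linearly with parameter count (a simplifying assumption)."""
    return latency_7b_s * (params_b / 7.0)

# Using the 1-3 s range reported above for 7B on a 24 GB 3090:
low = scaled_latency(1.0, 13.0)   # ~1.9 s
high = scaled_latency(3.0, 13.0)  # ~5.6 s
print(f"13B estimate: {low:.1f}-{high:.1f} s")
```

Under this assumption, 13B is about 13/7 ≈ 1.9x slower than 7B, which is roughly consistent with the "half the speed" intuition.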

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.

github-actions[bot] avatar May 24 '23 22:05 github-actions[bot]

Hi, is fine-tuning possible with 24 GB of VRAM?

notsb avatar May 27 '23 15:05 notsb

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.

github-actions[bot] avatar Jun 03 '23 22:06 github-actions[bot]

Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.

github-actions[bot] avatar Jun 06 '23 22:06 github-actions[bot]