Chinese-LLaMA-Alpaca
Running the 13B quantized model on 4 cores / 32 GB RAM: response times of 20 s+
The launch command is `python3 server.py --model 13B --cpu --listen --api --chat`. What kind of hardware configuration would it take to speed this up?
You can drop `--cpu` to run inference on the GPU; for the details, please refer to the webui docs.
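A minimal sketch of the change suggested above: the same command with `--cpu` removed. The extra flags in the second variant (`--load-in-8bit`, `--gpu-memory`) are an assumption about what your text-generation-webui version supports for fitting 13B into limited VRAM; verify them against the webui docs.

```shell
# Original command minus --cpu, so inference runs on the GPU:
python3 server.py --model 13B --listen --api --chat

# If 13B does not fit in VRAM as-is, webui versions that support it can
# load the model in 8-bit and cap per-GPU memory (assumed flags; check
# the webui docs for your version):
python3 server.py --model 13B --listen --api --chat --load-in-8bit --gpu-memory 22
```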
Only a high-end GPU with plenty of VRAM can run it. With 7B you should count yourself lucky to get replies in a reasonable time, let alone 13B.
Do you know how fast the responses would be on an A10 with 24 GB of VRAM? I'm planning to buy a better server.
I run 7B inference on a single 24 GB 3090; most responses take 1–3 seconds, with the occasional longer one. That's acceptable for single-user use.
Is there a cost-effective way to run this on the ARM architecture?
For 13B, normal inference should be roughly half as fast.
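A back-of-the-envelope sketch of the "roughly half as fast" estimate above, assuming decode latency scales about linearly with parameter count (13B / 7B ≈ 1.86×) and taking the ~2 s 3090 figure from the thread as the 7B baseline. The numbers are illustrative, not measured.

```shell
# Assumption: per-response latency scales ~linearly with parameter count.
PARAMS_7B=7
PARAMS_13B=13
LAT_7B_MS=2000   # ~2 s per response for 7B on a 3090, per the thread above

# Integer estimate for 13B: 2000 * 13 / 7 ms
LAT_13B_MS=$(( LAT_7B_MS * PARAMS_13B / PARAMS_7B ))
echo "estimated 13B latency: ${LAT_13B_MS} ms"
```

So 13B lands near 3.7 s per response under this linear assumption, consistent with "roughly half as fast" as 7B.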
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.
Hi, is fine-tuning feasible with 24 GB of VRAM?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.
Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.