Chinese-LLaMA-Alpaca
Are there performance figures for how many tokens per second models of different sizes can process on given hardware?
Are there performance figures for how many tokens per second models of different sizes can process on given hardware (GPU or CPU)?
Under llama.cpp, the generation speed of the 4-bit models is as follows (measured over 5 runs), for reference:
- 7B: 71-72ms/token
- 13B: 134-140ms/token
Note: measured on an Apple M1 Max.
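For easier comparison across hardware, the per-token latencies above can be converted to throughput (tokens/sec). A minimal sketch, using the numbers reported in this thread:

```python
# Reported 4-bit llama.cpp latencies (ms/token) on Apple M1 Max,
# taken from the reply above.
latencies_ms = {"7B": (71, 72), "13B": (134, 140)}

def throughput(ms_per_token: float) -> float:
    """Tokens per second for a given per-token latency in milliseconds."""
    return 1000.0 / ms_per_token

for model, (lo, hi) in latencies_ms.items():
    # Lower latency => higher throughput, so the bounds swap.
    print(f"{model}: {throughput(hi):.1f}-{throughput(lo):.1f} tokens/sec")
```

This gives roughly 14 tokens/sec for 7B and 7 tokens/sec for 13B on that machine.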
If running on an A100, would it presumably be several times faster? Is there a rough estimate? Thanks.
Would you consider building a benchmark covering a variety of task scenarios, so that users with different hardware could all try it? That would make it easier to collect execution speeds across different environments.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.
Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.