Chinese-LLaMA-Alpaca
Are there performance figures for how many tokens per second models of different sizes can process on given hardware?
Are there performance figures for how many tokens per second models of different sizes can process on given hardware (GPU or CPU)?
Under llama.cpp, the generation speed of the 4-bit models is as follows (measured over 5 runs), for reference:
- 7B: 71-72ms/token
- 13B: 134-140ms/token
Note: measured on an Apple M1 Max.
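For easier comparison across hardware, the per-token latencies above can be converted to throughput (tokens/sec). A minimal sketch, using the numbers reported in this thread:

```python
# Reported 4-bit llama.cpp latencies (ms/token) on Apple M1 Max,
# taken from the reply above.
latencies_ms = {"7B": (71, 72), "13B": (134, 140)}

def throughput(ms_per_token: float) -> float:
    """Tokens per second for a given per-token latency in milliseconds."""
    return 1000.0 / ms_per_token

for model, (lo, hi) in latencies_ms.items():
    # Lower latency => higher throughput, so the bounds swap.
    print(f"{model}: {throughput(hi):.1f}-{throughput(lo):.1f} tokens/sec")
```

This gives roughly 14 tokens/sec for 7B and 7 tokens/sec for 13B on that machine.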
If running on an A100, would it presumably be several times faster? Is there a rough estimate? Thanks.
Would you consider building a benchmark covering a variety of task scenarios, so that users with different hardware could all try it? That would make it easier to collect execution speeds across different environments.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.
Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.