Recommended to add support for EXL2 (ExLlamaV2) models

Open ZanePoe opened this issue 1 year ago • 1 comments
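In extensive hands-on testing on dual 2080 Ti 22G GPUs, exl2 is more than twice as fast as AWQ. In addition, exl2's VRAM usage grows very gradually with very long contexts. On dual 2080 Ti 22G GPUs, the AWQ-quantized SUS-Chat-34B-AWQ runs out of VRAM at around 6000 tokens, while SUS-Chat-34B-6.0bpw-h6-exl2 still has plenty of VRAM to spare at 7000 tokens.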

ZanePoe avatar Jan 03 '24 07:01 ZanePoe

+1

tensiondriven avatar Jan 13 '24 05:01 tensiondriven

+1

Greatz08 avatar Jun 17 '24 08:06 Greatz08

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] avatar Aug 07 '24 19:08 github-actions[bot]

This issue was closed because it has been inactive for 5 days since being marked as stale.

github-actions[bot] avatar Aug 12 '24 19:08 github-actions[bot]