inference: recommended to add the EXL2 (ExLlamav2) model


Open ZanePoe opened this issue 1 year ago • 1 comment


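In extensive real-world testing on dual 2080 Ti 22G GPUs, EXL2 is more than twice as fast as AWQ. EXL2's VRAM usage also grows very gradually with very long contexts. On the same dual 2080 Ti 22G setup, the AWQ-quantized SUS-Chat-34B-AWQ runs out of VRAM at around 6000 tokens, while SUS-Chat-34B-6.0bpw-h6-exl2 still has plenty of VRAM left at 7000 tokens.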
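For anyone who wants to reproduce this kind of speed comparison, here is a minimal, backend-agnostic sketch of a throughput measurement. It is not the method used for the numbers above. `tokens_per_second`, `speedup`, and the `generate(prompt, max_new_tokens)` callable are hypothetical names introduced for illustration; you would wrap your own EXL2 or AWQ generation call in such a callable that returns the list of newly generated token ids.

```python
import time
from typing import Callable, List


def tokens_per_second(generate: Callable[[str, int], List[int]],
                      prompt: str,
                      max_new_tokens: int,
                      runs: int = 3) -> float:
    """Return the best observed generation throughput in tokens/second.

    `generate` is a hypothetical wrapper around whatever backend is being
    tested; it must return the list of newly generated token ids.
    """
    best = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        new_tokens = generate(prompt, max_new_tokens)
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            # Keep the fastest run to reduce noise from warm-up effects.
            best = max(best, len(new_tokens) / elapsed)
    return best


def speedup(candidate_tps: float, baseline_tps: float) -> float:
    """Ratio of candidate throughput to baseline throughput."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return candidate_tps / baseline_tps
```

Running `tokens_per_second` once per backend with the same prompt and `max_new_tokens`, then passing both results to `speedup`, gives a ratio that can be compared against the "more than twice as fast" observation. Memory behavior at long context would need a separate measurement.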

Jan 03 '24 07:01 ZanePoe

Jan 13 '24 05:01 tensiondriven

Jun 17 '24 08:06 Greatz08

This issue is stale because it has been open for 7 days with no activity.

Aug 07 '24 19:08 github-actions[bot]

This issue was closed because it has been inactive for 5 days since being marked as stale.

Aug 12 '24 19:08 github-actions[bot]
