inference
Recommend adding support for EXL2 (ExLlamaV2) models
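In extensive testing on dual 2080 Ti 22G GPUs, exl2 is more than twice as fast as AWQ. In addition, exl2's VRAM usage grows very gradually at very long context lengths. On the same dual 2080 Ti 22G setup, the AWQ-quantized SUS-Chat-34B-AWQ runs out of VRAM at around 6000 tokens, while SUS-Chat-34B-6.0bpw-h6-exl2 still has ample VRAM at 7000 tokens.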
+1
+1
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 5 days since being marked as stale.