LiMa-cas
When I run inference, is it much slower since I need an if/else to check which precision to dequantize?
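Not this repo's actual kernel, just a minimal hypothetical sketch of what such a per-precision dispatch could look like; the Python-level if/else is a one-time check per tensor, so it is usually negligible next to the dequantization work itself:

```python
import torch

def dequantize(packed: torch.Tensor, scale: torch.Tensor, precision: str) -> torch.Tensor:
    # Hypothetical dispatch on a stored precision tag; the branch cost is tiny
    # compared to the unpack/multiply below.
    if precision == "int4":
        # Two 4-bit values packed into each int8 element (illustrative layout only).
        low = packed & 0x0F
        high = (packed >> 4) & 0x0F
        values = torch.stack((low, high), dim=-1).flatten(-2)
    elif precision == "int8":
        values = packed
    else:
        raise ValueError(f"unsupported precision: {precision}")
    return values.float() * scale
```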
Hi, what's the difference between llm-awq and autoawq? Thanks in advance!
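In case a concrete example helps frame the question, here is a minimal sketch of loading an AWQ checkpoint with the AutoAWQ package (assuming `pip install autoawq`; the model path below is a placeholder):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "path/to/awq-quantized-model"  # placeholder path
model = AutoAWQForCausalLM.from_quantized(model_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)
```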
As mentioned above.
Hello, how much time is needed, and which datasets did you use?
1. Does the finetuning need to be done per layer, or could I finetune some layers only once? 2. Is the codebook quantization method slower than AWQ? 3. When I run inference, it is successful...
 torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 28.00 GiB. GPU 0 has a total capacity of 47.54 GiB of which 9.50 GiB is free. Process 1509125 has 9.68...
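For reference, a hedged sketch of the loading options commonly used to avoid this kind of allocation failure with transformers (the model path is a placeholder, and whether these apply depends on the truncated setup above):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/model",                          # placeholder
    torch_dtype=torch.float16,                # half-precision weights instead of fp32
    device_map="auto",                        # let accelerate place layers across devices
    max_memory={0: "40GiB", "cpu": "64GiB"},  # cap GPU 0 usage, spill the rest to CPU
)
```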