LiMa-cas
When I run inference, is it much slower since I need an if/else to check which precision to dequantize?
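Not this repo's actual kernel, just a minimal hypothetical sketch of what such a per-precision dispatch could look like; the Python-level if/else is a one-time check per tensor, so it is usually negligible next to the dequantization work itself:

```python
import torch

def dequantize(packed: torch.Tensor, scale: torch.Tensor, precision: str) -> torch.Tensor:
    # Hypothetical dispatch on a stored precision tag; the branch cost is tiny
    # compared to the unpack/multiply below.
    if precision == "int4":
        # Two 4-bit values packed into each int8 element (illustrative layout only).
        low = packed & 0x0F
        high = (packed >> 4) & 0x0F
        values = torch.stack((low, high), dim=-1).flatten(-2)
    elif precision == "int8":
        values = packed
    else:
        raise ValueError(f"unsupported precision: {precision}")
    return values.float() * scale
```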
Hi, what's the difference between llm-awq and autoawq? Thanks in advance!
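In case a concrete example helps frame the question, here is a minimal sketch of loading an AWQ checkpoint with the AutoAWQ package (assuming `pip install autoawq`; the model path below is a placeholder):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "path/to/awq-quantized-model"  # placeholder path
model = AutoAWQForCausalLM.from_quantized(model_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)
```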
As mentioned above.
Hello, how much time is needed, and which datasets did you use?
1. Does the finetuning need to be done per layer, or could I finetune some layers only once? 2. Is the codebook quantization method slower than AWQ? 3. When I run inference, it is successful...
 torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 28.00 GiB. GPU 0 has a total capacity of 47.54 GiB of which 9.50 GiB is free. Process 1509125 has 9.68...
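For reference, a hedged sketch of the loading options commonly used to avoid this kind of allocation failure with transformers (the model path is a placeholder, and whether these apply depends on the truncated setup above):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/model",                          # placeholder
    torch_dtype=torch.float16,                # half-precision weights instead of fp32
    device_map="auto",                        # let accelerate place layers across devices
    max_memory={0: "40GiB", "cpu": "64GiB"},  # cap GPU 0 usage, spill the rest to CPU
)
```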