AQLM
Questions about AQLM fine-tuning and inference
- Does the fine-tuning have to be applied to every layer, or could I fine-tune only some layers in a single pass?
- Is the codebook-based quantization method slower than AWQ at inference time?
- When I run inference, generation succeeds with max_new_tokens=512 but fails with max_new_tokens=2048.
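To make the last question reproducible, here is a minimal sketch of how the generation might be invoked. The model checkpoint, prompt, and loading options are assumptions, not taken from the original post; posting the actual error trace from the max_new_tokens=2048 run would help diagnose whether it is an out-of-memory or a kernel issue.

```python
# Hypothetical repro sketch for the max_new_tokens question.
# The checkpoint name and prompt are assumptions, not from the original post.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf"  # example AQLM checkpoint


def generate(max_new_tokens: int) -> str:
    """Load the AQLM-quantized model and generate max_new_tokens tokens."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0], skip_special_tokens=True)


if __name__ == "__main__":
    if torch.cuda.is_available():
        print(generate(512))   # reportedly succeeds
        print(generate(2048))  # reportedly fails; the traceback is needed to diagnose
    else:
        print("CUDA not available; skipping AQLM inference repro")
```

If the 2048-token run dies with a CUDA out-of-memory error, the cause is likely the growing KV cache rather than AQLM itself.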