peft
Draft: Merge LoRA Adapters with AWQ BaseModels
This PR extends the AwqLoraLinear class to allow merging LoRA adapters into the quantized base weights. Instead of re-quantizing the whole model, we reuse the original quantization scales and zeros.
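For context, a minimal conceptual sketch of what "reuse the original scales and zeros" could look like. This is not the PR's implementation: AWQ actually stores packed qweight/qzeros buffers, so the unpacked integer codes, tensor layouts, and helper name here are assumptions for illustration only.

```python
import torch

def merge_lora_with_fixed_scales(w_q, scales, zeros, lora_A, lora_B, scaling,
                                 group_size=128, w_bit=4):
    """Conceptual sketch: merge a LoRA delta into an already-quantized weight
    while keeping the original group-wise scales and zero points, instead of
    re-running AWQ quantization on the whole model.

    w_q    : (out_features, in_features) integer codes in [0, 2**w_bit - 1]
    scales : (out_features, in_features // group_size) per-group scales
    zeros  : (out_features, in_features // group_size) per-group zero points
    lora_A : (r, in_features), lora_B : (out_features, r)
    """
    qmax = 2 ** w_bit - 1
    # Broadcast per-group scales/zeros to per-element values.
    s = scales.repeat_interleave(group_size, dim=1)
    z = zeros.repeat_interleave(group_size, dim=1)

    # 1) Dequantize the base weight with the original scales/zeros.
    w = (w_q.float() - z) * s

    # 2) Add the LoRA update: delta_W = scaling * B @ A.
    w = w + scaling * (lora_B @ lora_A)

    # 3) Re-quantize with the *same* scales and zero points, so only the
    #    integer codes change and the packed layout stays compatible.
    return torch.clamp(torch.round(w / s + z), 0, qmax)
```

The trade-off, as I understand it, is that the merged LoRA delta gets rounded with the base model's existing scales, which may add some quantization error compared to recalibrating, but it keeps the buffers and layout of the quantized model untouched.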
@BenjaminBossan Thanks for looking into it already! Your three points are on my agenda, I will give you a ping when I commit the changes.
Great, thanks a lot.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
@Whadup Do you still plan on working on this?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
It's not quite clear to me, but it appears that AutoAWQ will be integrated into llm-compressor:
AutoAWQ Integration: Perform low-bit weight-only quantization efficiently using AutoAWQ, now part of LLM Compressor. Note: This integration should be considered experimental for now.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.