
Draft: Merge LoRA Adapters with AWQ BaseModels

Open · Whadup opened this issue 8 months ago · 4 comments

This PR extends the AwqLoraLinear class to support merging LoRA adapters into the base weights. Instead of re-quantizing the whole model, we reuse the original quantization scales and zero points.
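
In other words, the merge dequantizes each AWQ weight with its stored scales and zero points, adds the LoRA delta, and re-packs with the same parameters. A minimal sketch of that idea; the `dequantize`/`quantize` helpers and the argument names are hypothetical placeholders, not the PR's actual code:

```python
import torch

def merge_lora_into_awq(
    dequantize,            # hypothetical: unpacks packed INT4 qweight -> fp16 weight
    quantize,              # hypothetical: re-packs fp16 weight -> INT4 qweight
    qweight: torch.Tensor,
    qzeros: torch.Tensor,
    scales: torch.Tensor,
    lora_A: torch.Tensor,  # (r, in_features)
    lora_B: torch.Tensor,  # (out_features, r)
    scaling: float,
) -> torch.Tensor:
    # 1. Recover the floating-point weight using the quantization
    #    scales/zero points already stored on the AWQ layer.
    weight = dequantize(qweight, qzeros, scales)      # (out_features, in_features)
    # 2. Add the LoRA update: delta_W = scaling * B @ A.
    weight = weight + scaling * (lora_B @ lora_A)
    # 3. Re-pack with the *same* scales and zero points, so the rest of
    #    the model never needs a fresh calibration/quantization pass.
    return quantize(weight, qzeros, scales)
```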

Whadup · Mar 10 '25 17:03

@BenjaminBossan Thanks for looking into it already! Your three points are on my agenda; I will give you a ping when I commit the changes.

Whadup · Mar 11 '25 10:03

Great, thanks a lot.

BenjaminBossan · Mar 11 '25 10:03

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot] · Apr 10 '25 15:04

@Whadup Do you still plan on working on this?

BenjaminBossan · Apr 10 '25 15:04

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot] · May 05 '25 15:05

It's not quite clear to me, but it appears that AutoAWQ will be integrated into llm-compressor:

AutoAWQ Integration: Perform low-bit weight-only quantization efficiently using AutoAWQ, now part of LLM Compressor. Note: This integration should be considered experimental for now.
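
For context, a weight-only AWQ run through llm-compressor would look roughly like the sketch below; the modifier arguments, model name, and dataset name are assumptions drawn from the project's published examples, not something verified here.

```python
# Illustrative only: AWQModifier arguments, model, and dataset below are assumptions.
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

# 4-bit weight-only (W4A16) AWQ quantization of the Linear layers,
# keeping the LM head in full precision.
recipe = [AWQModifier(targets=["Linear"], scheme="W4A16_ASYM", ignore=["lm_head"])]

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",   # hypothetical example model
    dataset="open_platypus",                      # calibration data (assumed name)
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
    output_dir="TinyLlama-1.1B-awq-w4a16",
)
```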

BenjaminBossan · May 05 '25 15:05

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot] · May 30 '25 15:05