
Draft: Merge LoRA Adapters with AWQ BaseModels

Open · Whadup opened this issue 8 months ago · 4 comments

This PR extends the AwqLoraLinear class to support merging LoRA adapters into the base weights. Instead of re-quantizing the whole model, we reuse the original quantization scales and zero points.
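
In other words, the merge dequantizes each AWQ weight with its stored scales and zero points, adds the LoRA delta, and re-packs with the same parameters. A minimal sketch of that idea; the `dequantize`/`quantize` helpers and the argument names are hypothetical placeholders, not the PR's actual code:

```python
import torch

def merge_lora_into_awq(
    dequantize,            # hypothetical: unpacks packed INT4 qweight -> fp16 weight
    quantize,              # hypothetical: re-packs fp16 weight -> INT4 qweight
    qweight: torch.Tensor,
    qzeros: torch.Tensor,
    scales: torch.Tensor,
    lora_A: torch.Tensor,  # (r, in_features)
    lora_B: torch.Tensor,  # (out_features, r)
    scaling: float,
) -> torch.Tensor:
    # 1. Recover the floating-point weight using the quantization
    #    scales/zero points already stored on the AWQ layer.
    weight = dequantize(qweight, qzeros, scales)      # (out_features, in_features)
    # 2. Add the LoRA update: delta_W = scaling * B @ A.
    weight = weight + scaling * (lora_B @ lora_A)
    # 3. Re-pack with the *same* scales and zero points, so the rest of
    #    the model never needs a fresh calibration/quantization pass.
    return quantize(weight, qzeros, scales)
```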

Whadup · Mar 10 '25 17:03

@BenjaminBossan Thanks for looking into it already! Your three points are on my agenda; I will give you a ping when I commit the changes.

Whadup · Mar 11 '25 10:03

Great, thanks a lot.

BenjaminBossan · Mar 11 '25 10:03

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot] · Apr 10 '25 15:04

@Whadup Do you still plan on working on this?

BenjaminBossan · Apr 10 '25 15:04

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot] · May 05 '25 15:05

It's not quite clear to me, but it appears that AutoAWQ will be integrated into llm-compressor:

AutoAWQ Integration: Perform low-bit weight-only quantization efficiently using AutoAWQ, now part of LLM Compressor. Note: This integration should be considered experimental for now.
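
For context, a weight-only AWQ run through llm-compressor would look roughly like the sketch below; the modifier arguments, model name, and dataset name are assumptions drawn from the project's published examples, not something verified here.

```python
# Illustrative only: AWQModifier arguments, model, and dataset below are assumptions.
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

# 4-bit weight-only (W4A16) AWQ quantization of the Linear layers,
# keeping the LM head in full precision.
recipe = [AWQModifier(targets=["Linear"], scheme="W4A16_ASYM", ignore=["lm_head"])]

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",   # hypothetical example model
    dataset="open_platypus",                      # calibration data (assumed name)
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
    output_dir="TinyLlama-1.1B-awq-w4a16",
)
```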

BenjaminBossan · May 05 '25 15:05

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot] · May 30 '25 15:05