
Fix autocast_adapter_dtype=False for quantized models

Open · Aznix07 opened this issue 2 months ago · 1 comment

What does this PR do?

Fixes the issue where autocast_adapter_dtype=False was being ignored when using quantized models with BitsAndBytes.

Fixes #2889

Problem

When a model is quantized with bitsandbytes (e.g., 4-bit quantization), LoRA adapters are always initialized in float32, even when:

  • autocast_adapter_dtype=False was explicitly specified
  • The model's compute dtype was set to float16

This caused unexpected behavior and potential performance/memory issues.

Solution

Added a _get_weight_dtype() helper method to the LoraLayer class that:

  1. Checks for compute_dtype attribute (present in BitsAndBytes quantized layers)
  2. Falls back to weight.dtype for regular layers
  3. Uses this dtype when creating the lora_A and lora_B nn.Linear layers in update_layer() (a simplified sketch follows below)
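
The snippet below is a minimal sketch of that idea, not the exact diff in this PR: the helper name matches the description above, but the surrounding update_layer() details are simplified and shown as comments.

```python
import torch
import torch.nn as nn


def _get_weight_dtype(base_layer: nn.Module) -> torch.dtype:
    # bitsandbytes quantized layers (e.g. Linear4bit) expose compute_dtype;
    # for those, the adapter should follow the compute dtype rather than the
    # packed weight dtype (uint8 for 4-bit weights).
    compute_dtype = getattr(base_layer, "compute_dtype", None)
    if compute_dtype is not None:
        return compute_dtype
    # regular layers: fall back to the weight's own dtype
    return base_layer.weight.dtype


# Rough shape of the usage inside LoraLayer.update_layer() (simplified):
#   dtype = self._get_weight_dtype(self.get_base_layer())
#   self.lora_A[adapter_name] = nn.Linear(self.in_features, r, bias=False, dtype=dtype)
#   self.lora_B[adapter_name] = nn.Linear(r, self.out_features, bias=False, dtype=dtype)
```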

Testing

✅ Verified the fix works correctly with a custom test script (a sketch follows below):

  • Quantized model (4-bit with compute_dtype=float16) -> LoRA params are float16
  • Non-quantized model (dtype=float16) -> LoRA params are float16
  • Default behavior (autocast_adapter_dtype=True) still works as expected
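
A hedged sketch of such a verification script is shown here; the model name is a placeholder, and it assumes transformers, peft, and bitsandbytes are installed with a CUDA device available.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization with a float16 compute dtype
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",  # placeholder model for illustration
    quantization_config=bnb_config,
)

lora_config = LoraConfig(r=8, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(model, lora_config, autocast_adapter_dtype=False)

for name, param in peft_model.named_parameters():
    if "lora_" in name:
        # with the fix, these should print torch.float16 instead of torch.float32
        print(name, param.dtype)
```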

✅ Ran 131 LoRA config tests locally - all passed

Aznix07 avatar Nov 04 '25 06:11 Aznix07

Thanks for proposing this fix @Aznix07. However, applying this broadly requires many more changes. I have worked on those in #2893, so I think this PR can be closed. Still, your contribution is appreciated.

BenjaminBossan avatar Nov 04 '25 14:11 BenjaminBossan

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot] avatar Dec 04 '25 15:12 github-actions[bot]