Fix autocast_adapter_dtype=False for quantized models
What does this PR do?
Fixes the issue where `autocast_adapter_dtype=False` was being ignored when using quantized models with BitsAndBytes.
Fixes #2889
Problem
When a model is quantized using BitsAndBytes (e.g., 4-bit quantization), LoRA adapters were always initialized with `float32` dtype, even when:
- `autocast_adapter_dtype=False` was explicitly specified
- The model's compute dtype was set to `float16`
This caused unexpected behavior and potential performance/memory issues.
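To make the failure mode concrete, a rough reproduction sketch is shown below (the model name `facebook/opt-125m` is only an example and not taken from the issue):

```python
# Reproduction sketch; illustrative model name, not the original report's script.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", quantization_config=bnb_config)

# autocast_adapter_dtype=False should keep the adapters in the compute dtype (float16)...
peft_model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM"), autocast_adapter_dtype=False)

# ...but before this fix the LoRA parameters still end up as float32.
for name, param in peft_model.named_parameters():
    if "lora_" in name:
        print(name, param.dtype)
```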
Solution
Added a `_get_weight_dtype()` helper method to the `LoraLayer` class (see the sketch after this list) that:
- Checks for the `compute_dtype` attribute (present in BitsAndBytes quantized layers)
- Falls back to `weight.dtype` for regular layers
- Uses this dtype when creating the `lora_A` and `lora_B` linear layers in `update_layer()`
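A minimal sketch of that logic, assuming the helper reads the dtype from the wrapped base layer (the exact signature and call site in the PR may differ):

```python
import torch
import torch.nn as nn


def _get_weight_dtype(base_layer: nn.Module) -> torch.dtype:
    """Pick the dtype for newly created LoRA weights from the wrapped layer."""
    # BitsAndBytes quantized layers expose the intended compute dtype directly.
    compute_dtype = getattr(base_layer, "compute_dtype", None)
    if compute_dtype is not None:
        return compute_dtype
    # Regular layers: fall back to the dtype of the base weight.
    return base_layer.weight.dtype


# Inside update_layer(), the returned dtype would then be used when the adapter
# matrices are created, roughly:
#   dtype = self._get_weight_dtype(self.get_base_layer())
#   self.lora_A[adapter_name] = nn.Linear(self.in_features, r, bias=False, dtype=dtype)
#   self.lora_B[adapter_name] = nn.Linear(r, self.out_features, bias=False, dtype=dtype)
```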
Testing
✅ Verified the fix works correctly with a custom test script (a simplified sketch follows after the list):
- Quantized model (4-bit with `compute_dtype=float16`) -> LoRA params are `float16`
- Non-quantized model (`dtype=float16`) -> LoRA params are `float16`
- Default behavior (`autocast_adapter_dtype=True`) still works as expected
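For reference, a simplified version of such checks might look like the following (the quantized case is covered by the reproduction sketch above; model name and helper are illustrative, not the actual test script):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model


def lora_dtypes(model):
    # Collect the dtypes of all LoRA parameters in the model.
    return {param.dtype for name, param in model.named_parameters() if "lora_" in name}


# Non-quantized float16 model with autocast disabled -> adapters stay float16.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", torch_dtype=torch.float16)
peft_model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM"), autocast_adapter_dtype=False)
assert lora_dtypes(peft_model) == {torch.float16}

# Default behavior (autocast_adapter_dtype=True) -> adapters are cast to float32.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", torch_dtype=torch.float16)
peft_model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM"))
assert lora_dtypes(peft_model) == {torch.float32}
```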
✅ Ran 131 LoRA config tests locally - all passed
Thanks for proposing this fix @Aznix07. However, applying this broadly requires a lot more changes; I have worked on those in #2893. I think this PR can be closed. Still, your contribution is appreciated.