Fix autocast_adapter_dtype=False for quantized models
What does this PR do?
Fixes the issue where `autocast_adapter_dtype=False` was being ignored when using quantized models with BitsAndBytes.
Fixes #2889
Problem
When a model is quantized using BitsAndBytes (e.g., 4-bit quantization), LoRA adapters were always initialized with `float32` dtype, even when:
- `autocast_adapter_dtype=False` was explicitly specified
- The model's compute dtype was set to `float16`
This caused unexpected behavior and potential performance/memory issues.
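To make the failure mode concrete, a rough reproduction sketch is shown below (the model name `facebook/opt-125m` is only an example and not taken from the issue):

```python
# Reproduction sketch; illustrative model name, not the original report's script.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", quantization_config=bnb_config)

# autocast_adapter_dtype=False should keep the adapters in the compute dtype (float16)...
peft_model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM"), autocast_adapter_dtype=False)

# ...but before this fix the LoRA parameters still end up as float32.
for name, param in peft_model.named_parameters():
    if "lora_" in name:
        print(name, param.dtype)
```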
Solution
Added a `_get_weight_dtype()` helper method to the `LoraLayer` class (see the sketch after this list) that:
- Checks for the `compute_dtype` attribute (present in BitsAndBytes quantized layers)
- Falls back to `weight.dtype` for regular layers
- Uses this dtype when creating the `lora_A` and `lora_B` linear layers in `update_layer()`
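A minimal sketch of that logic, assuming the helper reads the dtype from the wrapped base layer (the exact signature and call site in the PR may differ):

```python
import torch
import torch.nn as nn


def _get_weight_dtype(base_layer: nn.Module) -> torch.dtype:
    """Pick the dtype for newly created LoRA weights from the wrapped layer."""
    # BitsAndBytes quantized layers expose the intended compute dtype directly.
    compute_dtype = getattr(base_layer, "compute_dtype", None)
    if compute_dtype is not None:
        return compute_dtype
    # Regular layers: fall back to the dtype of the base weight.
    return base_layer.weight.dtype


# Inside update_layer(), the returned dtype would then be used when the adapter
# matrices are created, roughly:
#   dtype = self._get_weight_dtype(self.get_base_layer())
#   self.lora_A[adapter_name] = nn.Linear(self.in_features, r, bias=False, dtype=dtype)
#   self.lora_B[adapter_name] = nn.Linear(r, self.out_features, bias=False, dtype=dtype)
```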
Testing
✅ Verified the fix works correctly with a custom test script (a simplified sketch follows after the list):
- Quantized model (4-bit with `compute_dtype=float16`) -> LoRA params are `float16`
- Non-quantized model (`dtype=float16`) -> LoRA params are `float16`
- Default behavior (`autocast_adapter_dtype=True`) still works as expected
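For reference, a simplified version of such checks might look like the following (the quantized case is covered by the reproduction sketch above; model name and helper are illustrative, not the actual test script):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model


def lora_dtypes(model):
    # Collect the dtypes of all LoRA parameters in the model.
    return {param.dtype for name, param in model.named_parameters() if "lora_" in name}


# Non-quantized float16 model with autocast disabled -> adapters stay float16.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", torch_dtype=torch.float16)
peft_model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM"), autocast_adapter_dtype=False)
assert lora_dtypes(peft_model) == {torch.float16}

# Default behavior (autocast_adapter_dtype=True) -> adapters are cast to float32.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", torch_dtype=torch.float16)
peft_model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM"))
assert lora_dtypes(peft_model) == {torch.float32}
```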
✅ Ran 131 LoRA config tests locally - all passed
Thanks for proposing this fix @Aznix07. However, applying this broadly requires a lot more changes; I have worked on those in #2893. I think this PR can be closed. Still, your contribution is appreciated.